# The CHAI Philosophy
Cognitive Hive AI (CHAI) is the core architectural philosophy of Simplex. Instead of relying on monolithic large language models (LLMs) for all AI tasks, CHAI orchestrates collections of specialized Small Language Models (SLMs) working together like a hive of specialists.
## Key Insight
A team of specialists outperforms a generalist at specific tasks. The same principle applies to language models: a fine-tuned 7B model often beats GPT-4 at its specialty.
## The Problem with Monolithic LLMs
Traditional approaches using large language models face several challenges:
| Challenge | Impact |
|---|---|
| High Cost | $0.03-0.12 per 1K tokens adds up quickly at scale |
| High Latency | 500-3000ms response times hurt user experience |
| Black Box | Complex prompt engineering required, unpredictable behavior |
| Privacy Concerns | Data must leave your infrastructure |
| Rate Limits | API quotas and outages affect availability |
## The CHAI Solution
CHAI solves these problems through five principles:
- Specialize: Each model masters a narrow domain (summarization, entity extraction, sentiment, etc.)
- Collaborate: Models communicate via message passing
- Scale: Add specialists as needs grow
- Fail Gracefully: One specialist down doesn't stop the hive
- Cost Pennies: Run on commodity ARM instances
## Core Constructs
### Specialists
A specialist is an actor that wraps a small language model fine-tuned for a specific task:
```
specialist EntityExtractor {
    model: "ner-fine-tuned-7b",
    domain: "named entity extraction",
    memory: 8.GB,
    temperature: 0.1,
    max_tokens: 500,

    receive Extract(text: String) -> List<Entity> {
        let raw = infer("Extract all named entities from: {text}")
        parse_entities(raw)
    }
}
```
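A hypothetical invocation, using the `ask` message-send that appears under Ensemble Patterns below (the `spawn` helper and the sample text are assumptions, not part of the specialist definition above):

```
// Hypothetical usage: start the specialist and send it an Extract message
let extractor = spawn(EntityExtractor)

// ask(...) delivers the message and awaits the typed reply, a List<Entity>
let entities = await ask(extractor, Extract("Tim Cook announced Vision Pro in Cupertino."))
```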
### The `infer` Primitive
Inside a specialist, the `infer` function calls the underlying model:
```
// Basic inference
let result = infer(prompt)

// With parameters
let result = infer(prompt, temperature: 0.7, max_tokens: 200)

// Streaming
for chunk in infer_stream(prompt) {
    emit(chunk)
}

// Typed extraction
let data = infer_typed<Person>(prompt)
```
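The target of `infer_typed` is an ordinary user-defined type. A minimal sketch of what `Person` could look like (the `type` declaration syntax and fields are assumptions; no record syntax appears elsewhere on this page):

```
// Hypothetical record type consumed by infer_typed<Person>;
// the model's output is parsed into these fields
type Person {
    name: String,
    age: Int,
    occupation: String
}
```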
### Hives
A hive is a supervisor that manages a group of specialists and routes requests among them intelligently:
```
hive DocumentProcessor {
    specialists: [
        Summarizer,
        EntityExtractor,
        SentimentAnalyzer,
        TopicClassifier
    ],

    router: SemanticRouter(
        embedding_model: "all-minilm-l6-v2"
    ),

    strategy: OneForOne,
    memory: SharedVectorStore(dimension: 384),
    context: ConversationBuffer(max_turns: 50)
}
```
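Requests go to the hive itself rather than to a named specialist, and the router dispatches them. A minimal sketch, assuming hives answer `ask` the same way specialists do; the `Process` request type is hypothetical:

```
// Hypothetical: the SemanticRouter embeds the request text and forwards
// it to the specialist whose domain embedding is the closest match
let result = await ask(DocumentProcessor, Process("List every company named in this filing"))
```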
## Routing Strategies
CHAI supports multiple routing strategies to direct requests to the right specialist (configuration sketches for the Rule and Cascade routers follow the four descriptions below):
### Semantic Router
Uses embedding similarity to match requests with specialist domains. Best for natural language queries.
### Rule Router
Matches inputs against explicit patterns. Best for structured inputs with clear categories.
### LLM Router
A small model decides which specialist to invoke. Best for complex, ambiguous requests.
### Cascade Router
Tries specialists in order until one succeeds. Best for fallback scenarios.
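Only the SemanticRouter configuration is shown on this page (in the hive above). The sketches below imagine RuleRouter and CascadeRouter configuration in the same style; every name and parameter in them is an assumption:

```
// Hypothetical RuleRouter: explicit patterns pick the specialist
router: RuleRouter(
    rules: [
        matches("extract|entities") => EntityExtractor,
        matches("summary|tl;dr")    => Summarizer
    ]
)

// Hypothetical CascadeRouter: try specialists in order, fall through on failure
router: CascadeRouter(
    order: [TopicClassifier, SentimentAnalyzer, Summarizer]
)
```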
## Ensemble Patterns
Combine multiple specialists for better results:
```
// Parallel - all specialists work simultaneously
let results = await parallel(
    ask(summarizer, Summarize(doc)),
    ask(extractor, Extract(doc)),
    ask(classifier, Classify(doc))
)

// Voting - multiple specialists vote on a decision
let verdict = await vote(
    [judge1, judge2, judge3],
    Evaluate(submission),
    threshold: 0.6
)

// Chain - sequential pipeline processing
let result = doc
    |> ask(cleaner, Clean)
    |> ask(translator, Translate(to: "en"))
    |> ask(summarizer, Summarize)
```
## Shared Memory
Specialists can share context through hive memory:
- Vector Store: Semantic search across all specialists
- Conversation Context: Shared conversation history
- Working Memory: Short-term shared state with TTL
```
specialist ResearchAssistant {
    // ...

    receive Research(topic: String) -> Report {
        // Search shared vector store
        let relevant = hive.memory.search(
            ai::embed(topic),
            limit: 10
        )

        // Get conversation context
        let history = hive.context.recent(5)

        // Generate with context
        let report = infer(build_prompt(topic, relevant, history))

        // Store for future searches
        hive.memory.store(report)

        report
    }
}
```
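The working-memory tier from the list above does not appear in that example. A minimal sketch, assuming a `hive.working` handle with `put`/`get` and a TTL written in the same unit-suffix style as `8.GB` (all of these names are assumptions):

```
// Hypothetical working-memory API: short-lived state shared across the hive
hive.working.put("current_topic", topic, ttl: 10.minutes)

// Any specialist in the hive can read the value until the TTL expires
let topic = hive.working.get("current_topic")
```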
## Cost Analysis
CHAI dramatically reduces AI costs compared to external APIs:
| Configuration | Monthly Cost |
|---|---|
| Small hive (5 specialists, CPU) | ~$35/month |
| Medium hive (10 specialists, CPU) | ~$85/month |
| High-performance (10 specialists, GPU) | ~$1,200/month |
Compared to GPT-4 API:
| Requests/Month | CHAI Cost | API Cost | Savings |
|---|---|---|---|
| 100K | $35 | $300 | 88% |
| 1M | $85 | $3,000 | 97% |
| 10M | $1,200 | $30,000 | 96% |
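The savings column follows directly from the two cost columns: savings = (API cost − CHAI cost) / API cost. At 1M requests, for example, ($3,000 − $85) / $3,000 ≈ 97%.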
## Naming Conventions
CHAI offers two naming traditions for specialists:
### Elvish (Poetic)
Sindarin/Quenya names for an organic feel: Isto (Knowledge), Penna (Storyteller), Curu (Craft), Silma (Clarity)
### Latin (Technical)
Classical names for a formal feel: Cogito (Think), Scribo (Write), Lego (Read), Faber (Craftsman)