The CHAI Philosophy
Cognitive Hive AI (CHAI) is the core architectural philosophy of Simplex. Instead of relying on monolithic large language models (LLMs) for all AI tasks, CHAI orchestrates collections of specialized Small Language Models (SLMs) working together like a hive of specialists.
Key Insight
A team of specialists outperforms a generalist on specific tasks. The same principle applies to language models: a fine-tuned 7B model often beats GPT-4 within its specialty.
The Problem with Monolithic LLMs
Traditional approaches using large language models face several challenges:
| Challenge | Impact |
|---|---|
| High Cost | $0.03-0.12 per 1K tokens adds up quickly at scale |
| High Latency | 500-3000ms response times hurt user experience |
| Black Box | Complex prompt engineering required, unpredictable behavior |
| Privacy Concerns | Data must leave your infrastructure |
| Rate Limits | API quotas and outages affect availability |
The CHAI Solution
CHAI solves these problems through five principles:
- Specialize: Each model masters a narrow domain (summarization, entity extraction, sentiment, etc.)
- Collaborate: Models communicate via message passing
- Scale: Add specialists as needs grow
- Fail Gracefully: One specialist down doesn't stop the hive
- Cost Pennies: Run on commodity ARM instances
Per-Hive SLM Architecture
The core architectural decision of CHAI v0.5.0: each hive provisions ONE shared SLM that all its specialists use. This is fundamentally different from giving each specialist its own model.
Why Per-Hive, Not Per-Specialist?
| Per-Specialist (Old) | Per-Hive (CHAI v0.5.0) |
|---|---|
| 10 specialists = 10 models | 10 specialists = 1 model |
| 80+ GB RAM required | 8-12 GB RAM total |
| Expensive, wasteful | Efficient, practical |
| No shared consciousness | HiveMnemonic creates collective knowledge |
Key Insight
Each specialist has its own Anima (personal memories and beliefs), but all specialists share the Hive SLM for inference. The HiveMnemonic provides shared consciousness - what one specialist learns, all can access.
Core Constructs
Specialists
A specialist is an actor that encapsulates a small language model fine-tuned for a specific task (in v0.5.0, inference itself runs on the hive's shared SLM):
specialist EntityExtractor {
    // A 7B model fine-tuned for named entity recognition
    model: "ner-fine-tuned-7b",
    domain: "named entity extraction",
    memory: 8.GB,
    // Low temperature keeps extraction output deterministic
    temperature: 0.1,
    max_tokens: 500,

    // Handler: each incoming Extract message triggers one inference call
    receive Extract(text: String) -> List<Entity> {
        let raw = infer("Extract all named entities from: {text}")
        parse_entities(raw)
    }
}
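For orientation, a specialist like this is addressed via message passing. The usage sketch below borrows the ask/await style from the Ensemble Patterns section further down; treat it as illustrative rather than confirmed API.
// Hypothetical usage of EntityExtractor via message passing
let entities = await ask(extractor, Extract(document_text))
for entity in entities {
    emit(entity)
}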
The infer Primitive
Inside a specialist, the infer function runs inference on the hive's shared SLM:
// Basic inference
let result = infer(prompt)
// With parameters
let result = infer(prompt, temperature: 0.7, max_tokens: 200)
// Streaming
for chunk in infer_stream(prompt) {
emit(chunk)
}
// Typed extraction
let data = infer_typed<Person>(prompt)
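To make typed extraction concrete, here is a sketch of what the surrounding type declaration and call might look like. The struct syntax and the Person fields are assumptions for illustration, not documented Simplex syntax.
// Hypothetical record type for typed extraction (syntax assumed)
struct Person {
    name: String,
    occupation: String,
}

// infer_typed constrains model output to parse as a Person
let person = infer_typed<Person>("Extract the person described in: {bio_text}")
emit("Found {person.name}, {person.occupation}")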
Hives
A hive supervises a group of specialists, providing them with a shared SLM and collective memory:
hive DocumentProcessor {
// Specialists in this hive
specialists: [
Summarizer,
EntityExtractor,
SentimentAnalyzer,
TopicClassifier
],
// Shared SLM for all specialists (v0.5.0)
slm: "simplex-cognitive-7b",
// Shared consciousness across specialists
mnemonic: {
episodic: { capacity: 1000, importance_threshold: 0.4 },
semantic: { capacity: 5000 },
beliefs: { revision_threshold: 50 }, // 50% for hive beliefs
},
// How tasks are routed to specialists
router: SemanticRouter(
embedding_model: "simplex-mnemonic-embed",
fallback: Summarizer
),
strategy: OneForOne,
}
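Once declared, a hive is used like any other actor. The sketch below assumes a spawn primitive and hive-level ask dispatch, neither of which is confirmed API; the router then forwards the request to a specialist.
// Hypothetical: start the hive and send it a request
let hive = spawn DocumentProcessor
// The SemanticRouter matches the request to the closest specialist domain;
// anything unmatched falls back to Summarizer
let summary = await ask(hive, Summarize(document))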
Routing Strategies
CHAI supports multiple routing strategies to direct requests to the right specialist:
Semantic Router
Uses embedding similarity to match requests with specialist domains. Best for natural language queries.
Rule Router
Pattern matching with explicit rules. Best for structured inputs with clear categories.
LLM Router
A small model decides which specialist to invoke. Best for complex, ambiguous requests.
Cascade Router
Try specialists in order until one succeeds. Best for fallback scenarios.
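Extrapolating from the SemanticRouter declaration in the hive example above, the other strategies might be configured roughly as follows; the parameter names here are assumptions:
// Rule-based routing on explicit patterns (field names assumed)
router: RuleRouter(
    rules: [
        { pattern: "summarize*", target: Summarizer },
        { pattern: "extract*",   target: EntityExtractor }
    ],
    fallback: Summarizer
)

// Cascade routing: try specialists in order until one succeeds
router: CascadeRouter(
    order: [EntityExtractor, Summarizer]
)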
Ensemble Patterns
Combine multiple specialists for better results:
// Parallel - all specialists work simultaneously
let results = await parallel(
ask(summarizer, Summarize(doc)),
ask(extractor, Extract(doc)),
ask(classifier, Classify(doc))
)
// Voting - multiple specialists vote on a decision
let verdict = await vote(
[judge1, judge2, judge3],
Evaluate(submission),
threshold: 0.6
)
// Chain - sequential pipeline processing
let result = doc
|> ask(cleaner, Clean)
|> ask(translator, Translate(to: "en"))
|> ask(summarizer, Summarize)
HiveMnemonic: Shared Consciousness
The HiveMnemonic is the shared memory layer that creates collective consciousness across all specialists in a hive. Unlike traditional RAG systems that rely on vector similarity search, the HiveMnemonic integrates directly with each specialist's Anima to form a unified cognitive substrate.
What One Learns, All Know
When a specialist learns something new, it can contribute that knowledge to the HiveMnemonic. Other specialists automatically benefit from this shared knowledge on their next inference.
Contributing to Shared Memory
specialist Researcher {
receive Research(topic: String) -> Findings {
let findings = do_research(topic)
// Personal memory (my Anima only)
self.anima.remember("I researched: {topic}")
// Shared memory (HiveMnemonic - all specialists can access)
hive.mnemonic.learn("Research finding: {findings.summary}")
hive.mnemonic.believe(
"Topic {topic} is well-documented",
confidence: 80
)
findings
}
}
specialist Synthesizer {
receive Synthesize(query: String) -> Report {
// Recall from shared HiveMnemonic
let team_knowledge = hive.mnemonic.recall_for(query)
// Recall from personal Anima
let my_experience = self.anima.recall_for(query)
// Both inform the inference to the shared Hive SLM
infer("Create synthesis report for: {query}")
}
}
How Context Flows to the SLM
When a specialist calls infer(), context is automatically assembled from both personal and shared memory (sketched after this list):
- Personal context - The specialist's Anima memories are formatted
- Shared context - Relevant entries from the Hive's Mnemonic are added
- Combined prompt - Both contexts are prepended to the prompt
- Inference - The combined prompt is sent to the shared Hive SLM
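The pipeline can be pictured as pseudocode. This is an illustration of the runtime's behavior, not a user-facing function; the fn syntax and hive_slm.generate call are assumptions.
// Illustrative pseudocode for what happens inside infer()
fn assemble_context(specialist, prompt) {
    let personal = specialist.anima.recall_for(prompt)         // personal context
    let shared = specialist.hive.mnemonic.recall_for(prompt)   // shared context
    let full = "{personal}\n{shared}\n{prompt}"                // combined prompt
    hive_slm.generate(full)                                    // inference
}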
Three-Tier Memory Hierarchy
CHAI implements a three-level memory system with different belief thresholds at each level:
Belief Thresholds
Different levels require different amounts of evidence to revise beliefs:
| Level | Threshold | Purpose |
|---|---|---|
| Anima (Individual) | 30% | Flexible personal beliefs, quick to adapt |
| Mnemonic (Hive) | 50% | Shared beliefs require consensus |
| Divine (Global) | 70% | Organization-wide truths, high confidence required |
Belief Propagation
A belief held by an individual Anima at high confidence can be promoted to the HiveMnemonic. Similarly, hive beliefs can propagate to the Divine level when multiple hives reach consensus. This creates emergent organizational knowledge from individual learning.
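A sketch of what promotion could look like from inside a specialist, reusing the believe call from the Researcher example above; confidence_of is an assumed helper, and the percentages come from the thresholds table.
// Hypothetical promotion of a strong personal belief to the hive tier
let belief = "Topic {topic} is well-documented"
if self.anima.confidence_of(belief) >= 70 {
    // At the hive tier, later revision requires 50% supporting evidence
    hive.mnemonic.believe(belief, confidence: 70)
}
// Hive-to-Divine promotion (70% threshold) would additionally require
// consensus across multiple hives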
Cost Analysis
CHAI dramatically reduces AI costs compared to external APIs:
| Configuration | Approx. Monthly Cost |
|---|---|
| Small hive (5 specialists, CPU) | ~$35 |
| Medium hive (10 specialists, CPU) | ~$85 |
| High-performance (10 specialists, GPU) | ~$1,200 |
Compared to the GPT-4 API:
| Requests/Month | CHAI Cost | GPT-4 API Cost | Savings |
|---|---|---|---|
| 100K | $35 | $300 | 88% |
| 1M | $85 | $3,000 | 97% |
| 10M | $1,200 | $30,000 | 96% |
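The savings column is simply the relative cost difference; at 1M requests/month, for example, ($3,000 - $85) / $3,000 ≈ 97%.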
Naming Conventions
CHAI offers two naming traditions for specialists:
Elvish (Poetic)
Sindarin/Quenya names for an organic feel: Isto (Knowledge), Penna (Storyteller), Curu (Craft), Silma (Clarity)
Latin (Technical)
Classical names for a formal feel: Cogito (Think), Scribo (Write), Lego (Read), Faber (Craftsman)