The CHAI Philosophy
Cognitive Hive AI (CHAI) is the core architectural philosophy of Simplex. Instead of relying on monolithic large language models (LLMs) for all AI tasks, CHAI orchestrates collections of specialized Small Language Models (SLMs) working together like a hive of specialists.
Key Insight
A team of specialists outperforms a generalist on specific tasks. The same principle applies to language models: a fine-tuned 7B model often beats GPT-4 within its specialty.
The Problem with Monolithic LLMs
Traditional approaches using large language models face several challenges:
| Challenge | Impact |
|---|---|
| High Cost | $0.03-0.12 per 1K tokens adds up quickly at scale |
| High Latency | 500-3000ms response times hurt user experience |
| Black Box | Complex prompt engineering required, unpredictable behavior |
| Privacy Concerns | Data must leave your infrastructure |
| Rate Limits | API quotas and outages affect availability |
The CHAI Solution
CHAI solves these problems through five principles:
- Specialize: Each model masters a narrow domain (summarization, entity extraction, sentiment, etc.)
- Collaborate: Models communicate via message passing
- Scale: Add specialists as needs grow
- Fail Gracefully: One specialist down doesn't stop the hive
- Cost Pennies: Run on commodity ARM instances
Per-Hive SLM Architecture
The core architectural decision of CHAI v0.5.0: each hive provisions ONE shared SLM that all its specialists use. This is fundamentally different from giving each specialist its own model.
Why Per-Hive, Not Per-Specialist?
| Per-Specialist (Old) | Per-Hive (CHAI v0.5.0) |
|---|---|
| 10 specialists = 10 models | 10 specialists = 1 model |
| 80+ GB RAM required | 8-12 GB RAM total |
| Expensive, wasteful | Efficient, practical |
| No shared consciousness | HiveMnemonic creates collective knowledge |
Key Insight
Each specialist has its own Anima (personal memories and beliefs), but all specialists share the Hive SLM for inference. The HiveMnemonic provides shared consciousness - what one specialist learns, all can access.
Core Constructs
Specialists
A specialist is an actor that encapsulates a small language model fine-tuned for a specific task (in v0.5.0, inference itself runs on the hive's shared SLM):
specialist EntityExtractor {
    // A 7B model fine-tuned for named entity recognition
    model: "ner-fine-tuned-7b",
    domain: "named entity extraction",
    memory: 8.GB,
    // Low temperature keeps extraction output deterministic
    temperature: 0.1,
    max_tokens: 500,

    // Handler: each incoming Extract message triggers one inference call
    receive Extract(text: String) -> List<Entity> {
        let raw = infer("Extract all named entities from: {text}")
        parse_entities(raw)
    }
}
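For orientation, a specialist like this is addressed via message passing. The usage sketch below borrows the ask/await style from the Ensemble Patterns section further down; treat it as illustrative rather than confirmed API.
// Hypothetical usage of EntityExtractor via message passing
let entities = await ask(extractor, Extract(document_text))
for entity in entities {
    emit(entity)
}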
The infer Primitive
Inside a specialist, the infer function runs inference on the hive's shared SLM:
// Basic inference
let result = infer(prompt)
// With parameters
let result = infer(prompt, temperature: 0.7, max_tokens: 200)
// Streaming
for chunk in infer_stream(prompt) {
emit(chunk)
}
// Typed extraction
let data = infer_typed<Person>(prompt)
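To make typed extraction concrete, here is a sketch of what the surrounding type declaration and call might look like. The struct syntax and the Person fields are assumptions for illustration, not documented Simplex syntax.
// Hypothetical record type for typed extraction (syntax assumed)
struct Person {
    name: String,
    occupation: String,
}

// infer_typed constrains model output to parse as a Person
let person = infer_typed<Person>("Extract the person described in: {bio_text}")
emit("Found {person.name}, {person.occupation}")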
Hives
A hive supervises a group of specialists, providing them with a shared SLM and collective memory:
hive DocumentProcessor {
// Specialists in this hive
specialists: [
Summarizer,
EntityExtractor,
SentimentAnalyzer,
TopicClassifier
],
// Shared SLM for all specialists (v0.5.0)
slm: "simplex-cognitive-7b",
// Shared consciousness across specialists
mnemonic: {
episodic: { capacity: 1000, importance_threshold: 0.4 },
semantic: { capacity: 5000 },
beliefs: { revision_threshold: 50 }, // 50% for hive beliefs
},
// How tasks are routed to specialists
router: SemanticRouter(
embedding_model: "simplex-mnemonic-embed",
fallback: Summarizer
),
strategy: OneForOne,
}
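Once declared, a hive is used like any other actor. The sketch below assumes a spawn primitive and hive-level ask dispatch, neither of which is confirmed API; the router then forwards the request to a specialist.
// Hypothetical: start the hive and send it a request
let hive = spawn DocumentProcessor
// The SemanticRouter matches the request to the closest specialist domain;
// anything unmatched falls back to Summarizer
let summary = await ask(hive, Summarize(document))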
Routing Strategies
CHAI supports multiple routing strategies to direct requests to the right specialist:
Semantic Router
Uses embedding similarity to match requests with specialist domains. Best for natural language queries.
Rule Router
Pattern matching with explicit rules. Best for structured inputs with clear categories.
LLM Router
A small model decides which specialist to invoke. Best for complex, ambiguous requests.
Cascade Router
Try specialists in order until one succeeds. Best for fallback scenarios.
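Extrapolating from the SemanticRouter declaration in the hive example above, the other strategies might be configured roughly as follows; the parameter names here are assumptions:
// Rule-based routing on explicit patterns (field names assumed)
router: RuleRouter(
    rules: [
        { pattern: "summarize*", target: Summarizer },
        { pattern: "extract*",   target: EntityExtractor }
    ],
    fallback: Summarizer
)

// Cascade routing: try specialists in order until one succeeds
router: CascadeRouter(
    order: [EntityExtractor, Summarizer]
)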
Ensemble Patterns
Combine multiple specialists for better results:
// Parallel - all specialists work simultaneously
let results = await parallel(
ask(summarizer, Summarize(doc)),
ask(extractor, Extract(doc)),
ask(classifier, Classify(doc))
)
// Voting - multiple specialists vote on a decision
let verdict = await vote(
[judge1, judge2, judge3],
Evaluate(submission),
threshold: 0.6
)
// Chain - sequential pipeline processing
let result = doc
|> ask(cleaner, Clean)
|> ask(translator, Translate(to: "en"))
|> ask(summarizer, Summarize)
HiveMnemonic: Shared Consciousness
The HiveMnemonic is the shared memory layer that creates collective consciousness across all specialists in a hive. Unlike traditional RAG systems that rely on vector similarity search, the HiveMnemonic integrates directly with each specialist's Anima to form a unified cognitive substrate.
What One Learns, All Know
When a specialist learns something new, it can contribute that knowledge to the HiveMnemonic. Other specialists automatically benefit from this shared knowledge on their next inference.
Contributing to Shared Memory
specialist Researcher {
receive Research(topic: String) -> Findings {
let findings = do_research(topic)
// Personal memory (my Anima only)
self.anima.remember("I researched: {topic}")
// Shared memory (HiveMnemonic - all specialists can access)
hive.mnemonic.learn("Research finding: {findings.summary}")
hive.mnemonic.believe(
"Topic {topic} is well-documented",
confidence: 80
)
findings
}
}
specialist Synthesizer {
receive Synthesize(query: String) -> Report {
// Recall from shared HiveMnemonic
let team_knowledge = hive.mnemonic.recall_for(query)
// Recall from personal Anima
let my_experience = self.anima.recall_for(query)
// Both inform the inference to the shared Hive SLM
infer("Create synthesis report for: {query}")
}
}
How Context Flows to the SLM
When a specialist calls infer(), context is automatically assembled from both personal and shared memory (sketched after this list):
- Personal context - The specialist's Anima memories are formatted
- Shared context - Relevant entries from the Hive's Mnemonic are added
- Combined prompt - Both contexts are prepended to the prompt
- Inference - The combined prompt is sent to the shared Hive SLM
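The pipeline can be pictured as pseudocode. This is an illustration of the runtime's behavior, not a user-facing function; the fn syntax and hive_slm.generate call are assumptions.
// Illustrative pseudocode for what happens inside infer()
fn assemble_context(specialist, prompt) {
    let personal = specialist.anima.recall_for(prompt)         // personal context
    let shared = specialist.hive.mnemonic.recall_for(prompt)   // shared context
    let full = "{personal}\n{shared}\n{prompt}"                // combined prompt
    hive_slm.generate(full)                                    // inference
}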
Three-Tier Memory Hierarchy
CHAI implements a three-level memory system with different belief thresholds at each level:
Belief Thresholds
Different levels require different amounts of evidence to revise beliefs:
| Level | Threshold | Purpose |
|---|---|---|
| Anima (Individual) | 30% | Flexible personal beliefs, quick to adapt |
| Mnemonic (Hive) | 50% | Shared beliefs require consensus |
| Divine (Global) | 70% | Organization-wide truths, high confidence required |
Belief Propagation
A belief held by an individual Anima at high confidence can be promoted to the HiveMnemonic. Similarly, hive beliefs can propagate to the Divine level when multiple hives reach consensus. This creates emergent organizational knowledge from individual learning.
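A sketch of what promotion could look like from inside a specialist, reusing the believe call from the Researcher example above; confidence_of is an assumed helper, and the percentages come from the thresholds table.
// Hypothetical promotion of a strong personal belief to the hive tier
let belief = "Topic {topic} is well-documented"
if self.anima.confidence_of(belief) >= 70 {
    // At the hive tier, later revision requires 50% supporting evidence
    hive.mnemonic.believe(belief, confidence: 70)
}
// Hive-to-Divine promotion (70% threshold) would additionally require
// consensus across multiple hives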
Cost Analysis
CHAI dramatically reduces AI costs compared to external APIs:
| Configuration | Approx. Monthly Cost |
|---|---|
| Small hive (5 specialists, CPU) | ~$35 |
| Medium hive (10 specialists, CPU) | ~$85 |
| High-performance (10 specialists, GPU) | ~$1,200 |
Compared to the GPT-4 API:
| Requests/Month | CHAI Cost | GPT-4 API Cost | Savings |
|---|---|---|---|
| 100K | $35 | $300 | 88% |
| 1M | $85 | $3,000 | 97% |
| 10M | $1,200 | $30,000 | 96% |
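The savings column is simply the relative cost difference; at 1M requests/month, for example, ($3,000 - $85) / $3,000 ≈ 97%.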
Naming Conventions
CHAI offers two naming traditions for specialists:
Elvish (Poetic)
Sindarin/Quenya names for an organic feel: Isto (Knowledge), Penna (Storyteller), Curu (Craft), Silma (Clarity)
Latin (Technical)
Classical names for a formal feel: Cogito (Think), Scribo (Write), Lego (Read), Faber (Craftsman)