LangGraph Pipeline & Execution Lifecycle

The MoE Sovereign pipeline is built on LangGraph to manage the entire request lifecycle. It routes tasks dynamically, executes specialists in parallel, performs contextual RAG searches, and synthesizes final answers.


1. Request Lifecycle Flowchart

Since the integration of the IMoE Gating Network (June 2026), the request lifecycle has two phases: a Gate Phase, which compiles a template dynamically for each request, and an Execution Phase, in which LangGraph runs the pipeline.

flowchart TD
    Start([Client Request]) --> GateEmbed["Local Embedding\n(all-MiniLM-L6-v2)"]
    GateEmbed --> GateCache{"ChromaDB Template Cache\ncosine distance < 0.18?"}

    GateCache -->|Hit 🎯| GateReady["Apply Cached Template"]
    GateCache -->|Miss| GateONNX["⚡ Sovereign Router ONNX\nClassifier < 5ms CPU"]

    GateONNX --> GateAlloc["🔀 Dynamic Allocator\n(Thompson Sampling + VRAM clamping)"]
    GateAlloc --> GateReady

    GateReady --> PipeCheck{"L1 Response Cache Hit?\nChromaDB cosine < 0.15?"}

    PipeCheck -->|Yes ⚡| ReturnResponse([SSE Response Stream])

    PipeCheck -->|No| PlanCacheCheck{"L2 Plan Cache Hit?\nValkey SHA256 plan key"}

    PlanCacheCheck -->|Yes| FanOut
    PlanCacheCheck -->|No| PlannerNode["🧠 Planner Node\nJudge LLM task decomposition"]

    PlannerNode --> ValkeyWrite["Write plan to Valkey (TTL 30 min)"]
    ValkeyWrite --> FanOut

    subgraph FanOut ["Parallel execution fan-out"]
        direction LR
        Workers["👥 Expert Workers\nT1 + T2 confidence-gated"]
        Research["🌐 SearXNG Research"]
        Math["∑ Math (SymPy)"]
        MCP["🔧 MCP Node\n16 deterministic tools"]
        GraphRAG["🗃 GraphRAG Node\nNeo4j 2-hop + CAG"]
    end

    FanOut --> MergeCheck{"Merger Fast-Path?\n1 expert, high confidence,\nno external context?"}

    MergeCheck -->|Yes ⚡| FastPath["Fast-Path merger\n(Skip Judge LLM)"]
    MergeCheck -->|No| MergerNode["⚖ Merger / Judge LLM\nPre-flight VRAM budget check\nProportional context compression"]

    FastPath --> ThinkCheck{"Complex query & thinking active?"}
    MergerNode --> ThinkCheck

    ThinkCheck -->|Yes| ThinkNode["💭 Thinking Node\n4-step CoT reasoning trace"]
    ThinkNode --> CriticCheck{"medical_consult / legal_advisor?"}
    ThinkCheck -->|No| CriticCheck

    CriticCheck -->|Yes| CriticNode["🔎 Critic Node\nAsync self-evaluation"]
    CriticCheck -->|No| SaveResults
    CriticNode --> SaveResults

    SaveResults["Post-pipeline saves\n- ChromaDB L1 response cache\n- Valkey Thompson success/fail scores\n- Kafka moe.ingest / moe.requests"]

    SaveResults --> ReturnResponse
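
The three decision points after the fan-out (merger fast-path, thinking, critic) can be read as plain routing functions over the state. The sketch below is illustrative only: the `expert_confidence` field, the 0.9 threshold, the list of fields treated as "external context", and all function and route names are assumptions, not part of the documented pipeline. In LangGraph, functions of this shape are typically registered through `add_conditional_edges`.

```python
from typing import Any, Dict

# Categories named in the flowchart's critic decision.
CRITIC_CATEGORIES = {"medical_consult", "legal_advisor"}
# Assumption: these state fields are what counts as "external context".
EXTERNAL_CONTEXT_FIELDS = ("web_research", "mcp_result", "graph_context", "math_result")


def route_after_fanout(state: Dict[str, Any], min_confidence: float = 0.9) -> str:
    """Pick the fast path only for one confident expert with no external context."""
    single_expert = len(state.get("expert_results", [])) == 1
    # "expert_confidence" is a hypothetical field; the real state may differ.
    confident = state.get("expert_confidence", 0.0) >= min_confidence
    no_external = not any(state.get(field) for field in EXTERNAL_CONTEXT_FIELDS)
    return "fast_path" if single_expert and confident and no_external else "merger"


def route_after_merge(state: Dict[str, Any], thinking_enabled: bool) -> str:
    """Run the thinking node only for complex queries with thinking switched on."""
    if thinking_enabled and state.get("complexity_level") == "complex":
        return "thinking"
    return "critic_check"


def route_critic(state: Dict[str, Any]) -> str:
    """Send medical and legal plans through the critic before saving results."""
    categories = {step.get("category") for step in state.get("plan", [])}
    return "critic" if categories & CRITIC_CATEGORIES else "save_results"
```

Keeping the routing decisions as pure functions of the state makes each branch of the flowchart testable without starting any model.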

2. Pipeline State (MoEState / AgentState)

The LangGraph state object is passed through every pipeline node and accumulates the execution history and metadata of the request:

| Field | Type | Description |
|---|---|---|
| input | str | Original user query |
| response_id | str | UUID for response tracking and feedback correlation |
| mode | str | Operation mode (default, code, concise, agent, agent_orchestrated, research, report, plan) |
| plan | List[Dict] | Execution steps: [{task, category, search_query?, mcp_tool?, metadata_filters?}] |
| complexity_level | str | trivial / moderate / complex (classified by the IMoE ONNX router) |
| expert_results | List[str] | Accumulated responses from active expert workers |
| expert_models_used | List[str] | ["model::category", ...] recorded for system metrics |
| web_research | str | Formatted web research hits with inline citations |
| cached_facts | str | Hard cache content retrieved on L1 cache hits |
| math_result | str | Deterministic SymPy computation output |
| mcp_result | str | Outputs from deterministic MCP precision tools |
| graph_context | str | Structured Neo4j query results (with optional [Procedural Requirements] block) |
| final_response | str | Final synthesized response from the merger/judge |
| reasoning_trace | str | Intermediate Chain-of-Thought trace generated by thinking_node |
| metadata_filters | Dict | Optional domain filters extracted by the planner for scoped database retrieval |
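
As a rough illustration, the fields above map onto a `TypedDict` as follows. This is a sketch, not the project's actual definition: the `operator.add` reducers on the two accumulating list fields are an assumption based on the common LangGraph pattern for merging writes from parallel nodes, and `new_state` is a hypothetical helper.

```python
import operator
import uuid
from typing import Annotated, Any, Dict, List, TypedDict


class MoEState(TypedDict, total=False):
    input: str
    response_id: str
    mode: str
    plan: List[Dict[str, Any]]
    complexity_level: str
    # Assumption: parallel workers append, so list updates are concatenated.
    expert_results: Annotated[List[str], operator.add]
    expert_models_used: Annotated[List[str], operator.add]
    web_research: str
    cached_facts: str
    math_result: str
    mcp_result: str
    graph_context: str
    final_response: str
    reasoning_trace: str
    metadata_filters: Dict[str, Any]


def new_state(query: str, mode: str = "default") -> MoEState:
    """Hypothetical helper: build the initial state for one request."""
    return MoEState(
        input=query,
        response_id=str(uuid.uuid4()),
        mode=mode,
        plan=[],
        expert_results=[],
        expert_models_used=[],
    )
```

With `total=False`, nodes may fill in fields such as `final_response` later; a fresh state only carries what is known at request time.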

3. Node Mechanics

3a. IMoE Gate (Pre-Pipeline)

  • Embeds the prompt locally and matches it against the ChromaDB template cache by cosine distance.
  • On a cache miss, the fallback ONNX model classifies the query by domain, complexity, and retrieval needs in < 5 ms.
  • Selects models dynamically using Thompson Sampling and applies VRAM-safe context limits.
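
The two mechanisms named above, a cosine-distance threshold for the template cache and Thompson Sampling for model selection, can be sketched in plain Python. The 0.18 threshold comes from the flowchart; everything else (function names, the in-memory dict standing in for ChromaDB, the success/failure counts as a Beta prior) is an assumption for illustration.

```python
import math
import random
from typing import Dict, Optional, Sequence, Tuple

TEMPLATE_DISTANCE_MAX = 0.18  # threshold shown in the flowchart


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; zero vectors count as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def lookup_template(
    query_vec: Sequence[float], cache: Dict[str, Sequence[float]]
) -> Optional[str]:
    """Return the nearest cached template id, or None on a cache miss."""
    if not cache:
        return None
    best_id = min(cache, key=lambda tid: cosine_distance(query_vec, cache[tid]))
    if cosine_distance(query_vec, cache[best_id]) < TEMPLATE_DISTANCE_MAX:
        return best_id
    return None


def thompson_pick(stats: Dict[str, Tuple[int, int]], rng: random.Random) -> str:
    """Sample Beta(successes + 1, failures + 1) per model and pick the highest draw."""
    return max(
        stats,
        key=lambda model: rng.betavariate(stats[model][0] + 1, stats[model][1] + 1),
    )
```

Because each pick is a random draw, a model with few recorded outcomes still gets selected occasionally, which is what lets the allocator keep exploring while favoring models with a good track record.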

3b. Planner Node

  • Only active for moderate and complex requests.
  • Invokes the Judge LLM to construct a structured task list.
  • Extracts domain filters (metadata_filters) to query target databases selectively.
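
The flowchart caches plans in Valkey under a SHA256 key with a 30-minute TTL. A minimal sketch of that idea follows, assuming the key is derived from the mode plus a whitespace-normalized query; the exact hashed fields, the `moe:plan:` prefix, and the in-memory `TTLPlanCache` class (a stand-in for Valkey) are all hypothetical.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

PLAN_TTL_SECONDS = 30 * 60  # "TTL 30 min" from the flowchart


def plan_cache_key(query: str, mode: str) -> str:
    """Derive a stable key; the hashed fields are an assumption."""
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha256(f"{mode}\n{normalized}".encode("utf-8")).hexdigest()
    return f"moe:plan:{digest}"


class TTLPlanCache:
    """In-memory stand-in for Valkey with per-entry expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def set(self, key: str, plan: List[Dict[str, Any]], ttl: float = PLAN_TTL_SECONDS) -> None:
        # Plans are stored as JSON, as they would be in a string-valued store.
        self._store[key] = (self._clock() + ttl, json.dumps(plan))

    def get(self, key: str) -> Optional[List[Dict[str, Any]]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return json.loads(payload)
```

On a hit the planner call is skipped entirely, so the key should only include inputs that change the resulting plan.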

3c. Expert Worker Node

  • Executes tasks in parallel.
  • Automatically escalates from a T1 to a T2 model if the confidence threshold is not met.
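
Parallel execution with confidence-gated escalation might look like the following `asyncio` sketch. It assumes each expert is an async callable returning an `(answer, confidence)` pair and uses an illustrative threshold of 0.7; neither the interface nor the value is taken from the actual worker implementation.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

# Hypothetical expert interface: task text in, (answer, confidence) out.
Expert = Callable[[str], Awaitable[Tuple[str, float]]]


async def run_with_escalation(
    task: str, t1: Expert, t2: Expert, threshold: float = 0.7
) -> str:
    """Try the cheaper T1 expert first; call T2 only if T1 is not confident."""
    answer, confidence = await t1(task)
    if confidence >= threshold:
        return answer
    answer, _ = await t2(task)
    return answer


async def run_experts(
    tasks: List[str], t1: Expert, t2: Expert, threshold: float = 0.7
) -> List[str]:
    """Run all tasks concurrently; results keep the order of `tasks`."""
    coros = (run_with_escalation(task, t1, t2, threshold) for task in tasks)
    return list(await asyncio.gather(*coros))
```

Escalation happens per task, so one low-confidence answer does not force every other task onto the larger model.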

3d. Merger / Judge Node

  • Runs a pre-flight context check that compares the prompt's token count with the Judge model's absolute context limit.
  • If the prompt exceeds the limit, the context is compressed proportionally (compress_prompt_to_fit).
  • Combines expert findings and outputs the final stream.
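
Proportional compression can be illustrated with a small sketch. This is not the project's `compress_prompt_to_fit`, whose signature is not shown here; the function name, the whitespace-based token count, and the truncate-from-the-end strategy are all simplifying assumptions. The idea is that every context section is shrunk by the same ratio so that the total fits the remaining budget.

```python
from typing import Dict


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def compress_sections_proportionally(
    sections: Dict[str, str],
    fixed_tokens: int,
    model_limit: int,
    reserve_for_output: int = 0,
) -> Dict[str, str]:
    """Shrink every section by the same ratio so the prompt fits the limit."""
    budget = model_limit - reserve_for_output - fixed_tokens
    if budget <= 0:
        raise ValueError("no token budget left for context sections")

    sizes = {name: count_tokens(text) for name, text in sections.items()}
    total = sum(sizes.values())
    if total <= budget:
        return dict(sections)  # pre-flight check passed, nothing to compress

    ratio = budget / total
    compressed = {}
    for name, text in sections.items():
        keep = int(sizes[name] * ratio)  # floor, so the sum never exceeds budget
        compressed[name] = " ".join(text.split()[:keep])
    return compressed
```

For example, with 60 and 40 tokens of context and a budget of 50, the ratio is 0.5 and the sections are cut to 30 and 20 tokens, so larger sections give up more tokens while all keep the same share.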