LangGraph Pipeline & Execution Lifecycle
The MoE Sovereign pipeline is built on LangGraph to manage the entire request lifecycle. It routes tasks dynamically, executes specialists in parallel, performs contextual RAG searches, and synthesizes final answers.
1. Request Lifecycle Flowchart
Since the integration of the IMoE Gating Network (June 2026), the request lifecycle is split into two phases: (1) the Gate Phase (dynamic template compilation) and (2) the Execution Phase (LangGraph execution).
flowchart TD
Start([Client Request]) --> GateEmbed["Local Embedding\n(all-MiniLM-L6-v2)"]
GateEmbed --> GateCache{"ChromaDB Template Cache\ncosine distance < 0.18?"}
GateCache -->|Hit 🎯| GateReady["Apply Cached Template"]
GateCache -->|Miss| GateONNX["⚡ Sovereign Router ONNX\nClassifier < 5ms CPU"]
GateONNX --> GateAlloc["🔀 Dynamic Allocator\n(Thompson Sampling + VRAM clamping)"]
GateAlloc --> GateReady
GateReady --> PipeCheck{"L1 Response Cache Hit?\nChromaDB cosine < 0.15?"}
PipeCheck -->|Yes ⚡| ReturnResponse([SSE Response Stream])
PipeCheck -->|No| PlanCacheCheck{"L2 Plan Cache Hit?\nValkey SHA256 plan key"}
PlanCacheCheck -->|Yes| FanOut
PlanCacheCheck -->|No| PlannerNode["🧠 Planner Node\nJudge LLM task decomposition"]
PlannerNode --> ValkeyWrite["Write plan to Valkey (TTL 30 min)"]
ValkeyWrite --> FanOut
subgraph FanOut ["Parallel execution fan-out"]
direction LR
Workers["👥 Expert Workers\nT1 + T2 confidence-gated"]
Research["🌐 SearXNG Research"]
Math["∑ Math (SymPy)"]
MCP["🔧 MCP Node\n16 deterministic tools"]
GraphRAG["🗃 GraphRAG Node\nNeo4j 2-hop + CAG"]
end
FanOut --> MergeCheck{"Merger Fast-Path?\n1 expert, high confidence,\nno external context?"}
MergeCheck -->|Yes ⚡| FastPath["Fast-Path merger\n(Skip Judge LLM)"]
MergeCheck -->|No| MergerNode["⚖ Merger / Judge LLM\nPre-flight VRAM budget check\nProportional context compression"]
FastPath --> ThinkCheck{"Complex query & thinking active?"}
MergerNode --> ThinkCheck
ThinkCheck -->|Yes| ThinkNode["💭 Thinking Node\n4-step CoT reasoning trace"]
ThinkNode --> CriticCheck{"medical_consult / legal_advisor?"}
ThinkCheck -->|No| CriticCheck
CriticCheck -->|Yes| CriticNode["🔎 Critic Node\nAsync self-evaluation"]
CriticCheck -->|No| SaveResults
CriticNode --> SaveResults
SaveResults["Post-pipeline saves\n- ChromaDB L1 response cache\n- Valkey Thompson success/fail scores\n- Kafka moe.ingest / moe.requests"]
SaveResults --> ReturnResponse
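The two semantic cache checks in the flowchart come down to comparing a cosine distance against a threshold. The following is a minimal sketch of that comparison in plain Python, assuming cosine distance means `1 - cosine similarity`. The constant and function names are hypothetical; only the thresholds 0.18 and 0.15 come from the flowchart.

```python
import math

# Thresholds taken from the flowchart; the constant names are illustrative.
GATE_TEMPLATE_MAX_DISTANCE = 0.18  # gate-phase template cache
L1_RESPONSE_MAX_DISTANCE = 0.15    # L1 response cache


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as a guaranteed miss
    return 1.0 - dot / (norm_a * norm_b)


def is_cache_hit(query_vec, cached_vec, max_distance):
    """A hit requires the distance to be strictly below the threshold."""
    return cosine_distance(query_vec, cached_vec) < max_distance
```

Because the L1 response cache uses the tighter threshold, a query can reuse a cached template and still miss the response cache.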
2. Pipeline State (MoEState / AgentState)
The LangGraph state object passes through all pipeline nodes to retain execution history and metadata:
| Field | Type | Description |
|---|---|---|
| `input` | `str` | Original user query |
| `response_id` | `str` | UUID for response tracking and feedback correlation |
| `mode` | `str` | Operation mode (`default`, `code`, `concise`, `agent`, `agent_orchestrated`, `research`, `report`, `plan`) |
| `plan` | `List[Dict]` | Execution steps: `[{task, category, search_query?, mcp_tool?, metadata_filters?}]` |
| `complexity_level` | `str` | `trivial` / `moderate` / `complex` (classified by the IMoE ONNX router) |
| `expert_results` | `List[str]` | Accumulated responses from active expert workers |
| `expert_models_used` | `List[str]` | `["model::category", ...]` recorded for system metrics |
| `web_research` | `str` | Formatted web research hits with inline citations |
| `cached_facts` | `str` | Hard cache content retrieved on L1 cache hits |
| `math_result` | `str` | Deterministic SymPy computation output |
| `mcp_result` | `str` | Outputs from deterministic MCP precision tools |
| `graph_context` | `str` | Structured Neo4j query results (with optional `[Procedural Requirements]` block) |
| `final_response` | `str` | Final synthesized response from the merger/judge |
| `reasoning_trace` | `str` | Intermediate Chain-of-Thought trace generated by `thinking_node` |
| `metadata_filters` | `Dict` | Optional domain filters extracted by the planner for scoped database retrieval |
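As an illustration, the fields above could be declared as a `TypedDict`, a common way to define a LangGraph state schema. This is a sketch built only from the table, not the project's actual definition. The use of `total=False` (so nodes can return partial updates) is an assumption.

```python
from typing import Any, Dict, List, TypedDict


class MoEState(TypedDict, total=False):
    """Sketch of the pipeline state; field names and types follow the table."""

    input: str
    response_id: str
    mode: str
    plan: List[Dict[str, Any]]
    complexity_level: str
    expert_results: List[str]
    expert_models_used: List[str]
    web_research: str
    cached_facts: str
    math_result: str
    mcp_result: str
    graph_context: str
    final_response: str
    reasoning_trace: str
    metadata_filters: Dict[str, Any]


# Example: the state as it might look right after the gate phase.
initial_state: MoEState = {
    "input": "What is the boiling point of water at 2000 m?",
    "mode": "default",
    "complexity_level": "moderate",
    "expert_results": [],
}
```

In LangGraph, list fields written by several parallel nodes (such as `expert_results`) usually need a reducer annotation so that updates are merged rather than overwritten. Whether this pipeline does so is not stated here.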
3. Node Mechanics
3a. IMoE Gate (Pre-Pipeline)
- Embeds the prompt and runs semantic distance matching in ChromaDB.
- On a cache miss, the fallback ONNX model classifies the query by domain, complexity, and retrieval needs in `< 5ms`.
- Selects models dynamically using Thompson Sampling and applies VRAM-safe context limits.
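Thompson Sampling, as named in the gate's model selection step, can be sketched with a Beta-Bernoulli model: draw one sample per candidate from a Beta distribution over its success and failure counts, then pick the highest draw. The function name, the uniform `Beta(1, 1)` prior, and the model names below are assumptions for illustration. VRAM clamping is not modelled.

```python
import random


def thompson_select(stats, rng=random):
    """Pick a model by Thompson Sampling.

    stats maps a model name to a (successes, failures) tuple.
    Each model gets one draw from Beta(successes + 1, failures + 1);
    the model with the highest draw wins.
    """
    best_model, best_draw = None, -1.0
    for model, (successes, failures) in stats.items():
        draw = rng.betavariate(successes + 1, failures + 1)
        if draw > best_draw:
            best_model, best_draw = model, draw
    return best_model


# Hypothetical counters, e.g. as they might be read from the score store.
scores = {"expert-a": (90, 10), "expert-b": (10, 90), "expert-new": (0, 0)}
choice = thompson_select(scores, random.Random(42))
```

A model with no history (`expert-new`) draws from a uniform distribution, so it is still selected occasionally. This gives exploration without a separate exploration parameter.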
3b. Planner Node
- Only active for `moderate` and `complex` requests.
- Invokes the Judge LLM to construct a structured task list.
- Extracts domain filters (`metadata_filters`) to query target databases selectively.
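The planner gating and the plan cache from the flowchart (SHA256 key, 30-minute TTL) can be sketched as follows. An in-memory dictionary stands in for Valkey. The key format, the query normalization, and all names here are assumptions rather than the project's actual scheme.

```python
import hashlib
import time

PLAN_TTL_SECONDS = 30 * 60  # 30-minute TTL, as stated in the flowchart


def needs_planner(complexity_level):
    """The planner runs only for moderate and complex requests."""
    return complexity_level in ("moderate", "complex")


def plan_cache_key(query, mode):
    """Derive a SHA256 key; lowercasing and whitespace folding are assumed."""
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha256(f"{mode}\n{normalized}".encode("utf-8")).hexdigest()
    return f"plan:{digest}"


class InMemoryPlanCache:
    """Dictionary-backed stand-in for the Valkey plan cache."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def set(self, key, plan, ttl=PLAN_TTL_SECONDS):
        self._store[key] = (self._clock() + ttl, plan)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, plan = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries behave like a miss
            return None
        return plan
```

On a miss, the planner output would be stored with `cache.set(plan_cache_key(query, mode), plan)` so that the next equivalent request skips the Judge LLM call.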
3c. Expert Worker Node
- Executes tasks in parallel.
- Automatically escalates from T1 to T2 models if the confidence threshold is not met.
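The parallel execution and confidence-gated escalation can be sketched with `asyncio`. The threshold value, the function names, and the model-call convention (an async callable returning `(answer, confidence)`) are assumptions for illustration. The real workers call LLM backends.

```python
import asyncio

CONFIDENCE_THRESHOLD = 0.7  # hypothetical value; the actual threshold is not given


async def run_with_escalation(task, t1_model, t2_model,
                              threshold=CONFIDENCE_THRESHOLD):
    """Try the T1 model first; fall back to T2 when confidence is too low."""
    answer, confidence = await t1_model(task)
    if confidence >= threshold:
        return answer, "T1"
    answer, _ = await t2_model(task)
    return answer, "T2"


async def run_experts(tasks, t1_model, t2_model):
    """Run all tasks concurrently; results keep the order of the input tasks."""
    return await asyncio.gather(
        *(run_with_escalation(task, t1_model, t2_model) for task in tasks)
    )


# Stub models standing in for real LLM calls.
async def stub_t1(task):
    return f"t1:{task}", 0.9 if task == "easy" else 0.2


async def stub_t2(task):
    return f"t2:{task}", 0.95


results = asyncio.run(run_experts(["easy", "hard"], stub_t1, stub_t2))
```

Escalation happens per task, so one low-confidence answer does not force the other tasks onto the larger T2 model.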
3d. Merger / Judge Node
- Runs a pre-flight context check that compares the prompt's token count against the Judge's absolute model limit.
- If the prompt exceeds that limit, the context is compressed proportionally (`compress_prompt_to_fit`).
- Combines expert findings and outputs the final stream.
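Proportional compression means that every context section gives up the same fraction of its tokens, so no single source is dropped entirely. The sketch below is not the project's `compress_prompt_to_fit`. It is a hypothetical stand-in that counts whitespace-separated words as tokens and truncates instead of summarizing.

```python
def count_tokens(text):
    """Crude token count; a real pipeline would use the model's tokenizer."""
    return len(text.split())


def compress_proportionally(sections, token_limit):
    """Shrink each section by the same ratio so the total fits the limit.

    sections maps a section name (e.g. "expert_results") to its text.
    Sections are returned unchanged when they already fit.
    """
    counts = {name: count_tokens(text) for name, text in sections.items()}
    total = sum(counts.values())
    if total <= token_limit:
        return dict(sections)
    compressed = {}
    for name, text in sections.items():
        # Integer floor division keeps the combined size at or below the limit.
        keep = counts[name] * token_limit // total
        compressed[name] = " ".join(text.split()[:keep])
    return compressed


context = {
    "expert_results": " ".join(["expert"] * 60),
    "web_research": " ".join(["web"] * 40),
}
fitted = compress_proportionally(context, token_limit=50)
```

In this example both sections are halved: `expert_results` keeps 30 of its 60 tokens and `web_research` keeps 20 of its 40.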