Sovereign MoE – Documentation¶
Self-hosted Multi-Model Orchestrator — Routes requests to specialized local LLMs, enriches context via Neo4j Knowledge Graph and web search, and synthesizes results with a Judge LLM. OpenAI-compatible API endpoint — works with Claude Code, Continue.dev, and any OpenAI-compatible client.
Quick Navigation¶
| Section | Pages | Description |
|---|---|---|
| Installation | Installation · First-Time Setup | Install on Debian, deploy the stack, run the Setup Wizard |
| User Handbook | Quick Start · Handbook · API | Getting started, modes, skills, vision, API usage |
| Admin Backend | Overview | Manage users, budgets, templates, profiles |
| Federation | Overview | MoE Libris -- federated knowledge exchange between nodes |
| User Portal | Overview | Self-service for end users: usage, keys, billing |
| Intelligence | Agentic Loop · 7B Ensemble · Causal Learning | Agentic re-planning, ensemble benchmarks, knowledge accumulation |
| Reference | Authentication · Expert Prompts · Import/Export | API reference, system prompts, schemas |
| FAQ | FAQ | Common questions about Claude Code, API, troubleshooting |
| Changelog | Changelog | Version history of all releases |
Service Overview¶
| Service | URL | Purpose |
|---|---|---|
| Orchestrator API | http://localhost:8002/v1 |
Main endpoint (OpenAI-compatible) |
| Admin UI | http://localhost:8088 |
Configuration & monitoring |
| User Portal | http://localhost:8088/user/dashboard |
End-user interface |
| Log Viewer (Dozzle) | https://logs.moe-sovereign.org |
Browser-based container log viewer |
| Grafana | http://localhost:3001 |
Metrics dashboards |
| Prometheus | http://localhost:9090 |
Raw metrics |
| Neo4j Browser | http://localhost:7474 |
Knowledge graph explorer |
| MCP Server | http://localhost:8003 |
Precision tools |
7B Ensemble — GPT-4o Class Performance, Self-Hosted¶
Benchmark result (April 2026): 8 domain-specialist 7–9B models on legacy Tesla M10 GPUs achieve 6.11 / 10 on MoE-Eval — the same score class as GPT-4o mini — with zero data leaving the cluster. Three consecutive overnight epochs, 36 scenarios, 0 failures.
| Single 7B | 8× 7B Ensemble | 30B+14B Orchestrated | H200 Cloud (120B) | |
|---|---|---|---|---|
| MoE-Eval Score | 3.3–3.6 / 10 | 6.11 / 10 | 7.60 / 10 | 9.00 / 10 |
| VRAM required | 8 GB | 88 GB (distributed) | 80 GB RTX cluster | H200 GPU |
| Data sovereignty | ✅ | ✅ | ✅ | ❌ Cloud |
| Per-token cost | €0 | €0 | €0 | Metered |
The key insight: specialisation beats scale. A meditron:7b handles medical QA better than
a general 14B model; mathstral:7b outperforms general models on MATH tasks; qwen2.5-coder:7b
leads SWE-bench in its class. Routing each sub-task to its specialist model compounds these
advantages without requiring any single model to be large enough to cover all domains.
→ Full benchmark report and LLM comparison
What's New (May 2026)¶
MoE Codex — Compliance-Grade Data Intelligence¶
Catalog, Approval Workflow, Explorer, Drift Detection, OpenLineage, lakeFS Versioning, NiFi ETL, and Notebook (JupyterLite) have moved to the dedicated moe-codex repository — the compliance-grade data intelligence platform for regulated industries.
Deploy moe-sovereign for sovereign LLM infrastructure. Add moe-codex for Foundry-inspired data governance features (catalog, lineage, approval workflows). See the Palantir Comparison page for an honest assessment of where the architectures converge and where the gap remains.
→ Full changelog entry for 2026-05-10
Agentic Re-Planning Loop¶
The orchestrator now autonomously detects gaps in its own synthesis and launches focused follow-up research rounds — without user intervention.
After each Judge synthesis, a lightweight gap-detector LLM call evaluates COMPLETION_STATUS: COMPLETE | NEEDS_MORE_INFO. When incomplete, the still-unresolved gap and all previously established facts are injected back into the Planner as structured context. The Planner then routes exclusively the missing piece to web_researcher or precision_tools — not the full question again. Up to 3 agentic iterations per request.
→ Agentic Re-Planning Loop — full architecture
PowerPoint Generation (MCP)¶
A new generate_pptx MCP tool creates fully formatted .pptx presentations from structured content (title, slides, bullet points, notes). The file is uploaded to MinIO and delivered as a signed download link directly in the chat response.
Selective Template & Profile Export¶
The Admin UI now supports checkbox selection on the Templates and CC Profiles pages. Export only the items you need — the API accepts an optional ?ids=a,b,c parameter. Exporting everything still works as before.
CLI Agents — Best Of¶
MoE Sovereign works with any OpenAI-compatible client, but execution-loop agents like Aider, Open Interpreter, and Continue.dev unlock the full capability stack: correction memory, semantic caching, domain-expert routing, and the Knowledge Graph all activate through their natural try → fail → fix loops.
| Page | What it covers |
|---|---|
| CLI Agents — Best Of | Plain-language explanation of why and how, Before/After comparison, connection examples for each tool |
| Architectural Deep Dive | Delta table, Mermaid data-flow diagrams, measured thresholds from the implementation |
Connecting with Claude Code¶
{
"env": {
"ANTHROPIC_BASE_URL": "http://localhost:8002/v1",
"ANTHROPIC_API_KEY": "moe-sk-..."
}
}
Alternatively: configure a profile in the Admin UI under Profiles and enable it.
Documentation Structure¶
graph LR
D[docs/]
D --> IDX[index.md<br/>this page]
D --> FAQ["faq.md<br/>Frequently asked questions<br/>(Claude Code, API, troubleshooting)"]
D --> CL[changelog.md<br/>Version history]
D --> G[guide/]
D --> A[admin/]
D --> P[portal/]
D --> R[reference/]
G --> GIDX[index.md<br/>User handbook – overview]
G --> GQ[quickstart.md<br/>Services, pipeline, getting started]
G --> GH[handout.md<br/>Complete user handbook]
G --> GA["api.md<br/>API access, keys, curl & SDK examples"]
A --> AIDX[index.md<br/>Admin backend documentation]
P --> PIDX[index.md<br/>User portal documentation]
R --> RA["auth.md<br/>Authentication (OIDC, API key)"]
R --> REP[expert-prompts.md<br/>System prompts for all expert roles]
R --> RI[import-export.md<br/>JSON schemas for templates and profiles]
Stack¶
| Component | Role |
|---|---|
| LangGraph | Pipeline orchestration |
| Ollama | Local LLM inference |
| ChromaDB | Semantic vector cache |
| Valkey | Checkpoints, budget counters, scoring |
| Neo4j 5 | Knowledge graph (GraphRAG) |
| Apache Kafka | Event streaming & async learning |
| Prometheus + Grafana | Metrics & dashboards |
| FastAPI + uvicorn | HTTP API layer |
| PostgreSQL | User database |