Quickstart — MoE Sovereign

What is MoE Sovereign?

A self-hosted multi-model LLM system that runs on dedicated GPU hardware. Each incoming request is analyzed and routed to specialized expert LLMs, calculation tools, and a knowledge base. The collected results are structurally analyzed by a reasoning model and synthesized into a final answer by a judge LLM.

OpenAI API compatible: works as a drop-in backend for Open WebUI and other OpenAI-compatible clients.


Services

| Container | Port | Function |
|---|---|---|
| langgraph-orchestrator | 8002 | Core API (OpenAI-compatible) |
| moe-admin-ui | 8088 | Web Admin: configure experts, models, prompts |
| mcp-precision | 8003 | 20 precision tools (math, date, network, German law, ...) |
| neo4j-knowledge | 7474 / 7687 | Knowledge graph (GraphRAG) |
| terra_cache | 6379 | Valkey: checkpoints, performance scores, metadata |
| chromadb-vector | 8001 | Vector cache (semantic cache) |
| moe-kafka | 9092 | Event streaming (ingest, audit log, feedback) |

Port collisions? Every host port in the table can be remapped via .env (e.g. ADMIN_UI_HOST_PORT=8089) — see Deployment → Docker Compose for the full list. macOS users should run bash scripts/bootstrap-macos.sh instead of install.sh; details in Deployment → macOS.
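
Before remapping anything, it can help to see which of the default host ports are already taken. The following is a minimal sketch and not part of MoE Sovereign: it assumes the services are published on 127.0.0.1 with the default ports from the table above, and the helper names `port_in_use` and `find_collisions` are hypothetical.

```python
import socket

# Default host ports, copied from the Services table above.
DEFAULT_PORTS = {
    "langgraph-orchestrator": [8002],
    "moe-admin-ui": [8088],
    "mcp-precision": [8003],
    "neo4j-knowledge": [7474, 7687],
    "terra_cache": [6379],
    "chromadb-vector": [8001],
    "moe-kafka": [9092],
}


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def find_collisions(ports=DEFAULT_PORTS, host="127.0.0.1"):
    """Map each container name to the subset of its ports that are occupied."""
    collisions = {}
    for name, candidates in ports.items():
        busy = [p for p in candidates if port_in_use(p, host)]
        if busy:
            collisions[name] = busy
    return collisions
```

Calling `find_collisions()` before the first start of the stack lists the ports worth remapping in .env. Once the stack is running, its own ports will of course show up as occupied.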


Pipeline

flowchart TD
    REQ["📨 Request"] --> CACHE["🔍 Cache Check\n(ChromaDB)"]
    CACHE -->|"Hit"| RESP["✅ Response"]
    CACHE -->|"Miss"| PLANNER["🧠 Planner\n(Judge LLM)"]

    PLANNER --> E1["👥 Expert LLMs\n(Two-Tier)"]
    PLANNER --> E2["🌐 Web\n(SearXNG + Citations)"]
    PLANNER --> E3["🔧 MCP Tools\n(20 Tools)"]
    PLANNER --> E4["∑ SymPy\nMathematics"]
    PLANNER --> E5["🗃 Neo4j\nGraphRAG"]

    E1 -->|"Low confidence"| THINKING["💭 Thinking Node\n(CoT, conditional)"]
    E1 & E2 & E3 & E4 & E5 --> MERGER["⚖ Merger\n(Judge LLM)"]
    THINKING --> MERGER

    MERGER --> CRITIC["🔎 Critic\n(fact check, medical/legal)"]
    CRITIC --> RESP

    RESP --> S1[("ChromaDB\nCache")]
    RESP --> S2[("Kafka\n→ Neo4j Ingest")]
    RESP --> S3[("Valkey\nMetadata")]
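
Read as control flow, the diagram amounts to roughly the following. This is an illustrative sketch only, not the project's LangGraph code: the `Pipeline` class, every method name, the `confidence` field, and the 0.5 threshold are hypothetical stand-ins, and the real nodes are LLM calls and services rather than plain Python methods.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Hypothetical stand-ins for the nodes in the flowchart."""
    cache: dict = field(default_factory=dict)

    def plan(self, request):
        # Planner (judge LLM): decide which branches to run.
        return ["experts", "web", "mcp_tools", "sympy", "graphrag"]

    def run_branch(self, name, request):
        # Each branch returns a result; only the experts' confidence is used.
        return {"source": name, "text": f"{name} result", "confidence": 0.9}

    def think(self, request, expert_result):
        # Thinking node: chain-of-thought pass, run only on low confidence.
        return {"source": "thinking", "text": "refined reasoning", "confidence": 1.0}

    def merge(self, results):
        # Merger (judge LLM): synthesize one answer from all results.
        return " | ".join(r["text"] for r in results)

    def critique(self, answer):
        # Critic: fact check; a pass-through in this sketch.
        return answer


def handle(request, pipeline, low_confidence=0.5):
    """Follow the flowchart: cache check, fan-out, optional thinking, merge, critic."""
    if request in pipeline.cache:  # cache hit: respond immediately
        return pipeline.cache[request]
    results = [pipeline.run_branch(b, request) for b in pipeline.plan(request)]
    expert = next((r for r in results if r["source"] == "experts"), None)
    if expert is not None and expert["confidence"] < low_confidence:
        results.append(pipeline.think(request, expert))
    answer = pipeline.critique(pipeline.merge(results))
    # Stands in for the ChromaDB, Kafka, and Valkey writes after the response.
    pipeline.cache[request] = answer
    return answer
```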

Output Modes

Each output mode is exposed as its own model ID, selectable in Open WebUI or through the model field of an API request:

| Model | Mode |
|---|---|
| moe-orchestrator | Full answers with explanations (default) |
| moe-orchestrator-code | Source code only, no explanations |
| moe-orchestrator-concise | Short & precise, max 120 words |
| moe-orchestrator-agent | Coding agent (OpenCode, Continue.dev) |
| moe-orchestrator-agent-orchestrated | Claude Code, full MoE fanout |
| moe-orchestrator-research | In-depth research with private SearXNG search |
| moe-orchestrator-report | Structured report with sections and citations |
| moe-orchestrator-plan | Structured planning for complex tasks |
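
For API clients, switching modes means changing only the model field. A minimal sketch, assuming the request body follows the OpenAI chat completions shape used in the curl examples on this page. The model IDs come from the table above; the short mode keys and the `build_chat_request` helper are hypothetical conveniences.

```python
import json

# Model IDs from the Output Modes table; the short keys are hypothetical aliases.
MODES = {
    "default": "moe-orchestrator",
    "code": "moe-orchestrator-code",
    "concise": "moe-orchestrator-concise",
    "agent": "moe-orchestrator-agent",
    "agent-orchestrated": "moe-orchestrator-agent-orchestrated",
    "research": "moe-orchestrator-research",
    "report": "moe-orchestrator-report",
    "plan": "moe-orchestrator-plan",
}


def build_chat_request(prompt, mode="default", stream=False):
    """Return the JSON body for POST /v1/chat/completions in the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    body = {
        "model": MODES[mode],
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return json.dumps(body)
```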

Quick Start for Claude Code Users

Step 1: Configure .bashrc

# ~/.bashrc or ~/.zshrc

# Use MoE API as Anthropic backend
export ANTHROPIC_BASE_URL=http://localhost:8002
export ANTHROPIC_API_KEY=moe-sk-xxxxxxxxxxxxxxxx...

Then reload your shell configuration: source ~/.bashrc (or source ~/.zshrc)

Step 2: Start Claude Code

# Option A — per-session flag
claude --model moe-orchestrator-agent-orchestrated \
       --api-key "$ANTHROPIC_API_KEY" \
       --base-url "$ANTHROPIC_BASE_URL/v1"

# Option B — persistent in ~/.claude/settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8002/v1",
    "ANTHROPIC_API_KEY": "moe-sk-xxxxxxxx..."
  }
}

Step 3: Check status

curl http://localhost:8002/v1/models

Available Claude Code Skills

| Skill | Description |
|---|---|
| /moe | Direct query to the local MoE system (all modes available) |
| /law | Retrieve and interpret German federal law |
| /calc | Precise calculations via MCP tools (no LLM) |
| /research | Private web research via local SearXNG instance |
| /local-doc | Generate code documentation with local LLM |
| /local-review | Code review via local MoE system |
| /explain-error | Error analysis with technical support expert |
| /moe-status | Status of all services, models, and GPU utilization |

Quick Start for API Users

Deployment

For a fresh Debian or Ubuntu server, the recommended approach is the one-line installer:

curl -sSL https://moe-sovereign.org/install.sh | bash

The installer handles container runtime setup (Docker CE or Podman), directory creation, configuration, and deployment automatically. It supports Debian 11–13 and Ubuntu 22.04–26.04. On any other Linux distribution, install Docker or Podman manually and run docker compose up -d directly — the stack itself has no OS dependencies.

See Installation for details and the First-Time Setup guide for the post-install wizard.

For manual deployment:

# 1. Create configuration
cp .env.example .env
# Fill in required values — then run the Setup Wizard in the Admin UI
# to configure INFERENCE_SERVERS and core models

# 2. Start all services
sudo docker compose up -d

# 3. Check status
curl http://localhost:8002/v1/models
curl http://localhost:8002/graph/stats

Endpoint: http://<host>:8002/v1

Chat (simple)

curl http://localhost:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moe-orchestrator",
    "messages": [{"role": "user", "content": "Your question"}],
    "stream": false
  }'
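
The same non-streaming call can be made from Python with only the standard library. This is a sketch under two assumptions: the orchestrator listens on localhost:8002 and accepts the request without an API key (as in the curl example), and the response follows the OpenAI chat completion shape with the answer at choices[0].message.content. `chat` and `extract_answer` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8002/v1"  # endpoint given on this page


def chat(prompt, model="moe-orchestrator", base_url=BASE_URL, timeout=300):
    """POST a non-streaming chat completion and return the decoded JSON."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def extract_answer(response):
    """Return (response id, answer text) from an OpenAI-style response dict."""
    return response["id"], response["choices"][0]["message"]["content"]


# Usage (requires a running stack):
#   response_id, text = extract_answer(chat("Your question"))
```

Keeping the response id around is useful because the feedback endpoint expects it.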

Chat (Streaming / SSE)

curl http://localhost:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moe-orchestrator",
    "messages": [{"role": "user", "content": "Your question"}],
    "stream": true
  }'
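
With stream set to true, the reply arrives as server-sent events. The sketch below assumes the stream follows the OpenAI convention: each event is a `data: ` line carrying a JSON chunk with the text at choices[0].delta.content, and the stream ends with `data: [DONE]`. That framing is an assumption based on the stated OpenAI compatibility, and `iter_deltas` is a hypothetical helper.

```python
import json


def iter_deltas(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Usage (requires a running stack): a response object returned by
# urllib.request.urlopen() can be iterated line by line, so it can be
# passed in directly:
#   for piece in iter_deltas(resp):
#       print(piece, end="", flush=True)
```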

Feedback (learning loop)

curl http://localhost:8002/v1/feedback \
  -H "Content-Type: application/json" \
  -d '{"response_id": "chatcmpl-<id>", "rating": 5}'

Ratings of 1–2 count as negative, 3 as neutral, and 4–5 as positive. The response_id is the value of the id field returned with each chat response.
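
The rating scale can be checked on the client before a request is sent. A minimal sketch: the bucket boundaries are the ones stated above, while the client-side validation and the helper names `rating_sentiment` and `build_feedback` are assumptions, since this page does not say how the server treats out-of-range values.

```python
import json


def rating_sentiment(rating):
    """Map a 1-5 rating to the sentiment buckets described above."""
    if not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError("rating must be an integer from 1 to 5")
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def build_feedback(response_id, rating):
    """Return the JSON body for POST /v1/feedback."""
    rating_sentiment(rating)  # raises ValueError for an invalid rating
    return json.dumps({"response_id": response_id, "rating": rating})
```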

Graph API

curl http://localhost:8002/graph/stats
curl "http://localhost:8002/graph/search?q=Ibuprofen"
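
Search terms that contain spaces or special characters need to be URL-encoded. A small sketch using the standard library; `graph_search_url` is a hypothetical helper, and because the response schema of /graph/search is not described on this page, the example stops at building the URL.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8002"  # orchestrator address used on this page


def graph_search_url(query, base_url=BASE_URL):
    """Build the /graph/search URL with the query safely encoded."""
    return f"{base_url}/graph/search?{urlencode({'q': query})}"
```

The resulting string can be passed to curl or to `urllib.request.urlopen`.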

OpenAI-compatible clients (Continue.dev, Open WebUI, curl)

# Chat completion (streaming)
curl -s http://localhost:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moe-orchestrator",
    "stream": true,
    "messages": [{"role": "user", "content": "Explain Transformer architectures."}]
  }'

# List available model IDs
curl -s http://localhost:8002/v1/models | jq '.data[].id'
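
The jq filter above can be mirrored in Python where jq is not installed. A minimal sketch, assuming the /v1/models body has the shape implied by `.data[].id`; `model_ids` and `fetch_model_ids` are hypothetical helper names.

```python
import json
import urllib.request


def model_ids(models_response):
    """Equivalent of the jq filter '.data[].id' on a decoded /v1/models body."""
    return [entry["id"] for entry in models_response["data"]]


def fetch_model_ids(base_url="http://localhost:8002/v1", timeout=10):
    """GET /v1/models and return the advertised model IDs."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))


# Usage (requires a running stack):
#   print(fetch_model_ids())
```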