Multi-Model Mode

Use multiple LLMs in a single request. A reasoning model thinks, a tool model executes, and a synthesis model responds.


Why multiple models?

Different models are good at different things. Opus is great at reasoning but expensive. Haiku is fast but shallow. GPT-4o handles tools well. o1 does extended thinking.

Multi-model mode picks the right model for each phase of a request instead of making one model do everything.


The four roles

| Role | What it does | Default model |
|---|---|---|
| Reasoning | Analyzes the problem, plans approach, extended thinking | Claude Opus, o1 |
| Tool Execution | Calls MCP tools, executes functions | Claude Sonnet, GPT-4o |
| Synthesis | Combines results into final response | Claude Sonnet, GPT-4o |
| Fallback | Error recovery when something fails | Gemini Flash, Haiku |


How it decides

Not every request needs multi-model. The system analyzes:

  1. Query complexity - Simple questions go to a single model

  2. Tool requirements - If tools are likely needed, tool model gets involved

  3. Slider position - A higher slider setting makes multi-model routing more likely

  4. Trigger patterns - Certain keywords ("analyze", "investigate", "debug") trigger it
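The decision logic above can be sketched as a small scoring function. This is a minimal illustration, not the product's real implementation; the function name, word-count threshold, and score weights are all assumptions.

```python
import re

# Keywords that force multi-model routing (from the trigger-pattern rule above).
TRIGGER_PATTERNS = re.compile(r"\b(analyze|investigate|debug)\b", re.IGNORECASE)

def should_use_multi_model(query: str, tools_likely: bool, slider: float) -> bool:
    """Decide whether a request is routed through the multi-model pipeline.

    slider: 0.0 (prefer single model) .. 1.0 (prefer multi-model).
    All names and thresholds here are illustrative.
    """
    # Trigger keywords route to multi-model regardless of other signals.
    if TRIGGER_PATTERNS.search(query):
        return True
    # Short, simple questions stay on a single model; longer ones score higher.
    complex_query = len(query.split()) > 12
    # Combine complexity, tool likelihood, and the slider into one score.
    score = (0.4 if complex_query else 0.0) + (0.3 if tools_likely else 0.0) + 0.3 * slider
    return score >= 0.5

print(should_use_multi_model("What time is it?", False, 0.2))               # → False
print(should_use_multi_model("Please debug this failing pipeline", False, 0.2))  # → True
```

Note how the trigger-pattern check short-circuits the scoring: a "debug" query goes multi-model even at a low slider setting.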


Handoffs

Models pass context to each other through handoffs. The reasoning model's analysis becomes context for the tool model. Tool results become context for synthesis.

The maximum number of handoffs is configurable (default: 5) to prevent infinite loops.
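A handoff chain like this can be sketched as a loop where each stage's output becomes the next stage's context, capped at the configured maximum. The stage names match the roles above; the lambdas stand in for real model calls and are purely illustrative.

```python
MAX_HANDOFFS = 5  # default cap to prevent infinite loops

def run_pipeline(query: str, max_handoffs: int = MAX_HANDOFFS) -> str:
    # Illustrative stand-ins for real model calls, one per role.
    stages = [
        ("reasoning", lambda ctx: f"plan for: {ctx}"),
        ("tool_execution", lambda ctx: f"tool results given {ctx}"),
        ("synthesis", lambda ctx: f"final answer from {ctx}"),
    ]
    context = query
    handoffs = 0
    for role, model in stages:
        if handoffs >= max_handoffs:
            break  # hard stop: never exceed the configured handoff cap
        # The previous stage's output is this stage's input context.
        context = model(context)
        handoffs += 1
    return context
```

With the default cap of 5 all three stages run; set `max_handoffs=1` and the pipeline stops after the reasoning stage.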


Configuration

Environment variables
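The exact variable names are deployment-specific; a hypothetical configuration covering the four roles and the handoff cap might look like:

```shell
# Hypothetical variable names -- check your deployment's reference for the real ones.
export MULTI_MODEL_ENABLED=true
export MULTI_MODEL_REASONING_MODEL=claude-opus
export MULTI_MODEL_TOOL_MODEL=claude-sonnet
export MULTI_MODEL_SYNTHESIS_MODEL=claude-sonnet
export MULTI_MODEL_FALLBACK_MODEL=gemini-flash
export MULTI_MODEL_MAX_HANDOFFS=5
```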

Runtime toggle

Admins can enable or disable multi-model mode in the Admin Portal under Pipeline Settings.

Via SDK
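Since SDK surfaces vary, here is a minimal sketch of what a configuration object for this feature could look like. The class and field names are assumptions, not the real SDK API; they mirror the roles and defaults described above.

```python
from dataclasses import dataclass

@dataclass
class MultiModelConfig:
    """Hypothetical configuration shape -- the real SDK's names may differ."""
    enabled: bool = False
    reasoning_model: str = "claude-opus"
    tool_model: str = "claude-sonnet"
    synthesis_model: str = "claude-sonnet"
    fallback_model: str = "gemini-flash"
    max_handoffs: int = 5  # handoff cap, per the default above

# Enable multi-model mode with a tighter handoff cap.
config = MultiModelConfig(enabled=True, max_handoffs=3)
```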


Cost tracking

Each role tracks its own costs. The total is reported in the response metadata:
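A response's cost metadata could be shaped like the dictionary below: one entry per role, plus their sum. The field names and dollar amounts are illustrative assumptions, not the product's actual schema.

```python
# Illustrative per-role costs (USD); field names are hypothetical.
role_costs = {
    "reasoning": 0.0120,
    "tool_execution": 0.0045,
    "synthesis": 0.0030,
}
metadata = {
    "role_costs": role_costs,
    # Total is the sum of the individual role costs.
    "total_cost": round(sum(role_costs.values()), 6),
}
print(metadata["total_cost"])  # → 0.0195
```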


When to use it

Good for:

  • Complex analysis tasks

  • Multi-step tool workflows

  • Architecture and debugging questions

  • Anything where you'd want to "think then do"

Skip it for:

  • Simple Q&A

  • Single-tool operations

  • High-volume, low-complexity tasks


Limitations

  • Adds latency (multiple model calls)

  • Higher cost for complex requests

  • Handoff context has size limits

  • Not all providers support extended thinking
