# Multi-Model Mode

Use multiple LLMs in a single request: the reasoning model thinks, the tool model executes, and the synthesis model responds.
## Why multiple models?
Different models are good at different things. Opus is great at reasoning but expensive. Haiku is fast but shallow. GPT-4o handles tools well. o1 does extended thinking.
Multi-model mode picks the right model for each phase of a request instead of making one model do everything.
## The four roles

| Role | Responsibility | Typical models |
| --- | --- | --- |
| Reasoning | Analyzes the problem, plans the approach, does extended thinking | Claude Opus, o1 |
| Tool execution | Calls MCP tools, executes functions | Claude Sonnet, GPT-4o |
| Synthesis | Combines results into the final response | Claude Sonnet, GPT-4o |
| Fallback | Error recovery when something fails | Gemini Flash, Haiku |
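As a rough illustration, the table above amounts to a role-to-model mapping. A minimal Python sketch, where the role keys and model IDs are placeholders rather than the product's actual configuration schema:

```python
# Placeholder role-to-model mapping; keys and model IDs are illustrative only.
ROLE_MODELS: dict[str, str] = {
    "reasoning": "claude-opus",         # plans and does extended thinking
    "tool_execution": "claude-sonnet",  # calls MCP tools, executes functions
    "synthesis": "claude-sonnet",       # combines results into the final response
    "fallback": "gemini-flash",         # error recovery when another call fails
}
```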
## How it decides

Not every request needs multi-model. The system analyzes several signals (sketched in code after this list):

- **Query complexity** - simple questions go to a single model
- **Tool requirements** - if tools are likely needed, the tool model gets involved
- **Slider position** - the higher the slider, the more likely multi-model routing is
- **Trigger patterns** - certain keywords ("analyze", "investigate", "debug") trigger it
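A minimal sketch of how a router might combine these signals. The function name, thresholds, and keyword list are assumptions for illustration, not the actual implementation:

```python
TRIGGER_KEYWORDS = {"analyze", "investigate", "debug"}

def should_use_multi_model(query: str, slider: float, tools_likely: bool) -> bool:
    """Hypothetical heuristic: any strong enough signal opts the request in."""
    triggered = any(kw in query.lower() for kw in TRIGGER_KEYWORDS)
    complex_query = len(query.split()) > 50  # crude complexity proxy
    # A higher slider value lowers the bar for multi-model routing.
    return triggered or (slider > 0.7 and (complex_query or tools_likely))

# Example: the "debug" keyword triggers multi-model regardless of slider position.
assert should_use_multi_model("debug the flaky test", slider=0.2, tools_likely=False)
```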
## Handoffs
Models pass context to each other through handoffs. The reasoning model's analysis becomes context for the tool model. Tool results become context for synthesis.
The maximum number of handoffs is configurable (default: 5) to prevent infinite loops.
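A minimal sketch of that chain, assuming a hypothetical `call_model` helper; the stage names and context shape are illustrative:

```python
MAX_HANDOFFS = 5  # configurable cap; prevents infinite loops

def call_model(role: str, context: str) -> str:
    """Stub standing in for a real model call."""
    return f"[{role} output given: {context[:40]}...]"

def run_request(query: str, stages: list[str]) -> str:
    """Thread each stage's output into the next stage's context."""
    context = query
    for handoff, role in enumerate(stages):
        if handoff >= MAX_HANDOFFS:
            break  # stop handing off once the cap is reached
        context = call_model(role, context)
    return context

print(run_request("Why is checkout slow?", ["reasoning", "tool_execution", "synthesis"]))
```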
## Configuration

### Environment variables
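The variable names below are guesses for illustration; check your deployment's configuration reference for the real keys.

```bash
# Hypothetical keys -- consult the configuration reference for the actual names.
MULTI_MODEL_ENABLED=true
MULTI_MODEL_MAX_HANDOFFS=5            # default: 5
MULTI_MODEL_REASONING_MODEL=claude-opus
MULTI_MODEL_TOOL_MODEL=claude-sonnet
MULTI_MODEL_SYNTHESIS_MODEL=claude-sonnet
MULTI_MODEL_FALLBACK_MODEL=gemini-flash
```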
### Runtime toggle
Admins can enable or disable it via the Admin Portal under Pipeline Settings.
### Via SDK
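A hedged sketch of a per-request opt-in through a Python SDK; the package, client class, and parameter names are assumptions and may not match the real SDK surface:

```python
from pipeline_sdk import Client  # hypothetical package name

client = Client(api_key="...")
response = client.chat(
    messages=[{"role": "user", "content": "Investigate why deploys started failing"}],
    multi_model=True,  # opt this request in to multi-model routing
    max_handoffs=5,    # per-request handoff cap
)
print(response.text)
```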
## Cost tracking
Each role tracks its own costs, and the total is reported in the response metadata.
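A representative shape for that metadata (field names and figures are illustrative, not actual output):

```json
{
  "costs": {
    "reasoning":      { "model": "claude-opus",   "tokens": 2110, "usd": 0.0840 },
    "tool_execution": { "model": "claude-sonnet", "tokens": 950,  "usd": 0.0071 },
    "synthesis":      { "model": "claude-sonnet", "tokens": 1430, "usd": 0.0107 }
  },
  "total_usd": 0.1018
}
```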
## When to use it

Good for:

- Complex analysis tasks
- Multi-step tool workflows
- Architecture and debugging questions
- Anything where you'd want to "think, then do"

Skip it for:

- Simple Q&A
- Single-tool operations
- High-volume, low-complexity tasks
## Limitations

- Adds latency (multiple model calls)
- Higher cost for complex requests
- Handoff context has size limits
- Not all providers support extended thinking