AI Model Advisor
Pick the right model for the job
Compare strengths, pros and cons across leading LLMs — or describe your use case and let the advisor recommend the best fit.
9 models
Anthropic
Claude Haiku 4.5
Economy
200K context · $1 per 1M input tokens · $5 per 1M output tokens
Latency
Cost
Tool use
Best for
- High-volume classification
- Routing & extraction
- Customer support triage
Avoid for
- Deep multi-step reasoning
- Long-form creative writing
Pros
- Very fast
- Cheap per token
- Strong instruction following
- Vision support
Cons
- Weaker on complex math
- Less depth of knowledge than Opus
Fit by use case
RAG over docs — Fast retrieval-augmented answers at scale
Agentic workflows — Good for simple agents; escalate hard steps
Code generation — Snippets yes; large refactors no
Legal/medical analysis — Use Sonnet/Opus for accuracy
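The "escalate hard steps" advice can be sketched as a small router that tries the cheapest tier first and moves up only when an answer fails a quality check. This is a minimal sketch, not a production pattern: the tier labels, `call_model`, and `is_good_enough` are all hypothetical placeholders that you would replace with your own client call and your own acceptance test.

```python
from typing import Callable, Tuple

# Hypothetical tier labels, cheapest first. They are not real API model IDs.
TIERS = ["claude-haiku", "claude-sonnet", "claude-opus"]


def answer_with_escalation(
    task: str,
    call_model: Callable[[str, str], str],
    is_good_enough: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try each tier in order and return (tier, answer) for the first accepted answer.

    If no tier passes the check, the answer from the last (strongest) tier is returned.
    """
    tier = TIERS[0]
    answer = ""
    for tier in TIERS:
        # call_model is an assumed callable: (tier, task) -> answer text.
        answer = call_model(tier, task)
        if is_good_enough(answer):
            break
    return tier, answer
```

The design choice here is that the acceptance check decides the spend: easy requests stop at the cheapest tier, and only the requests that fail the check pay for a stronger model.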
Anthropic
Claude Sonnet 4.5
Balanced
200K context · $3 per 1M input tokens · $15 per 1M output tokens
Reasoning
Coding
Reliability
Best for
- Production agents
- Code generation
- Document analysis
- Structured extraction
Avoid for
- Throwaway classification (use Haiku)
- Image generation
Pros
- Best price/quality balance
- Strong reasoning
- Reliable tool calls
- Long context
Cons
- Slower than Haiku
- Costlier than Gemini Flash
Fit by use case
RAG over docs — Top-tier extraction quality
Agentic workflows — Default choice for prod agents
Code generation — Excellent multi-file edits
Legal/medical analysis — Strong accuracy + citations
Anthropic
Claude Opus 4.6
Premium
200K context · $5 per 1M input tokens · $25 per 1M output tokens
Reasoning depth
Nuance
Best for
- Frontier research tasks
- Complex planning
- High-stakes analysis
Avoid for
- High-volume cheap workloads
- Realtime UX
Pros
- Best Anthropic reasoning
- Nuanced writing
- Deep analysis
Cons
- Expensive
- Slower
Fit by use case
RAG over docs — Overkill unless docs are very complex
Agentic workflows — Best for long-horizon planning
Code generation — Top quality, watch cost
Legal/medical analysis — Highest accuracy tier
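To make "watch cost" concrete, the per-token prices listed on the three Anthropic cards can be turned into a per-request and per-month estimate. The sketch below is illustrative only: the prices are copied from the cards in this guide, while the function names and the example token counts are assumptions.

```python
# Prices in USD per 1M tokens as (input, output), copied from the cards in this guide.
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def monthly_cost(model: str, input_tokens: int, output_tokens: int, requests: int) -> float:
    """Return the estimated USD cost of `requests` identical requests."""
    return request_cost(model, input_tokens, output_tokens) * requests


if __name__ == "__main__":
    # Hypothetical workload: 2,000 input and 500 output tokens per request.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 2_000, 500):.4f} per request")
```

For that hypothetical workload the estimate is about $0.0045 per request on Haiku, $0.0135 on Sonnet, and $0.0225 on Opus, so the same traffic costs five times as much on Opus as on Haiku.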
OpenAI
GPT-5
Premium
400K context · $2.5 per 1M input tokens · $10 per 1M output tokens
Multimodal
Tool use
Reasoning
Best for
- Multimodal reasoning
- Tool-heavy agents
- Vision + text
Avoid for
- Cost-sensitive bulk processing
Pros
- Strong all-rounder
- Excellent tool use
- Native multimodal
Cons
- Pricier than Gemini
- Variable latency
Fit by use case
RAG over docs — Excellent extraction + reasoning
Agentic workflows — Best-in-class tool calling
Code generation — Strong, especially at higher reasoning-effort settings
Legal/medical analysis — High accuracy with citations
OpenAI
GPT-5 Mini
Balanced
400K context · $0.6 per 1M input tokens · $2.4 per 1M output tokens
Cost
Speed
Multimodal
Best for
- Cost-aware production
- Chatbots
- Mid-complexity agents
Avoid for
- Frontier reasoning tasks
Pros
- Great $/quality ratio
- Fast
- Multimodal
Cons
- Less nuanced than GPT-5
Fit by use case
RAG over docs — Sweet spot for production RAG
Agentic workflows — Good for simpler agents
Code generation — Decent; GPT-5 for hard problems
Legal/medical analysis — Verify with human reviewer
Google
Gemini 2.5 Pro
Premium
2M context · $1.25 per 1M input tokens · $5 per 1M output tokens
Context size
Multimodal
Cost
Best for
- Massive context tasks
- Video + audio understanding
- Repo-wide code analysis
Avoid for
- Workloads needing strict EU residency (use Mistral)
Pros
- Huge 2M context
- Cheapest premium tier
- Native multimodal incl. video
Cons
- Tool calling less battle-tested than OpenAI
Fit by use case
RAG over docs — Can skip chunking for many use cases
Agentic workflows — Catching up; OpenAI/Anthropic lead
Code generation — Excellent for repo-scale edits
Legal/medical analysis — Long contracts fit in one prompt
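Whether a document "fits in one prompt" can be checked roughly before choosing between chunking and a long-context model. The sketch below assumes about four characters per token, which is a crude heuristic for English text and not a real tokenizer, and it uses the context sizes listed on the cards in this guide. The function names and the output reserve are hypothetical.

```python
from typing import List

# Context windows in tokens, as listed on the cards in this guide.
CONTEXT_WINDOWS = {
    "Claude Sonnet 4.5": 200_000,
    "GPT-5": 400_000,
    "Gemini 2.5 Flash": 1_000_000,
    "Gemini 2.5 Pro": 2_000_000,
}

# Assumption: rough average for English text. Use the provider's tokenizer for real counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(text: str, model: str, reserved_for_output: int = 8_000) -> bool:
    """Return True if the text plus an output reserve fits in the model's window."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOWS[model]


def models_that_fit(text: str, reserved_for_output: int = 8_000) -> List[str]:
    """Return the listed models whose context window can hold the whole text."""
    return [m for m in CONTEXT_WINDOWS if fits_in_context(text, m, reserved_for_output)]
```

If `models_that_fit` returns an empty list, or returns only models outside your budget, chunking with retrieval is still the practical route.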
Google
Gemini 2.5 Flash
Balanced
1M context · $0.3 per 1M input tokens · $1.2 per 1M output tokens
Cost
Latency
Context
Best for
- High-volume multimodal
- Long-context summarization
- Live chat
Avoid for
- Hardest reasoning tasks
Pros
- Very cheap
- Fast
- 1M context
- Multimodal
Cons
- Less nuance than Pro
Fit by use case
RAG over docs — Default cost-effective RAG model
Agentic workflows — Fine for simple chains
Code generation — OK for snippets
Legal/medical analysis — Combine with strict prompts
Meta
Llama 3.3 70B
Balanced
128K context · $0.6 per 1M input tokens · $0.9 per 1M output tokens
Privacy
Customization
Cost at scale
Best for
- On-prem / VPC deployment
- Data-sensitive workloads
- Fine-tuning
Avoid for
- Plug-and-play vision tasks
Pros
- Open weights
- Run anywhere
- Cheap output
- Fine-tunable
Cons
- Self-hosting complexity
- Weaker than frontier models on hard tasks
Fit by use case
RAG over docs — Great when data can't leave VPC
Agentic workflows — Use Llama 3.1 405B for complex agents
Code generation — Reasonable; not SOTA
Legal/medical analysis — Keep PHI/PII on-prem
Mistral
Mistral Large 2
Premium
128K context · $2 per 1M input tokens · $6 per 1M output tokens
EU compliance
Function calling
Best for
- EU data residency
- Function calling
- Multilingual work in EU languages
Avoid for
- Vision tasks
- Massive 1M+ context needs
Pros
- EU-hosted option
- Strong function calling
- Good multilingual
Cons
- Smaller ecosystem
- No native vision
Fit by use case
RAG over docs — Strong EU-resident RAG choice
Agentic workflows — Function calling is a strength
Code generation — Use Codestral for code-specific work
Legal/medical analysis — Pairs well with GDPR needs
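The nine cards can also be compared programmatically. The following is a minimal sketch of a shortlist filter over the context sizes, tiers, and prices listed in this guide. It is not how the advisor itself works, and the `Model` class and `shortlist` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    tier: str
    context: int      # context window in tokens
    price_in: float   # USD per 1M input tokens
    price_out: float  # USD per 1M output tokens


# Figures copied from the cards in this guide.
MODELS = [
    Model("Claude Haiku 4.5", "Economy", 200_000, 1.00, 5.00),
    Model("Claude Sonnet 4.5", "Balanced", 200_000, 3.00, 15.00),
    Model("Claude Opus 4.6", "Premium", 200_000, 5.00, 25.00),
    Model("GPT-5", "Premium", 400_000, 2.50, 10.00),
    Model("GPT-5 Mini", "Balanced", 400_000, 0.60, 2.40),
    Model("Gemini 2.5 Pro", "Premium", 2_000_000, 1.25, 5.00),
    Model("Gemini 2.5 Flash", "Balanced", 1_000_000, 0.30, 1.20),
    Model("Llama 3.3 70B", "Balanced", 128_000, 0.60, 0.90),
    Model("Mistral Large 2", "Premium", 128_000, 2.00, 6.00),
]


def shortlist(
    min_context: int = 0,
    max_price_out: float = float("inf"),
    tier: Optional[str] = None,
) -> List[Model]:
    """Return models meeting the constraints, cheapest output price first."""
    matches = [
        m for m in MODELS
        if m.context >= min_context
        and m.price_out <= max_price_out
        and (tier is None or m.tier == tier)
    ]
    return sorted(matches, key=lambda m: (m.price_out, m.price_in))
```

Price and context are only the hard constraints. The qualitative notes on each card (tool calling, data residency, vision support) still need a human read before picking from the shortlist.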
