Can’t decide which AI model to use? Start with the smallest option that meets your accuracy and context needs. Use Fastest Responses for speed and cost, then move up to Highest Relevance or Complex Reasoning when you need stronger retrieval, re-ranking, or deeper reasoning. Enterprise teams can explicitly pick advanced models per agent.
What “AI model choice” means
Capability tiers (speed ↔ depth)
- Fastest Responses uses GPT-4o mini by default—built for snappy replies and high throughput. Great for live chat and high-volume support.
- Highest Relevance adds advanced re-ranking to improve how the agent selects context from your data, boosting answer quality on content-heavy queries.
- Complex Reasoning breaks complex prompts into sub-queries and fuses results, improving structured, multi-step answers (with ~1–2s extra latency).
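To make the decompose-then-fuse pattern behind Complex Reasoning concrete, here is a minimal sketch. The splitting heuristic and fusing step are illustrative placeholders, not CustomGPT.ai's actual implementation:

```python
# Sketch of the "break into sub-queries, then fuse results" idea.
# The regex split is a naive stand-in for real query decomposition.
import re

def decompose(prompt: str) -> list[str]:
    """Naively split a multi-part prompt into sub-queries."""
    parts = re.split(r"\band\b|\?|;", prompt)
    return [p.strip() for p in parts if p.strip()]

def fuse(answers: list[str]) -> str:
    """Combine sub-answers into one structured, numbered reply."""
    return "\n".join(f"{i + 1}. {a}" for i, a in enumerate(answers))

subqueries = decompose("Compare plan limits and explain upgrade steps")
# In the real mode, each sub-query is answered against your data first:
print(fuse([f"Answer to: {q}" for q in subqueries]))
```

Each extra retrieval-and-answer pass is where the ~1–2s of added latency comes from.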
Plan-based model control
- Standard: default GPT-4.1 (balanced accuracy/speed).
- Premium: you can switch modes (Fastest Responses / Highest Relevance / Complex Reasoning). Direct model selection is limited; each mode maps to a curated default (e.g., GPT-4o mini for Fastest Responses).
- Enterprise: full model selection per agent, including GPT-4.1, GPT-4o, GPT-5 (chat), GPT-4o mini / 4.1 mini, Claude 4.5 Sonnet, Claude 4 Sonnet, Claude 3.5 Haiku. Use Fastest Responses for speed, Highest Relevance for retrieval quality, and Complex Reasoning for hardest tasks.
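The plan tiers above can be expressed as data for quick reference. Model names are the ones listed in the text; availability can change, so treat this as a snapshot rather than an authoritative table:

```python
# Plan -> explicitly selectable models, per the tiers described above.
# Standard and Premium expose modes with curated defaults, not direct
# per-agent model selection; only Enterprise does.
PLAN_MODELS = {
    "standard": ["GPT-4.1"],      # fixed balanced default
    "premium": ["GPT-4o mini"],   # reached via modes, not direct picks
    "enterprise": [
        "GPT-4.1", "GPT-4o", "GPT-5 (chat)", "GPT-4o mini",
        "GPT-4.1 mini", "Claude 4.5 Sonnet", "Claude 4 Sonnet",
        "Claude 3.5 Haiku",
    ],
}

def can_pick(plan: str, model: str) -> bool:
    """True if the plan allows explicitly selecting this model per agent."""
    if plan != "enterprise":
        return False  # only Enterprise exposes per-agent model selection
    return model in PLAN_MODELS["enterprise"]

print(can_pick("enterprise", "Claude 4.5 Sonnet"))  # True
print(can_pick("premium", "GPT-4o"))                # False
```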
Context and multimodality
Longer-context and multimodal capabilities are available via these higher-tier models/modes; pick them when you need larger document analysis, richer reasoning, or image handling. (Exact limits vary by model; choose per agent in Enterprise.)
Latency, throughput, and rate limits
“Mini” defaults maximize responsiveness and volume; Complex Reasoning improves answer depth with slight added latency. Highest Relevance improves retrieval quality without changing your data.
Why model choice matters
- Accuracy vs cost. If the stakes are high (legal, medical, compliance), prefer Highest Relevance or Complex Reasoning on advanced models; otherwise use Fastest Responses for routine chats.
- Handling long documents & retrieval. Highest Relevance re-ranks retrieved snippets for better answers on big corpora; Complex Reasoning decomposes and synthesizes multi-part questions.
- Compliance & availability. Enterprise lets you pick models per agent (incl. non-OpenAI options), which can help align with vendor/compliance needs.
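Conceptually, what Highest Relevance's re-ranking does is re-score retrieved snippets against the query and reorder them before the model sees them. The token-overlap scorer below is a trivial stand-in for the production re-ranker, included only to show the shape of the operation:

```python
# Illustrative re-ranking: score each retrieved snippet against the
# query, then reorder best-first. Real re-rankers use learned models,
# not token overlap.
def _tokens(text: str) -> set[str]:
    return {t.strip(".,?!").lower() for t in text.split()}

def rerank(query: str, snippets: list[str]) -> list[str]:
    q = _tokens(query)
    return sorted(snippets, key=lambda s: len(q & _tokens(s)), reverse=True)

docs = [
    "Billing is handled monthly via invoice.",
    "To reset your password, open account settings.",
    "Password rules require 12 characters minimum.",
]
print(rerank("how do I reset my password", docs)[0])
```

Note that the underlying data is untouched; only the ordering of context handed to the model changes.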
Important: If you enable My Data + LLM (general knowledge), accuracy can drop and hallucination risk increases. Stick to My Data Only unless you explicitly need broader coverage.
How to choose (decision flow)
- Start small: Launch with Fastest Responses (GPT-4o mini) to set a baseline on latency, cost, and CSAT.
- Set thresholds: e.g., answer accuracy ≥90%, average latency <3s. (Track in Agent Analytics.)
- Escalate intentionally:
  - Need better retrieval on large docs? Highest Relevance.
  - Need multi-step, deeper reasoning? Complex Reasoning.
  - On Enterprise, assign GPT-4.1/4o/GPT-5/Claude per agent based on use case.
- Evaluate on your data: A/B test modes/models using real chat logs.
- Iterate & document: Re-check monthly; settings and models evolve.
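The decision flow above can be sketched as a small routing function. The thresholds mirror the example targets in the text (accuracy ≥ 90%); tune them to your own KPIs:

```python
# Hedged sketch of the escalation logic: start cheap, escalate only
# when workload traits or measured accuracy demand it.
def choose_mode(accuracy: float, large_docs: bool, multi_step: bool) -> str:
    if multi_step:
        return "Complex Reasoning"   # decomposed, deeper answers (+~1-2s)
    if large_docs or accuracy < 0.90:
        return "Highest Relevance"   # better retrieval via re-ranking
    return "Fastest Responses"       # baseline: speed and cost

print(choose_mode(accuracy=0.94, large_docs=False, multi_step=False))
# -> Fastest Responses
```

Run a function like this against your Agent Analytics numbers each review cycle rather than escalating on gut feel.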
How to choose & set the model in CustomGPT.ai
- View available options
  In Personalize → AI Intelligence, you’ll see the available modes (all plans) and model selectors (Enterprise).
- Select or switch
  - Standard: default GPT-4.1.
  - Premium: pick Fastest Responses / Highest Relevance / Complex Reasoning.
  - Enterprise: pick specific models per agent (e.g., GPT-4o mini for speed, GPT-4.1/4o/Claude 4.5 for depth, GPT-5 Chat where supported).
- Optimize with built-ins
  - Fastest Responses: enable for speed-critical flows.
  - Highest Relevance: enable for large/complex corpora (advanced re-ranking).
  - Complex Reasoning: enable for multi-part problems (adds ~1–2s).
  - Response source: prefer My Data Only; use My Data + LLM only when you accept the trade-offs.
  - Logged-in user awareness: personalize responses using the user’s name (on by default for new agents).
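Taken together, the settings above amount to one agent configuration. The keys and values below are illustrative names for this article, not the exact fields in CustomGPT.ai's UI or API:

```python
# Illustrative agent config capturing the built-in settings above.
# Field names are assumptions for this sketch, not real API fields.
agent_config = {
    "mode": "Highest Relevance",        # or "Fastest Responses" / "Complex Reasoning"
    "response_source": "my_data_only",  # safer default; lower hallucination risk
    "logged_in_user_awareness": True,   # personalize using the user's name
}

def validate(config: dict) -> bool:
    """Check the config uses only the options described above."""
    modes = {"Fastest Responses", "Highest Relevance", "Complex Reasoning"}
    sources = {"my_data_only", "my_data_plus_llm"}
    return config["mode"] in modes and config["response_source"] in sources

print(validate(agent_config))  # True
```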
Tip: For content-heavy or RAG-first bots, try Highest Relevance before jumping models; you may get a big quality boost with minimal cost/latency changes.
Example — Support chatbot with long PDFs and code snippets
A support team starts with Fastest Responses (GPT-4o mini) for FAQs and troubleshooting. As tickets begin to include multi-page documents and stepwise analysis, they switch on Highest Relevance for better retrieval. For deep issue triage, they enable Complex Reasoning or (on Enterprise) assign an advanced model per agent.
Conclusion
Picking the right setup is a balancing act between speed, cost, and intelligence. Start lean, measure, then scale up settings (and, on Enterprise, models) only when your data and KPIs demand it. Ready to test the fit with a trial? Open your agent’s AI Intelligence tab and try these modes on real chats.