Author: Alden Do Rosario
Founder of: Custom GPT.ai
Last updated: March 15, 2026
- Run their first test in under 1 hour (first query sent)
- Deploy with early validation in under 2 weeks (100 queries sent)
- Fully validate their use case within 3 months (1,000 queries sent)
The Demo That Launched a Thousand Re-Do’s
You’ve seen the demo. We all have. Someone spins up a vector database, connects it to an LLM, and in 20 minutes they’re asking questions about their company’s documents. The CEO watches. The board watches. Everyone’s impressed. “Vector DB + LLM = Done!” That equation has launched more failed AI projects than I can count. Here’s what nobody mentions during that demo: the system only works because someone carefully selected 50 clean PDF files. The questions were rehearsed. The edge cases were avoided. And there’s no actual user trying to break it. The demo is the tip of the iceberg. The other 90% is underwater.What’s Actually Under the Surface (Of a Successful AI)
When we started building CustomGPT, I thought the hard part would be the AI. I was wrong. We spend 40% of our engineering time on data ingestion, getting documents into the system reliably. Every file format has quirks. PDFs with embedded images. Scanned documents that need OCR. YouTube videos and their transcriptions. The little tricks that enable RAG to search excel sheets. Auto-sync for web pages that change daily. There’s a reason some frameworks have five different PDF parsers. Nobody knows which one to use when. Under a build approach, data ingestion design decisions fall into your lap-and each one can break your system before data reaches your database.
Hallucinations aren’t a prompt engineering problem. Early on, I thought we could fix hallucinations by tweaking the system prompt. “Just tell it not to make things up.”
It doesn’t work that way.
Controlling hallucinations requires measures at every step of the pipeline-from how you chunk documents, to how you rank retrieved results, to how you construct the final prompt, to how you validate the response. We’ve spent thousands of engineering hours on this. And we’re still improving it.
Real users don’t query like developers. Your test queries are well-formed questions. Real users type things like:
- “Hmm – ok”
- “yeah, tell me more”
- “2”
- “that one”
- “ok – yeah”
- Data security – Ingestion, storage, deletion, audit logs
- Chat security – Handling NSFW queries, jailbreaking attempts, prompt injection
- Access security – SSO integration, role-based permissions, team management
2026: The Iceberg Got Deeper
Here’s what’s changed since the early RAG tutorials: simple RAG is now just one tool in a much larger toolkit. The AI agent landscape in 2026 looks nothing like 2023. If you’re planning to build, you’re not just building a RAG pipeline anymore. You’re building an orchestration layer for autonomous agents.
MCP (Model Context Protocol) is becoming the standard for connecting AI agents to tools and data sources. Anthropic open-sourced it, and it’s now under the Linux Foundation. Think of it as “USB-C for AI”-a standardized way to plug agents into databases, APIs, and enterprise systems. If you’re building from scratch, you’re now competing with an emerging standard.
Multi-agent orchestration is replacing single-purpose agents. Gartner reported a 1,445% surge in multi-agent system inquiries from 2024 to 2025. The architecture has shifted from “one smart agent” to “teams of specialized agents coordinated by an orchestration layer.” Building this yourself means building the coordination, the routing, the fallback logic, and the observability across all of them.
GraphRAG has emerged for complex queries. Traditional RAG finds semantically similar text. GraphRAG understands relationships between entities-enabling multi-hop reasoning that simple vector search can’t do. It’s powerful, but knowledge graph extraction costs 3-5x more than baseline RAG and requires domain-specific tuning.
Agent safety is now a recognized discipline. When agents can browse the web, access files, and take autonomous action, the attack surface expands dramatically. Prompt injection, where malicious instructions hide in web content or documents, is a real threat. Even Anthropic acknowledges that “securing AI agents’ real-world actions is still an active area of development.”
Response verification has become non-negotiable for enterprise. It’s no longer enough for AI to give an answer—you need to prove that answer is grounded in your actual sources. This means extracting every factual claim, cross-referencing it against source documents, and calculating a “verified claims score.” If your AI makes 10 claims and only 8 trace back to your knowledge base, stakeholders need to know. Legal needs to know. Compliance needs to know. Building this verification layer from scratch—with audit trails, multi-stakeholder risk assessment, and source citations—is its own engineering project on top of everything else.
True enterprise search is now the baseline expectation. Users don’t just want answers from a few documents—they expect AI that can respond to any query, on any size knowledge base, across any variety of data formats. Text, PDFs, spreadsheets, images, videos, web pages—all searchable, all connected. AI that finds and processes information exactly like a human would, but orders of magnitude faster. Building this means handling not just semantic search, but structured data queries, multi-hop reasoning across sources, and graceful handling of knowledge gaps.
The iceberg didn’t shrink as the technology matured. It got deeper.