WHY MCP? A Developer's Point of View


Written by: Priyansh Khodiyar


You’ve probably noticed that today’s AI models can do amazing things—drafting emails, summarizing documents, even answering complex customer questions. 

But behind the scenes, they often struggle when it comes to getting up-to-the-minute facts or diving into your company’s own private data. 

That’s where the Model Context Protocol (MCP) comes in. In a recent online discussion, three industry practitioners showed how MCP can radically simplify the way AI applications tap into external tools and databases. 

The Pain Point: “Why Can’t My AI Just Reach Out and Get the Facts?”

Imagine you’ve built a chatbot that helps employees check internal metrics—like headcount, project status, or sales figures. 

Traditionally, your coding team would have to create a custom connector (a chunk of code) for each data source: one for SAP, another for the Salesforce API, one for Google Sheets, and so on. 

If you had three applications (say, a chat interface, a voice assistant, and a mobile app) and three data services, you’d need nine connectors (3 × 3)—and every time you add a new data source, you multiply the work again.

But what if you could “plug in” any data service—Salesforce, GitHub, Stripe, Slack—using a universal interface, much like how USB ports let you plug in any keyboard or mouse? 

That’s exactly what MCP aims to deliver: a standard way for AI models to talk to any external “tool” (API, database, or file system) without rebuilding a custom bridge each time.

“Models are only as good as the context provided to them.”
—Anthropic, in their original MCP announcement

In other words, if you give your AI up-to-date context (say, current inventory levels or the latest product documentation), it can make far better decisions—without requiring you to retrain the model every time your data changes.

What Is MCP? A High-Level Look

At its simplest, MCP consists of:

  1. A Host Application (MCP Client)
    This is your AI “front end” (a chat UI, a coding assistant, a voice bot). You embed a small MCP client library into it.
  2. An MCP Server
    For each external service or data store you want to tap, you run a little server that implements the MCP specification. Under the hood, that server calls the real service API (e.g., GitHub’s REST API, Google Maps) or returns data from your own database.
  3. The MCP Protocol Itself
    A standardized JSON-RPC message format (carried over stdio or HTTP) that defines how to:
    • Announce what “tools” (functions) the server offers
    • Specify the input parameters
    • Return structured results

Once your host application “speaks MCP,” it can connect to any of those servers with minimal extra work. Instead of N × M custom integrations (where N = number of applications, M = number of services), you need just N + M connections.
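To make that concrete, here is roughly what a server’s reply looks like on the wire when a client asks what tools it offers. The get_route tool shown here is the example used later in this post, and the exact envelope can vary with SDK and spec version:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "get_route",
        "description": "Compute directions between two locations",
        "inputSchema": {
          "type": "object",
          "properties": {
            "origin": { "type": "string" },
            "destination": { "type": "string" }
          },
          "required": ["origin", "destination"]
        }
      }
    ]
  }
}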

Three Core Primitives: Tools, Resources, and Prompts

The presenters highlighted that MCP isn’t just about “tool calling.” It actually defines three ways data and logic flow between a host application and MCP servers:

  1. Tools (Model-Controlled)
    These are functions that the AI model can decide to invoke on its own. Suppose a user asks, “Show me the sales report for Q1 in a chart.” The AI model recognizes it needs charting data and calls a get_sales_data tool on the MCP server, which returns raw numbers. Then the model can render or summarize them. Because tools are model-controlled, the AI decides when to call them.
  2. Resources (Application-Controlled)
    This is more like “give me some documents or records.” The MCP server might expose a resource called company_wiki_pages. The host application asks for those pages, decides which snippets are relevant, and feeds that text into the AI model as context. Here, the application (not the model) decides when and how to use those resources.
  3. Prompts (User-Controlled)
    In certain cases, the MCP server can offer prewritten prompt templates—say, “Generate Unit Tests” or “Create a Release Note.” The user picks one, the application sends that exact prompt to the AI model, and voilà, they get a tailored result. It’s like a dropdown of “canned” prompts supplied by the server for specific tasks.

Finally, an interesting twist: MCP allows the server to call back into the client—an ability called sampling. For example, instead of using a cloud-hosted LLM for a small text transformation, the server might say, “Hey client, could you run this micro-prompt locally and send me the output?” This keeps certain data on the user’s machine and avoids extra network hops.
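To see how the three primitives look side by side, here is a minimal sketch using the MCP Python SDK’s FastMCP helper. The server name, sales figures, wiki contents, and prompt wording are all made up for illustration:

from mcp.server.fastmcp import FastMCP

# Hypothetical server exposing one tool, one resource, and one prompt.
mcp = FastMCP("demo-server")

@mcp.tool()
def get_sales_data(quarter: str) -> dict:
    """Return raw sales numbers for a quarter (model-controlled: the AI decides to call this)."""
    return {"quarter": quarter, "revenue": 125000}  # stand-in data

@mcp.resource("wiki://pages")
def company_wiki_pages() -> str:
    """Return wiki text for the host app to feed in as context (application-controlled)."""
    return "…wiki contents…"

@mcp.prompt()
def generate_unit_tests(code: str) -> str:
    """A canned prompt template the user can pick from a menu (user-controlled)."""
    return f"Write unit tests for the following code:\n\n{code}"

if __name__ == "__main__":
    mcp.run()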

Demo Time: Real-World Examples You Can Try Yourself

1. Live Route-Finding (Google Maps via Postman + Claude Desktop)

One presenter built a simple MCP server that wraps Google Maps’ “directions” API. Here’s how you could replicate it in a few minutes:

  1. Use Postman’s MCP Generator.
    Postman now includes an MCP code generator. If you search for “Postman MCP,” you’ll find a tutorial that shows how to point at any REST API (like Google Maps) and generate a starter MCP server in TypeScript or Python.
  2. Customize the Server Logic.
    The generated code will let you call every endpoint of the Google Maps API. Trim it down so that only one function, get_route, remains. That function takes an origin and destination, calls Google Maps behind the scenes, then returns a clean JSON list of step-by-step instructions. (A hand-written sketch of such a server appears after this walkthrough.)
  3. Run the MCP Server Locally.
    Start the server on, say, http://localhost:8000/mcp. It advertises its single tool, get_route, along with a brief description of “Compute directions between two locations.”
  4. Connect via Claude Desktop (or any MCP-aware client).
    In the Claude Desktop settings, there’s a “Developer” tab where you can register the local MCP server. Once you do, the AI model (running in Claude Desktop) sees there’s a “get_route” tool available.
  5. Ask the Model for Directions.
    Type: “Give me route directions between Miami and Tampa, and show them to me in Spanish.”
    • The model automatically calls get_route(origin="Miami", destination="Tampa").
    • The server sends back JSON like:
{
  "start": "Miami, FL",
  "end": "Tampa, FL",
  "steps": [
    "Head northwest on I-95 N",
    "Merge onto I-4 W toward Tampa",
    "Take exit 10 toward Downtown Tampa"

    // …more steps…
  ]
}
  • Then the model translates those steps into Spanish, e.g., “Dirígete hacia el noroeste por la I-95 N…”

Because the model is pulling live data, you’ll always get current traffic conditions or new highways—no stale knowledge baked into the model itself.
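If you’d rather sketch the server by hand than use Postman’s generator, here is roughly what a single-tool version might look like in Python. It assumes the MCP Python SDK and the Google Maps Directions web service; the response parsing is deliberately simplified, and the API key is read from an environment variable:

import os
import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("google-maps-routes")

@mcp.tool()
def get_route(origin: str, destination: str) -> dict:
    """Compute directions between two locations via the Google Maps Directions API."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/directions/json",
        params={
            "origin": origin,
            "destination": destination,
            "key": os.environ["GOOGLE_MAPS_API_KEY"],  # keep the key out of source code
        },
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()

    # Simplified parsing: first route/leg only; html_instructions still contains markup.
    leg = data["routes"][0]["legs"][0]
    return {
        "start": leg["start_address"],
        "end": leg["end_address"],
        "steps": [step["html_instructions"] for step in leg["steps"]],
    }

if __name__ == "__main__":
    mcp.run()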

2. Managing AWS Servers

Another speaker shared a nifty use case from their teaching course: managing AWS EC2 instances via MCP. In the past, students had to learn commands like aws ec2 describe-instances --instance-ids i-1234567890abcdef0. But with MCP, you can abstract all that away:

  1. Build a Small MCP Server (sketched after this walkthrough) that knows how to:
    • List running instances (list_instances)
    • Check a single instance’s status (get_instance_status(id))
    • Start or stop an instance (start_instance(id), stop_instance(id))
  2. Inject AWS Credentials Securely.
    When the host application (e.g., an AI-powered IDE) first connects, it authenticates against the MCP server. The server stores a short-lived token representing a particular user’s AWS permissions (via an IAM Role or similar).
  3. Ask the AI Model to Check Status.
    In your IDE’s chatbot, type:

    “Hey, is my staging server up and running?”
    The LLM recognizes it needs the get_instance_status tool. It calls something like:

{
  "tool": "get_instance_status",
  "parameters": { "id": "i-staging123" }
}
  4. The MCP server comes back with:
{ "status": "running", "uptime": "72 hours" }
  5. The model returns:

    “The staging server (i-staging123) is currently running and has been up for 72 hours. Would you like me to stop it?”
  6. Execute Control Commands via Chat.
    If you reply, “Yes, please shut it down,” the model calls stop_instance(id="i-staging123"). The server coordinates with AWS, and you get a confirmation message:


    “Staging server is now stopping. Please wait a moment.”

No more memorizing CLI flags or switching to a separate terminal window. Everything just works through a single chat interface.
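Here’s a minimal sketch of such a server, assuming the MCP Python SDK and boto3. Credentials are taken from the environment or an IAM role, and a start_instance tool would follow the same pattern:

import boto3
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("aws-ec2-manager")
ec2 = boto3.client("ec2")  # credentials come from the environment / IAM role

@mcp.tool()
def list_instances() -> list:
    """List the IDs of all running EC2 instances."""
    resp = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    return [
        inst["InstanceId"]
        for reservation in resp["Reservations"]
        for inst in reservation["Instances"]
    ]

@mcp.tool()
def get_instance_status(id: str) -> dict:
    """Return the current state and launch time of a single instance."""
    resp = ec2.describe_instances(InstanceIds=[id])
    inst = resp["Reservations"][0]["Instances"][0]
    return {"status": inst["State"]["Name"], "launch_time": str(inst["LaunchTime"])}

@mcp.tool()
def stop_instance(id: str) -> dict:
    """Stop an instance and report the state transition."""
    resp = ec2.stop_instances(InstanceIds=[id])
    change = resp["StoppingInstances"][0]
    return {
        "previous": change["PreviousState"]["Name"],
        "current": change["CurrentState"]["Name"],
    }

if __name__ == "__main__":
    mcp.run()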

Putting Your Private Data into the Loop: RAG + MCP + Pinecone

Many businesses have valuable content—product manuals, support tickets, training videos—scattered across different systems (Zendesk, Confluence, Box, YouTube). If your AI can’t reliably pull in that data on demand, it risks giving out-of-date or completely wrong answers. That’s where RAG (Retrieval-Augmented Generation) comes in: you index your documents in a vector database and query them at runtime.

In one of the demos, the team used Pinecone—a managed vector database service—to store embeddings of an entire company website (e.g., Vanguard Institutional’s public docs). Here’s how it worked:

  1. Crawl and Index the Website.
    A “no-code” pipeline extracts text from every page (HTML, PDFs, etc.), chunking the content into bite-sized passages.
  2. Generate Vector Embeddings.
    Each text chunk is turned into a high-dimensional vector (e.g., using OpenAI’s text-embedding-ada-002), then stored in Pinecone.
  3. Expose a RAG Endpoint as an MCP Server.
    The RAG server offers a single tool called query_knowledge_base(query: String), which:
    • Sends your query (e.g., “What’s the current return policy?”) to Pinecone
    • Retrieves the most relevant chunks
    • Returns them as structured JSON (a sketch of this tool appears at the end of this section)
  4. Call from Any MCP Client.
    In your AI assistant—whether it’s Claude or ChatGPT or a custom client—you plug in the MCP endpoint URL, like:

https://rag-server.example.com/mcp

  5. Then you ask your model:


    “What does our latest Vanguard investment guide say about risk management?”
    The model invokes query_knowledge_base, gets the exact passage (e.g., “Risk management involves diversifying across asset classes…established in mid-2023”), and presents a concise answer.

    Because the RAG index is rebuilt daily (or whenever you push new content), your AI isn’t left answering from outdated information.
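A rough sketch of what that single tool could look like, assuming the MCP Python SDK, the Pinecone Python client, and OpenAI embeddings (the index name and the "text" metadata field are placeholders):

import os
from openai import OpenAI
from pinecone import Pinecone
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("rag-knowledge-base")
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("company-docs")  # hypothetical index name

@mcp.tool()
def query_knowledge_base(query: str) -> list:
    """Embed the query, find the closest chunks in Pinecone, and return them as structured JSON."""
    embedding = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=query,
    ).data[0].embedding

    results = index.query(vector=embedding, top_k=3, include_metadata=True)
    return [
        {"score": match.score, "text": (match.metadata or {}).get("text", "")}
        for match in results.matches
    ]

if __name__ == "__main__":
    mcp.run()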

Why MCP Succeeds (Where Previous Standards Struggled)

If you’ve worked in enterprise IT or developer tooling for a while, you may recall efforts like CORBA, SOAP, or various homegrown “integration platforms” that promised to unify everything—often with limited success. MCP is showing more immediate momentum for three main reasons:

  1. Real, Working Servers on Day One
    When MCP was first announced by Anthropic in November 2024, they didn’t just publish a dry specification. They shipped multiple reference servers (GitHub, Slack, etc.) that you can connect to and try out instantly. That’s a world away from standards that lived in a PDF for a year before anyone built a working prototype.
  2. Brand Trust & Rapid Adoption
    Major players—Anthropic, OpenAI, Pinecone—have publicly backed MCP. That gives developers confidence: if these vendors support it, it’s worth investing time to learn and integrate.
  3. Built on Familiar Ideas
    MCP leverages well-known concepts (HTTP + JSON, function schemas, API keys) rather than inventing an entirely new stack. Developers already know how to handle REST APIs, token-based auth, and JSON parsing. MCP’s extra layer simply sits on top of those patterns.

Pros and Cons to Keep in Mind

Pros (What You Gain)

  • Simplified Integrations: Once your app “speaks MCP,” it can talk to any new MCP server without extra code.
  • Live, Accurate Data: Because you’re making real API calls at runtime, you avoid stale knowledge in your models.
  • Rapid Prototyping: In minutes, you can spin up a local MCP server (via Postman’s code generator), connect it to your app, and test in real time.
  • Better User Experience: Non-technical folks can ask natural language questions—“How do I refund a customer?”—and the AI automatically invokes the right service (Stripe, Zendesk, or a custom database).

Cons (What to Watch Out For)

  1. Probabilistic Tool Calling
    Unlike a hardcoded API, MCP relies on the AI model to decide when to call a tool. If your prompt isn’t clear, the model might skip a needed call or invoke the wrong function. For example, if you ask, “Show me our revenue chart,” the model could misinterpret that as a request for a text summary (no chart) unless you explicitly say, “Use the generate_chart tool.” Over time, as LLMs get better, this gap will narrow, but right now it demands careful prompt engineering.
  2. Authentication & Security
    Any MCP server that performs sensitive actions (e.g., shutting down a production server) must enforce strict credentials. You need to configure your host application to authenticate users (via OAuth tokens, API keys, or your own SSO), then pass a short-lived token to the MCP server. Right now, the MCP spec is still evolving around standardizing OAuth flows, but the simplest approach is to treat your MCP endpoint like any other internal REST API: require HTTPS, check tokens on each request, and log every call.
  3. Testing & Compatibility
    Since the model decides how to format each tool request, even small server updates (changing a parameter name from user_id to uid) can break the chain. You’ll want automated tests (one sketch follows this list) that:
    • Mock up typical user prompts (“Generate a sales report”)
    • Confirm the model calls the right MCP function (get_sales_data) with correct JSON
    • Verify the AI’s final answer matches expected results
  4. Non-Technical Readiness
    While developers can copy a JSON snippet into Claude Desktop or a code snippet into a script, truly non-technical business users may still find it a stretch. The moment when you can “paste an MCP URL into any chat widget” without any extra setup is when the broader, non-technical crowd will embrace it. Right now, we’re probably “a few inches away,” as one speaker put it, until major chat platforms build out a “one-click MCP connect” for end users.

“Pinecone and RAG”: Putting It All Together

If you haven’t heard of Pinecone before, it’s a managed vector database service designed specifically for similarity search and RAG workflows. You feed Pinecone embedding vectors (e.g., from OpenAI’s text-embedding-ada-002), and it instantly finds the top N closest matches. That’s perfect when you want your AI to answer questions based on your private documents (product manuals, support tickets, or research papers).

Here’s how the RAG + MCP + Pinecone pattern works in real life:

  1. Index Your Data
    • You point a crawler or ingestion pipeline at all your sources (Google Docs, Confluence, PDF archives, YouTube transcripts).
    • Each chunk of text becomes a vector and is pushed into Pinecone (a sketch of this ingestion step closes this section).
  2. Expose a RAG MCP Server
    • This server (running in your cloud, on premises, or as a hosted MCP service) provides one tool: query_knowledge_base(query: String).
    • Under the hood, it asks Pinecone for the most relevant vectors, retrieves those original text chunks, and returns them as structured JSON.
  3. Connect from Any MCP Client
    • In your AI assistant (whether it’s ChatGPT with an MCP plugin or an in-house chatbot), you paste the MCP endpoint URL.
    • Ask: “What does our product guide say about troubleshooting printer connectivity?”
    • The AI calls query_knowledge_base(query=”troubleshooting printer connectivity”).
    • Pinecone returns, say, the top three passages explaining “Ensure the printer is on the same Wi-Fi network…check driver version…restart both devices.”
    • Your model integrates those passages into a coherent reply:


      “According to our guide, first confirm your printer and computer share the same Wi-Fi network. Next, verify you have the latest driver (version 5.2.3). If that doesn’t help, try restarting both devices.”

Because Pinecone handles the heavy lifting of similarity search (even across millions of documents), you get rapid, relevant results. And because you rebuild your index daily (or on each content update), any new FAQs or policy changes are immediately available to the AI.
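For the indexing step (item 1 above), the ingestion side can be as simple as embedding each chunk and upserting it with the text stored as metadata. A sketch under the same assumptions as the query tool earlier (OpenAI embeddings, the Pinecone client, and a placeholder index name):

import os
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("company-docs")  # same hypothetical index the RAG server queries

def upsert_chunks(chunks: list[str]) -> None:
    """Embed each text chunk and store it in Pinecone, keeping the text as metadata."""
    for i, chunk in enumerate(chunks):
        embedding = openai_client.embeddings.create(
            model="text-embedding-ada-002",
            input=chunk,
        ).data[0].embedding
        index.upsert(
            vectors=[{"id": f"chunk-{i}", "values": embedding, "metadata": {"text": chunk}}]
        )

# Re-run this whenever the crawler produces fresh content.
upsert_chunks(["Risk management involves diversifying across asset classes.", "…more chunks…"])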

A Conversational Takeaway: Why This Matters to You

If you’re a developer, an IT leader, or anyone responsible for making AI “do real work,” MCP is worth trying today. Maybe you want to:

  • Automate support without risking hallucinations. Link your Zendesk ticket database to a RAG MCP server, ensuring agents always see accurate ticket histories.
  • Streamline DevOps. Let developers ask, “Is Build #452 passing on the main branch?” and have a CI server MCP endpoint return “No, 2 tests failed.” Then the AI can suggest fixes or link to the logs.
  • Build a knowledge-powered chatbot. Using Pinecone and a RAG server, you can ingest all your product documentation and let customers ask natural questions—“How do I reset my password if I’ve lost my phone?”—and always get the exact answer from your policy manuals.

For business users, the promise is that you won’t have to juggle multiple apps or drain IT’s time with custom integration requests. Once an MCP server is in place, you just “chat” or “ask” and the AI knows how to reach out for the data.

Wrapping Up: Where Do We Go from Here?

MCP is still young—many of us are watching how it evolves and how quickly major platforms (Slack, Stripe, GitHub) publish official MCP servers. But the early results are promising:

  • Simplified development workflows that keep AI models grounded in real data
  • Better user experiences (no more “Sorry, I don’t know that” or outdated answers)
  • Rapid prototyping where you can test new AI-powered features in hours, not weeks

Yes, there are challenges ahead (prompt design, security, compatibility testing), but these are the same kinds of hurdles we’ve faced (and overcome) whenever a new API standard emerges. If you’re already using Pinecone for RAG, or you’re a Claude Desktop or GitHub user, it’s easy to plug into MCP today and see for yourself how it shakes out.

Take a look at the attached slides for more architecture diagrams, code snippets, and live screenshots. And if you’re interested in trying MCP, start small: spin up a local server with Postman’s generator, connect it to a sandbox application, and ask your model to call your new tool. In a few minutes, you’ll see just how liberating “live, accurate context” can be.

Happy building!
