CustomGPT.ai Blog

The Model Context Protocol (MCP) Architecture 2025

May 17, 2025

Welcome to the MCP Architecture blog. If you are new to MCP and would like to first understand what exactly it is, refer to our What is MCP blog.

Why This Guide?

Six months after Anthropic open-sourced MCP in November 2024, the “USB-C port for AI and APIs” has exploded across IDEs, SaaS apps, and cloud platforms.

Microsoft just folded MCP clients into .NET and Azure AI Foundry, and tool vendors from Postgres to Upstash are shipping plug-and-play MCP servers. For a catalog, refer to Top 124 MCP Servers & Clients You Can Use Right Now (2025 Guide).

Yet many developers still see only the simple client–server sketch. This post layers on the nuts-and-bolts details (full component breakdowns, sequence diagrams, security notes, and real-world deployment patterns) so you can design production-grade MCP systems.

MCP is an open standard that lets any AI host discover and invoke external tools through self-describing “action schemas”. Think of it as gRPC + OpenAPI, but tailored for LLM tool-calling.

Each tool lives behind an MCP server that normalizes requests and responses, freeing the model from vendor-specific APIs.

This decoupling is what lets a Claude desktop app, VS Code, or a no-code bot platform all talk to the same GitHub MCP server without extra glue code.

Core Architecture [high level]

[Figure: Model Context Protocol flowchart. AI Host/App → MCP Client → MCP Server → External Service, exchanging JSON messages. MCP defines a four-layer request path, with the MCP Server brokering tool calls and responses.]

Everything else in this post is just turning those four boxes into an ops-ready system.

Component Deep-Dive

Layer	Responsibility	Key Facts & Gotchas
AI Host	UI/UX + local LLM or remote API. Sends tool-calls as model-generated JSON.	Can maintain multiple simultaneous server connections; context window only carries intents & handles, not raw credentials.
MCP Client	Runtime library that: (a) discovers server schema; (b) validates/serialises calls; (c) handles retries & streaming.	Popular SDKs in TypeScript, Python, Rust, and .NET. CLI shim available for shell scripts.
MCP Server	Thin adapter exposing one or more actions. Translates JSON to native API calls and back.	Usually fewer than 200 lines of code when wrapping a REST API. Servers can advertise permissions per action (scopes).
External Service	Anything: GitHub, Postgres, Redis, local FS, HTTP scraping, robotics controller…	MCP keeps credentials here, not in the LLM. Servers often embed secret-manager clients or use mTLS when run on-prem.

Protocol Mechanics

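Phase	JSON-RPC 2.0 message (transport-agnostic)	Notes
Discovery	{"jsonrpc":"2.0","id":1,"method":"tools/list"} → {"jsonrpc":"2.0","id":1,"result":{"tools":[…]}}	Sent after the initialize handshake. Each tool advertises a name, a description, and a JSON Schema inputSchema.
Invocation	{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"list_pull_requests","arguments":{"author":"alice"}}}	The client-chosen id ties the response back to this call.
Result / Error	{"jsonrpc":"2.0","id":2,"result":{"content":[…],"isError":false}} or {"jsonrpc":"2.0","id":2,"error":{"code":-32602,"message":"…"}}	Tool failures come back as a result with isError set to true so the model can see them; protocol failures (unknown tool, invalid params) use the JSON-RPC error object.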
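
On the wire, MCP messages are JSON-RPC 2.0 envelopes, and discovery and invocation use the tools/list and tools/call methods. The sketch below builds those envelopes by hand and pairs each response with its request by id. It is a minimal illustration, not an SDK: the helper names (make_request, call_tool_request, match_response) are hypothetical, and a real client would also perform the initialize handshake before discovery.

```python
from __future__ import annotations

import itertools
import json

# JSON-RPC 2.0 lets the client choose any unique id; a counter is the simplest choice.
_ids = itertools.count(1)


def make_request(method: str, params: dict | None = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools_request() -> dict:
    """Discovery: ask a server which tools it offers."""
    return make_request("tools/list")


def call_tool_request(name: str, arguments: dict) -> dict:
    """Invocation: call one tool by name with its arguments."""
    return make_request("tools/call", {"name": name, "arguments": arguments})


def match_response(pending: dict[int, dict], response: dict) -> dict:
    """Pair a response with its request by id and unwrap the result."""
    request = pending.pop(response["id"])
    if "error" in response:
        raise RuntimeError(f"{request['method']} failed: {response['error']['message']}")
    return response["result"]


if __name__ == "__main__":
    in_flight: dict[int, dict] = {}
    call = call_tool_request("list_pull_requests", {"author": "alice"})
    in_flight[call["id"]] = call
    print(json.dumps(call))
```

Keeping a table of in-flight requests keyed by id is what lets a client have several calls outstanding on one connection and still route each reply correctly.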

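Progress: for long-running jobs (e.g., video transcription), the client can attach a progressToken to its request, and the server may then send notifications/progress messages carrying a progress value (and, optionally, a total) so the host can update its UI.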
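
A minimal sketch of the host side of progress reporting, assuming the client put "progressToken": "job-1" in the request's _meta field. The helper names and the label format are hypothetical; only the notifications/progress method and its progressToken, progress, and total fields come from the MCP specification.

```python
from __future__ import annotations


def progress_notification(token: str, progress: float, total: float | None = None) -> dict:
    """Build the message a server sends while a long job runs."""
    params: dict = {"progressToken": token, "progress": progress}
    if total is not None:
        params["total"] = total
    # Notifications carry no "id": the sender expects no response.
    return {"jsonrpc": "2.0", "method": "notifications/progress", "params": params}


def render_progress(note: dict, expected_token: str) -> str | None:
    """Turn a progress notification into a UI label, ignoring unrelated tokens."""
    if note.get("method") != "notifications/progress":
        return None
    params = note["params"]
    if params["progressToken"] != expected_token:
        return None
    total = params.get("total")
    if total:
        return f"{100 * params['progress'] / total:.0f}%"
    # Without a total, the host can only show a running count.
    return f"{params['progress']} units done"
```

Because total is optional, the host should not assume every progress value is a percentage.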

MCP Architecture

MCP follows a clear, practical client-server pattern designed specifically for building apps that integrate AI.

It consists of four components: the App, MCP Client, MCP Server, and External Service.

1. The App (AI Host)

The App is the user-facing software—like a chatbot, code editor, or a productivity app—powered by AI. It interprets your input, determines necessary actions (like fetching data or triggering tasks), and displays the results.

The app manages user interactions, decides workflow logic, and leverages AI models to improve decision-making (so this is the brain of your app).

2. MCP Client (Universal Connector)

The MCP Client acts as the universal adapter within your app. It’s a software component or library (often available in popular languages like Python, JavaScript, or Rust) that standardizes communication between the app and any external tools or services.

The client handles discovering available actions from MCP Servers, securely sending requests, handling retries, streaming results, and abstracting away complex protocol-specific details (think of it as your smart plug adapter).
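
Validating model-generated arguments before sending them is one of the client duties listed above. Production clients would hand the tool's inputSchema to a full JSON Schema validator (such as the Python jsonschema package); the hand-rolled check below covers only required keys, extra keys, and primitive types, and the PR_SCHEMA it uses is a hypothetical example.

```python
from __future__ import annotations

# A small subset of JSON Schema type names mapped to Python types.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}

# Hypothetical inputSchema a GitHub server might advertise for list_pull_requests.
PR_SCHEMA = {
    "type": "object",
    "properties": {"author": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["author"],
    "additionalProperties": False,
}


def validate_arguments(input_schema: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call passed these checks."""
    problems: list[str] = []
    props = input_schema.get("properties", {})
    for key in input_schema.get("required", []):
        if key not in arguments:
            problems.append(f"missing required argument '{key}'")
    for key, value in arguments.items():
        spec = props.get(key)
        if spec is None:
            # JSON Schema allows extra keys unless additionalProperties is false.
            if input_schema.get("additionalProperties") is False:
                problems.append(f"unexpected argument '{key}'")
            continue
        type_name = spec.get("type")
        if not isinstance(type_name, str) or type_name not in _JSON_TYPES:
            continue  # union types and other keywords are out of scope here
        # bool is a subclass of int in Python, so only accept it for "boolean".
        if not isinstance(value, _JSON_TYPES[type_name]) or (
            isinstance(value, bool) and type_name != "boolean"
        ):
            problems.append(f"'{key}' should be of type {type_name}")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets the host feed every issue back to the model in a single retry prompt.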

3. MCP Server (Protocol Translator)

An MCP Server is a lightweight service that exposes specific capabilities or actions of external tools via the MCP protocol. It accepts standardized MCP requests from clients and translates them into tool-specific operations, like calling APIs, executing SQL queries, or performing file operations.

MCP Servers also manage authentication, error handling, response formatting, and support streaming results. They provide structured schemas to describe available actions clearly (making it easy for apps to know what tools are available and how to use them).
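
To make the translator role concrete, here is a minimal in-process dispatcher, assuming messages have already been parsed from the transport. The list_pull_requests tool and its FAKE_PRS data are hypothetical stand-ins for a real GitHub call; official MCP SDKs generate this plumbing for you, so treat it as an illustration of the message shapes rather than a server to deploy.

```python
from __future__ import annotations

import json

# Hypothetical in-memory stand-in for the GitHub API.
FAKE_PRS = {"alice": ["#12 Fix login bug", "#15 Add dark mode"]}

TOOLS = {
    "list_pull_requests": {
        "description": "List open pull requests by author.",
        "inputSchema": {
            "type": "object",
            "properties": {"author": {"type": "string"}},
            "required": ["author"],
        },
    }
}


def list_pull_requests(author: str) -> str:
    return json.dumps(FAKE_PRS.get(author, []))


HANDLERS = {"list_pull_requests": list_pull_requests}


def _reply(msg_id, **body) -> dict:
    return {"jsonrpc": "2.0", "id": msg_id, **body}


def handle(message: dict) -> dict | None:
    """Dispatch one parsed JSON-RPC message and build the reply."""
    if "id" not in message:
        return None  # notifications (e.g. notifications/initialized) get no reply
    msg_id = message["id"]
    method = message.get("method")
    if method == "tools/list":
        tools = [{"name": name, **meta} for name, meta in TOOLS.items()]
        return _reply(msg_id, result={"tools": tools})
    if method == "tools/call":
        params = message.get("params", {})
        handler = HANDLERS.get(params.get("name"))
        if handler is None:
            return _reply(msg_id, error={"code": -32602, "message": f"Unknown tool: {params.get('name')}"})
        try:
            text = handler(**params.get("arguments", {}))
            return _reply(msg_id, result={"content": [{"type": "text", "text": text}], "isError": False})
        except Exception as exc:  # tool failures are reported in-band to the model
            return _reply(msg_id, result={"content": [{"type": "text", "text": str(exc)}], "isError": True})
    return _reply(msg_id, error={"code": -32601, "message": "Method not found"})
```

Note the two failure paths: an unknown tool is a protocol error, while a tool that raises is reported inside the result with isError set, so the model can read the failure and adjust.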

4. External Service/Data Source (Execution Environment)

This is where the core functionality happens. It includes resources like databases (Postgres, MongoDB), APIs (GitHub, Slack), file systems, cloud storage (AWS S3, Google Cloud Storage), or even IoT devices.

These services handle actual tasks requested through MCP Servers and can operate either locally on your machine or remotely in the cloud (the engine doing all the heavy lifting).

Refer to Top 124 MCP Servers & Clients You Can Use Right Now (2025 Guide) for links to database MCP servers.

Communication Workflow (Technical Details)

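All communication between the MCP client and the MCP server happens over the standardized MCP protocol. The specification defines stdio (stdin/stdout) for local connections and an HTTP-based transport (Streamable HTTP, which replaced the earlier HTTP+SSE transport) for remote ones; other transports, such as WebSockets, can be added as custom transports.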
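
For the local stdio transport, the specification frames each JSON-RPC message as one line of UTF-8 text with no embedded newlines. A minimal sketch of that framing follows; the function names are hypothetical, and io.StringIO stands in for a subprocess's stdin and stdout.

```python
import io
import json
from typing import Iterator, TextIO


def write_message(stream: TextIO, message: dict) -> None:
    """Write one message as a single line; json.dumps escapes newlines inside strings."""
    stream.write(json.dumps(message) + "\n")
    stream.flush()


def read_messages(stream: TextIO) -> Iterator[dict]:
    """Yield one decoded message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_message(buffer, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
    buffer.seek(0)
    print(list(read_messages(buffer)))
```

Because stdout carries the protocol, a stdio server should send its own logs to stderr; a stray print to stdout would corrupt the message stream.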

The protocol defines a common JSON-RPC 2.0 message format for tool discovery, action invocation, and returning results or errors. Here’s a typical flow:

Discovery:

When an MCP client connects to a server, it can query what capabilities or “tools” that server offers. The MCP server responds with a machine-readable list of functions (sometimes called an action schema or manifest) describing each available action, its inputs, and output format.

For instance, a GitHub MCP server might advertise actions like list_pull_requests(author) or create_issue(title, body), whereas a Calendar server might advertise find_available_slot(date) or add_event(details) – along with what parameters each expects.

This built-in self-discovery means the AI agent can learn how to use a new tool at runtime, without pre-programming each possible command.
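
Discovery in code, assuming a hypothetical send(method, params) function that performs one request/response round trip. tools/list results can be paginated with a nextCursor field, so the loop keeps asking until no cursor is returned. describe_for_model is also a hypothetical helper; many hosts instead pass the schemas directly to the model's native tool-calling API.

```python
from __future__ import annotations

from typing import Callable


def discover_tools(send: Callable[[str, dict], dict]) -> dict[str, dict]:
    """Collect every tool a server advertises, following pagination cursors."""
    registry: dict[str, dict] = {}
    cursor = None
    while True:
        params = {"cursor": cursor} if cursor else {}
        result = send("tools/list", params)
        for tool in result["tools"]:
            registry[tool["name"]] = tool
        cursor = result.get("nextCursor")
        if not cursor:
            return registry


def describe_for_model(registry: dict[str, dict]) -> str:
    """Render a compact, sorted tool list the host could place in the model's context."""
    lines = []
    for name, tool in sorted(registry.items()):
        args = ", ".join(tool.get("inputSchema", {}).get("properties", {}))
        lines.append(f"{name}({args}): {tool.get('description', '')}")
    return "\n".join(lines)
```

Nothing in this loop is specific to GitHub or any other service, which is exactly why a new server can be plugged in without changing host code.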

Invocation:

When the AI (via the MCP client) wants to use a tool, it sends a request to the appropriate MCP server using a standardized JSON structure (often analogous to a function call: specifying the action name and a payload of parameters).

For example, the AI might send a request like

{"method": "tools/call", "params": {"name": "list_pull_requests", "arguments": {"author": "alice"}}}

to the GitHub MCP server. The MCP server receives this, translates it into the real GitHub API call (for example, a pull-request search such as GET /search/issues?q=is:pr+author:alice), and then gathers the response.
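
A sketch of that translation step for the hypothetical list_pull_requests tool. GitHub's list-pull-requests endpoint does not accept an author filter, so this version builds an issue-search query (repo:, is:pr, and author: qualifiers) instead. The owner and repo arguments are assumptions added for illustration; authentication headers and the actual HTTP request are omitted.

```python
from __future__ import annotations

from urllib.parse import urlencode

GITHUB_API = "https://api.github.com"


def to_github_request(tool_name: str, arguments: dict) -> tuple[str, str]:
    """Translate an MCP tool call into an HTTP method and URL (no request is sent)."""
    if tool_name == "list_pull_requests":
        query = f"repo:{arguments['owner']}/{arguments['repo']} is:pr author:{arguments['author']}"
        return "GET", f"{GITHUB_API}/search/issues?{urlencode({'q': query})}"
    raise ValueError(f"Unknown tool: {tool_name}")


def to_mcp_result(search_response: dict) -> dict:
    """Reduce GitHub's search payload to the few fields the model needs."""
    prs = [f"#{item['number']} {item['title']}" for item in search_response.get("items", [])]
    text = "\n".join(prs) or "No pull requests found."
    return {"content": [{"type": "text", "text": text}], "isError": False}
```

Trimming the response to number and title is a deliberate choice: it keeps the model's context small, at the cost of dropping fields a later step might want.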

Result:

The MCP server sends back a structured result (e.g. the pull request data in a consistent JSON format) or an error message if something went wrong. The MCP client passes this result up to the AI model, which can then incorporate the information into its response or decide on the next step.

Because all MCP servers format responses in a consistent way that the AI expects, the AI doesn’t have to deal with dozens of data formats or error codes from different APIs – everything is normalized.
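
A minimal sketch of that normalization from the host's point of view, assuming the MCP result shape (a content list plus an optional isError flag) and the JSON-RPC error object. normalize_response is a hypothetical helper; summarizing non-text blocks is one possible policy, not a requirement.

```python
from __future__ import annotations


def normalize_response(response: dict) -> tuple[bool, str]:
    """Collapse any tools/call response into (ok, text) for the model."""
    if "error" in response:  # protocol-level failure: unknown tool, bad params, ...
        err = response["error"]
        return False, f"error {err['code']}: {err['message']}"
    result = response["result"]
    parts = []
    for block in result.get("content", []):
        if block.get("type") == "text":
            parts.append(block["text"])
        else:
            # Non-text blocks (e.g. images) are summarized rather than inlined here.
            parts.append(f"[{block.get('type')} content omitted]")
    return not result.get("isError", False), "\n".join(parts)
```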

Chaining:

The AI can chain multiple tool calls in a single session. For instance, it could query a database via one MCP server, then send an email via another, then log the result to a file – all as part of one multi-step plan.

The MCP architecture supports multiple simultaneous server connections, so an AI can maintain context across various tools seamlessly. Each MCP server is independent, but since the AI client orchestrates calls to all of them, it’s like the AI has a suite of tools at its fingertips.
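
A minimal sketch of host-side orchestration across several servers. Each server is modeled as a plain callable, and the three-step plan (query, email, log) is hard-coded where a model would normally choose the next step; ToolRouter, run_plan, and the tool names are hypothetical.

```python
from __future__ import annotations

from typing import Callable

# A "server" here is any callable taking (tool_name, arguments) and returning text.
Server = Callable[[str, dict], str]


class ToolRouter:
    """Route tool calls to whichever connected server advertised the tool."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}
        self._servers: dict[str, Server] = {}

    def connect(self, server_name: str, server: Server, tool_names: list[str]) -> None:
        clashes = [t for t in tool_names if t in self._owner]
        if clashes:
            raise ValueError(f"tools already registered: {clashes}")
        self._servers[server_name] = server
        for tool in tool_names:
            self._owner[tool] = server_name

    def call(self, tool: str, arguments: dict) -> str:
        return self._servers[self._owner[tool]](tool, arguments)


def run_plan(router: ToolRouter, steps: list[tuple[str, Callable[[str], dict]]]) -> list[str]:
    """Execute steps in order; each step builds its arguments from the previous output."""
    outputs: list[str] = []
    previous = ""
    for tool, make_args in steps:
        previous = router.call(tool, make_args(previous))
        outputs.append(previous)
    return outputs
```

Rejecting duplicate tool names keeps routing unambiguous; an alternative is to prefix each tool name with the name of the server that provides it.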

One of the most powerful aspects of MCP is that an AI agent can connect to a brand new tool it’s never seen before and still understand how to use it, thanks to that shared protocol and discovery mechanism.

As soon as you spin up a new MCP server and register it with the AI client, the AI can query its capabilities and start invoking them – without any code changes in the AI itself. This is a radical departure from traditional integrations where a developer had to hard-code how the AI interacts with each new service.

Deployment Topologies

Pattern	When to Use	Shape
Local-only	Personal automation, embedded IDE plugins.	Host + Client + Server all on the same laptop; transports use stdio.
Edge Gateway	SaaS platforms that need tight network egress control.	Expose a single “gateway” MCP server that forwards to internal micro-servers; apply ACLs centrally.
Mesh	Enterprise with many data planes & models.	Multiple hosts (chatbots, voice bots) share a fleet of servers registered in a service registry (Consul, etcd). Load balancing via Envoy sidecars.
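
The Edge Gateway pattern can be sketched as a single entry point that checks a central ACL and then forwards the call to an internal server. The "prefix.tool" naming scheme, the Gateway class, and the choice of error code for denied calls are assumptions made for this example, not part of MCP.

```python
from __future__ import annotations

from typing import Callable

# An internal server: takes (tool_name, arguments) and returns an MCP result dict.
Backend = Callable[[str, dict], dict]


class Gateway:
    """Single public MCP endpoint that forwards tools/call to internal servers."""

    def __init__(self, backends: dict[str, Backend], acl: dict[str, set[str]]) -> None:
        self.backends = backends  # prefix -> internal server, e.g. "crm" for "crm.lookup"
        self.acl = acl            # caller id -> set of allowed tool names

    def _error(self, msg_id, text: str) -> dict:
        return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": -32602, "message": text}}

    def handle_call(self, caller: str, message: dict) -> dict:
        name = message["params"]["name"]
        # The ACL is checked once, centrally, before any internal server is contacted.
        if name not in self.acl.get(caller, set()):
            return self._error(message["id"], f"Tool not permitted: {name}")
        prefix, _, tool = name.partition(".")
        backend = self.backends.get(prefix)
        if backend is None or not tool:
            return self._error(message["id"], f"Unknown tool: {name}")
        result = backend(tool, message["params"].get("arguments", {}))
        return {"jsonrpc": "2.0", "id": message["id"], "result": result}
```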

Real-World MCP in 2025

Cursor IDE: Queries Postgres via an MCP server, executes code snippets in a sandbox, and pushes commits to GitHub, all without leaving the editor. (Source: MCP)
Azure AI Foundry: Generates agent chains that automatically pull CRM data (Dynamics MCP server) and send Teams messages (Graph MCP server). (Sources: Microsoft for Developers, Microsoft Tech Community)
VS Code .NET Extension: Uses a built-in MCP client so Copilot can call dotnet.compile and unit_test.run actions exposed by the local SDK. (Source: Microsoft Learn)
Replit Ghostwriter: Deploys a BrowserTools MCP server for live DOM inspection while coding web apps. (Source: The Verge)

Security, AuthZ & Governance

Concern	Mitigation
Over-privileged actions	Scope each action with fine-grained OAuth tokens; servers should expose a read-only inspection mode for LLM analysis and a separate mutation mode for state changes.
Prompt Injection	Client libraries can enforce allow-lists: reject any model-generated call whose action is not in policy.
Audit & Replay	Because every call is JSON, log the envelope and payload; SHA-256-hash payloads containing PII.
Secret Management	Use workload-identity federation (e.g., Microsoft Entra Workload ID, formerly Azure AD Workload Identity) so secrets never live in env vars.
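
The allow-list and audit rows above can be combined into one client-side guard. A minimal sketch, assuming a static policy of allowed tool names and a hypothetical set of tools whose arguments carry PII; PolicyError, guard_call, and audit_record are illustrative names.

```python
from __future__ import annotations

import hashlib
import json
import time


class PolicyError(Exception):
    """Raised when a model-generated call is outside the allow-list."""


def audit_record(message: dict, contains_pii: bool) -> dict:
    """Build a log entry: keep the envelope, hash the payload when it holds PII."""
    params = message.get("params", {})
    payload = json.dumps(params.get("arguments", {}), sort_keys=True)
    entry = {"ts": time.time(), "id": message.get("id"),
             "method": message.get("method"), "tool": params.get("name")}
    if contains_pii:
        entry["arguments_sha256"] = hashlib.sha256(payload.encode()).hexdigest()
    else:
        entry["arguments"] = payload
    return entry


def guard_call(message: dict, allowed_tools: set[str], pii_tools: set[str], log: list[dict]) -> dict:
    """Audit-log every attempted call, then reject anything outside the allow-list."""
    name = message.get("params", {}).get("name")
    allowed = message.get("method") == "tools/call" and name in allowed_tools
    entry = audit_record(message, contains_pii=name in pii_tools)
    entry["allowed"] = allowed
    log.append(entry)
    if not allowed:
        raise PolicyError(f"blocked: {name!r} is not in the allow-list")
    return message
```

Logging blocked attempts as well as allowed ones makes injection attempts visible in the audit trail. Note that a plain SHA-256 of low-entropy values such as email addresses can be reversed by brute force; a keyed hash (Python's hmac module with a secret key) is safer when the log must not leak PII.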

What’s Next?

v0.7 Spec (ETA Q3-2025) adds annotated JSON Schema for nested objects and a bi-directional “push channel” for server-initiated events.
Hardware Brokers will expose IoT devices (robot arms, sensors) to LLMs—early PoCs already demo end-to-end pick-and-place via MCP.
Formal Verification efforts aim to statically verify that an action’s side-effects match its declared safety metadata.

Final Thoughts

MCP’s genius is its minimalism: discovery, invocation, result.

Want to try a hosted MCP server? Check out CustomGPT’s Hosted MCP Solution for free.

Frequently Asked Questions

How does MCP tool discovery work between an AI host and an MCP server?

An MCP host discovers tools by asking each connected MCP server for its self-describing tool schemas (a tools/list request). The host then uses those schemas to call tools in a consistent way, instead of handling vendor-specific API formats one by one.

Can MCP architecture help reduce RAG failure modes like missed documents, dropped context, and missed extraction?

MCP can help indirectly, but it is not a standalone fix for retrieval quality. Its main value is standardizing how hosts connect to tools, so retrieval components can be integrated more consistently across different AI hosts. You still need strong retrieval and context design to address missed documents or extraction issues.

What should an enterprise compliance review ask for in an MCP architecture?

For enterprise use, start by asking how security is handled across the MCP components and the chosen deployment pattern. A practical review should confirm where tool calls are executed, how data moves between host and MCP server, where credentials are stored, whether model-generated calls are checked against an allow-list, and how calls are logged and documented for audit and production operations.

When should you build a custom MCP server instead of using a prebuilt integration?

Use a prebuilt MCP server when your use case matches existing tool capabilities and you want faster implementation. Build a custom MCP server when you need tailored request/response normalization or domain-specific behavior that off-the-shelf servers do not provide. MCP’s decoupled design supports both approaches.

Is MCP more secure than putting API calls directly into prompts?

MCP can be easier to secure in production because tool execution is mediated through MCP servers rather than ad hoc, vendor-specific prompt wiring. That server-layer architecture gives teams a clearer place to apply security controls and operational policies. Final security outcomes still depend on implementation quality.

What causes slow MCP workflows, and how can you reduce latency?

MCP latency depends on the full end-to-end path: host, MCP server, and external tool systems. To reduce delays, focus on architecture-level optimization—simplify tool paths, reduce unnecessary round trips between components, and test deployment patterns under realistic production traffic.

Related Resource:

Hosted MCP server for Trae: use CustomGPT.ai’s hosted MCP endpoint to connect Trae to project-specific files and knowledge.

Priyansh Khodiyar

Priyansh is a Developer Relations Advocate at CustomGPT.ai who writes deeply researched technical content on RAG APIs, AI agent development, and cloud-native tools.