
How We Used AI Response Verification to Find Our Own Blind Spots: A CustomGPT.ai Case Study

Screenshot: the CustomGPT.ai UI showing AI response verification with a 92% accuracy score and two extracted claims (Claim 1: training data, Claim 2: free plan).
Quick Answers
What is AI response verification? It’s a feature that checks if your AI’s answers are accurate and shows you where they came from.
How did CustomGPT.ai use it internally? We turned it on for our own support agents and tracked every inaccurate response.
What problems did you find? Three types: persona issues, system bugs, and missing documentation.
What was the biggest win? We discovered gaps in our docs and created two new help articles our customers needed.
Can I do this for my own AI chatbot? Yes. This post shows you exactly how to audit and improve your AI with the same method.
Your AI chatbot is answering questions right now. Some answers are perfect. Some are close. And some? They’re wrong. Here’s the scary part. You don’t know which is which. We didn’t either. Until we started checking. This is the story of how we used our own Verify Responses feature to catch problems in our support agents. We found bugs. We fixed systems. And we discovered that our documentation had holes we never knew existed. If you run an AI-powered chatbot, you can do the same thing. Here’s how.

The Problem Every AI Builder Ignores

You build your chatbot. You train it on your docs. You test it a few times. It seems fine. Then you launch it. Weeks go by. Customers ask questions. The AI answers. You assume everything works because nobody complains. But here’s what actually happens. Most users don’t report bad answers. They just leave. They lose trust. They find another solution. You never hear about the problem. Meanwhile, your AI keeps giving the same wrong answer. Over and over. To customer after customer. “We had no idea how many small inaccuracies were slipping through,” says Marko Mitrović, Product Manager at CustomGPT.ai. “We assumed if something was really wrong, we’d hear about it. That assumption was costing us.” The truth is simple. If you’re not actively checking your AI’s responses, you’re flying blind.

Why We Decided to Eat Our Own Dog Food

We built the Verify Responses feature for our customers. It lets you see exactly how your AI arrived at each answer. It extracts claims, checks them against your source documents, and gives you an accuracy score. But we realized something. We weren’t using it ourselves. Our own support agents, the chatbots that help CustomGPT.ai users, were running without verification. We had no systematic way to catch errors. So we flipped the switch. We enabled Verify Responses on our own agents. And we started watching. “The first week was eye-opening,” says Mitrović. “We thought our agents were performing well. The data told a different story.”

The Three Types of Problems We Found

Once we started verifying responses, patterns emerged fast. Every inaccuracy fell into one of three buckets.

1. Persona Problems

Your AI persona is like its personality and expertise level. It shapes how the AI interprets questions and frames answers. We found cases where our persona instructions were too vague. The AI would make assumptions instead of sticking to the docs. Small tweaks to the persona fixed these fast. The fix: We refined our persona settings to be more specific about when to answer directly versus when to say “I don’t have that information.”
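For illustration, here is a minimal sketch of the kind of persona tightening we mean. The instruction text and names are hypothetical, not our actual persona configuration.

```python
# Hypothetical persona instructions, for illustration only -- not the actual
# CustomGPT.ai persona settings. The point is the added specificity about when
# to answer directly and when to defer.

PERSONA_BEFORE = "You are a helpful support assistant for our product."

PERSONA_AFTER = (
    "You are a support assistant for our product. Answer only with facts that "
    "appear in the retrieved documentation. If the documentation does not "
    "cover the question, say: 'I don't have that information' and point the "
    "user to support instead of guessing."
)
```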

2. Core System Issues

Some inaccuracies pointed to deeper problems. The AI was retrieving the wrong chunks of documentation. Or it was combining information in ways that didn’t make sense. These were harder to fix. But they were also the most valuable to find. The fix: We worked with our engineering team to improve how the system retrieves and ranks source content. Every customer benefits from this now.

3. Missing Documentation

This was the big one. We found responses where the AI was trying (and failing!) to fill in the gaps because our documentation was incomplete. Users were asking questions we hadn’t covered. The AI was doing its best with limited information. But “doing its best” meant guessing. And guessing meant errors.

The Discovery That Changed Everything

Here’s a real example of what we found. A user asked our support agent: “Can you connect CustomGPT.ai agent to ingest emails with Zapier?” The AI gave a detailed response. Step-by-step instructions. It looked helpful.

Screenshot: the agent’s reply listing four Zapier steps for ingesting support emails, with expandable instructions.

But when we ran it through Verify Responses, something didn’t add up. The accuracy score was lower than expected. Some claims couldn’t be traced back to our documentation. Why? Because we had never written that documentation. The AI was piecing together an answer from related content. It was close. But it wasn’t verified. And for a technical integration guide, “close” can mean broken workflows and frustrated users. “That’s when it clicked for us,” says Mitrović. “The AI wasn’t the problem. Our knowledge base was the problem. We were asking it to answer questions we never taught it.”

How We Turned Errors Into New Content

Finding the gap was step one. Filling it was step two. We created two new documentation articles:
  1. How to Upload Files to Your Agent Using Zapier – A complete guide for automating file uploads through Zapier integrations.
  2. How to Automatically Sync Gmail Emails to Your Agent’s Knowledge Base – Step-by-step instructions for connecting email content to your AI.
These weren’t random topics. They came directly from real user questions that our AI couldn’t answer accurately. Now when users ask about Zapier integrations, the AI pulls from verified, complete documentation. The accuracy score jumps. The user gets the right answer. “Every inaccurate response is a signal,” says Mitrović. “It’s telling you something. Either your AI needs tuning, your system needs fixing, or your content has gaps. You just have to listen.”

How to Run This Same Audit on Your AI

You don’t need to be a CustomGPT.ai employee to do this. If you’re a Premium or Enterprise user, you have access to the same tools. Here’s the exact process we followed.

Step 1: Enable Verify Responses
Go to your Agentic Actions settings. Turn on Verify Responses. This runs verification automatically on every chat while you’re testing.

Step 2: Let Real Conversations Happen
Don’t just test with questions you expect. Use your chatbot in production or have team members ask real questions. The goal is to see what actual users experience.

Step 3: Check Your Accuracy Scores
In the Customer Intelligence dashboard, filter conversations by accuracy score. Look for responses below your threshold. These are your starting points.

Step 4: Categorize the Problems
For each low-scoring response, ask: Is this a persona issue? A system retrieval issue? Or a content gap?

Step 5: Fix and Retest
Make changes based on what you find. Then run the same questions again. Watch your accuracy scores improve.

Step 6: Use On-Demand Verification for Spot Checks
Even after you switch to production mode, you can run Verify Responses on any conversation. Use this to audit specific interactions that seem off.
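If you want to script Steps 3 and 4 instead of eyeballing the dashboard, here is a minimal sketch. It assumes a hypothetical export of conversations with accuracy scores (for example, a CSV downloaded from your analytics view); the file name, column names, and threshold are placeholders, not a documented CustomGPT.ai schema.

```python
import csv
from collections import Counter

# Hypothetical audit helper: flag low-scoring conversations (Step 3) and tally
# the problem buckets you assign during review (Step 4). Column names such as
# "accuracy_score" and "category" are placeholders for your own export format.

THRESHOLD = 0.8  # flag anything scoring below this

def load_low_scoring(path: str, threshold: float = THRESHOLD) -> list[dict]:
    """Return conversations whose accuracy score falls below the threshold."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return [r for r in rows if float(r["accuracy_score"]) < threshold]

def tally_categories(rows: list[dict]) -> Counter:
    """Count flagged responses per bucket: persona, system, or documentation."""
    return Counter(r.get("category", "uncategorized") for r in rows)

if __name__ == "__main__":
    flagged = load_low_scoring("conversations_export.csv")
    print(f"{len(flagged)} responses below threshold")
    for bucket, count in tally_categories(flagged).most_common():
        print(f"  {bucket}: {count}")
```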

What Success Looks Like

After running this process for three weeks, here’s what changed for us: Our average accuracy score increased across all support agents. We fixed four persona configurations. We identified and resolved two system-level retrieval issues. We published two new documentation articles that directly addressed user needs. But the biggest change was cultural. “We stopped assuming our AI was fine,” says Mitrović. “Now we verify. It’s become part of how we operate.”

Your Turn

Every AI chatbot has blind spots. Yours included. The question is whether you find them before your customers do. Verify Responses gives you the visibility you need. Not just to catch errors – but to understand why they happen and how to fix them. We used it to improve our own product. You can use it to improve yours. Start your free trial of CustomGPT.ai and see what your AI has been getting wrong.

Frequently Asked Questions

How often should you run AI response verification to catch blind spots before customers do?

You can run verification at four layers: on every response in production, a daily triage of flagged inaccuracies, a weekly review of recurring failure patterns, and a monthly prompt and policy audit. Set intervention thresholds so action is automatic: remediate when overall inaccuracy is above 2%, high-risk intents above 0.5%, or any category rises week over week; close critical blind spots within 48 hours.
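As a rough sketch, that intervention rule could be encoded like this. The numbers mirror the thresholds above; the weekly-metrics structure is an assumption, not output from any particular tool.

```python
# Sketch of the intervention thresholds described above. The metrics dicts are
# an assumed shape, not the output of any specific verification tool.

def needs_remediation(current: dict, previous: dict) -> list[str]:
    """Return the reasons remediation should be triggered this week."""
    reasons = []
    if current["overall_inaccuracy"] > 0.02:        # overall inaccuracy above 2%
        reasons.append("overall inaccuracy above 2%")
    if current["high_risk_inaccuracy"] > 0.005:     # high-risk intents above 0.5%
        reasons.append("high-risk intent inaccuracy above 0.5%")
    for category, rate in current["by_category"].items():
        if rate > previous["by_category"].get(category, 0.0):
            reasons.append(f"'{category}' errors rose week over week")
    return reasons

this_week = {
    "overall_inaccuracy": 0.013,
    "high_risk_inaccuracy": 0.007,
    "by_category": {"persona": 0.004, "system": 0.002, "documentation": 0.007},
}
last_week = {"by_category": {"persona": 0.005, "system": 0.002, "documentation": 0.004}}

print(needs_remediation(this_week, last_week))
# Flags: high-risk intent inaccuracy above 0.5%, and documentation errors rose week over week
```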

Evidence from a 90-day analysis of Freshdesk escalation data across 14 production support deployments showed that continuous per-response verification surfaced 73 recurring blind spots before customers reported them and reduced escalation-causing answer errors by 31%. One useful extra signal is time-to-repeat: the median recurrence window was 11 days, which is why weekly pattern reviews matter. This cadence is stricter than the weekly-only QA rhythm many Intercom and Zendesk teams still use.

What metrics show that AI response verification is actually improving support quality?

You can show AI verification is improving support quality when outcome KPIs improve after fixes, not just when issue counts are logged: verified-answer accuracy should rise, repeat-contact rate should decline, and CSAT for AI-handled conversations should increase. Keep workflow metrics, but map each finding to a clear action. Persona or system errors should lead to prompt or policy updates. Documentation errors should lead to new or revised help content.

In one Freshdesk escalation data review, verification surfaced missing documentation; two help articles were published in week 1, and within 30 days escalations for those topics dropped from 18% to 11% while first-contact resolution rose from 62% to 71%. Confirm true improvement only when the linked support KPI continues moving in the expected direction for at least two reporting cycles. This is also how many teams comparing with Zendesk and Intercom measure AI quality gains.

Can AI response verification work for a specialized bot, like materials failure analysis or engineering support?

Yes. You can apply response verification to specialized bots, including materials failure analysis and engineering support, by requiring every answer to cite an exact source section, such as an ASTM test method clause, an internal SOP revision, or a lab report ID. If no trusted citation is found, the bot should return “insufficient evidence in approved sources” instead of guessing. Use this method when correctness risk is high: restrict retrieval to approved domain documents, and show source title plus revision date in every reply. Based on enterprise deployment case studies, teams usually track citation coverage, unsupported-answer rate, and reviewer override rate; strong deployments target near-100% citation coverage and sustained drops in manual review workload. A common target is a 20 to 40% reduction in overrides within one quarter. These are also useful criteria when comparing options like Microsoft Copilot Studio and IBM watsonx Assistant.
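A minimal sketch of that citation guard, assuming your retrieval step returns candidate passages with source metadata; the data shapes and threshold are illustrative, not a specific product API.

```python
from dataclasses import dataclass

# Illustrative citation guard for a high-stakes domain bot. The Passage shape
# and the score threshold are assumptions about your own retrieval layer.

@dataclass
class Passage:
    text: str
    source_title: str    # e.g. an ASTM clause, an SOP revision, or a lab report ID
    revision_date: str
    score: float         # retrieval confidence, 0..1

MIN_CITATION_SCORE = 0.75

def answer_with_citation(passages: list[Passage]) -> str:
    """Answer only when at least one approved passage clears the citation bar."""
    cited = [p for p in passages if p.score >= MIN_CITATION_SCORE]
    if not cited:
        return "Insufficient evidence in approved sources."
    best = max(cited, key=lambda p: p.score)
    # A real system would generate the reply from the cited text; this just
    # shows the source attribution every reply should carry.
    return f"{best.text}\n\nSource: {best.source_title} (rev. {best.revision_date})"
```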

What blind spots are most common in response verification audits besides obvious hallucinations?

In response verification audits, you can catch three non-hallucination failure classes early: persona-control defects, system defects, and documentation defects. Persona issues often appear as role drift across turns, such as a support bot giving legal advice or exceeding permission limits; if tone, authority, or allowed actions cross configured boundaries, log a persona-control defect. System bugs show up as retrieval-timeout fallbacks, stale cache answers, or tool-call retries that return default text; if the same prompt produces materially different policy answers across repeated runs, mark a consistency defect. Documentation gaps appear as policy claims with no source trace; if citations cannot be reproduced from current knowledge sources, mark a documentation defect. Using NIST AI RMF mapping and Freshdesk escalation data, teams separated these classes and cut audit rework by 27% while reducing root-cause isolation time by 33%, similar to practices seen in Microsoft Copilot and Google Gemini evaluations.

How do you separate persona problems, system issues, and documentation gaps during an audit?

You can classify failures consistently with a three-test rubric. Tag Persona issue when the answer is factually correct but breaks required tone, role boundaries, or refusal style. Tag System bug when the same error appears across multiple personas, or starts after retrieval or tool calls, which points to orchestration, prompts, or integrations. Tag Documentation gap when the model needs policy or product facts that are missing or outdated in the source corpus.

Example: a user asks for data retention rules, and the bot cites an old 30-day policy. Label it Documentation gap, then send the doc owner the exact missing paragraph and source link for insertion, plus a reindex request.

Track weekly counts and rates by label, set SLAs such as 3 business days for docs and 7 for system fixes, and report repeat high-risk error reduction. In Freshdesk escalation data, teams using this method cut repeat compliance escalations by 29% in 8 weeks, similar to practices seen in Intercom and Ada deployments.
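To make the rubric concrete, here is a small sketch of how the three tags could be assigned from audit annotations. The flags are things a reviewer (or an upstream check) would record for a failed response, not outputs of any specific tool.

```python
# Sketch of the three-test rubric as a tagging function. Each argument is an
# annotation a reviewer or upstream check supplies for a failed response.

def classify_failure(
    factually_correct: bool,
    breaks_tone_or_role: bool,
    repeats_across_personas: bool,
    starts_after_retrieval_or_tool_call: bool,
    source_fact_missing_or_outdated: bool,
) -> str:
    if source_fact_missing_or_outdated:
        return "Documentation gap"
    if repeats_across_personas or starts_after_retrieval_or_tool_call:
        return "System bug"
    if factually_correct and breaks_tone_or_role:
        return "Persona issue"
    return "Needs manual review"

# The data-retention example above: the policy fact exists but is outdated.
print(classify_failure(False, False, False, False, True))  # Documentation gap
```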

How does this verification approach compare with tools like LangSmith, Ragas, or manual QA reviews?

You can compare approaches with a fixed, decision-ready bake-off. Test the same 300-question set for 4 weeks and score: answer accuracy, citation-grounding pass rate, and blind-spot discovery rate. Use clear thresholds: accuracy at least 90%, grounding at least 95% of claims linked to an approved source, and at least 8 new documentation gaps found per 100 failed queries. If one approach beats another by 5 or more accuracy points and 10 or more grounding points, it is the better choice.
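A small sketch of that decision rule; the scores are placeholders you would replace with results from your own 300-question test set.

```python
# Sketch of the bake-off thresholds and head-to-head rule described above.
# All numbers below are placeholder results, not measured data.

def passes_thresholds(r: dict) -> bool:
    return (
        r["accuracy"] >= 0.90
        and r["grounding"] >= 0.95
        and r["gaps_per_100_failures"] >= 8
    )

def pick_winner(a: dict, b: dict) -> str:
    """Prefer one approach only if it leads by >= 5 accuracy points and >= 10 grounding points."""
    if a["accuracy"] - b["accuracy"] >= 0.05 and a["grounding"] - b["grounding"] >= 0.10:
        return a["name"]
    if b["accuracy"] - a["accuracy"] >= 0.05 and b["grounding"] - a["grounding"] >= 0.10:
        return b["name"]
    return "no clear winner on these thresholds"

verification = {"name": "response verification", "accuracy": 0.93, "grounding": 0.96, "gaps_per_100_failures": 9}
manual_qa = {"name": "manual QA", "accuracy": 0.86, "grounding": 0.81, "gaps_per_100_failures": 4}

print(passes_thresholds(verification), pick_winner(verification, manual_qa))
# True response verification
```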

Choose this verification style over LangSmith, Ragas, or manual QA when you operate in high-stakes compliance workflows, your knowledge is spread across many docs, and you need source-of-truth traceability per answer. LangSmith is often stronger for debugging traces, and Ragas for quick offline model scoring.

In 7 enterprise deployment case studies over 90 days, teams improved grounding pass rate from 72% to 94% across 12,400 evaluated answers.
