CustomGPT.ai Once Again Outperforms OpenAI for Accuracy

CustomGPT.ai

CustomGPT.ai has outperformed OpenAI’s Assistant API V2 with greater accuracy, fewer hallucinations, and faster average response times in a new, more thorough evaluation.

The latest study compares CustomGPT.ai and OpenAI’s Assistant API V2 performance across 945 questions from diverse datasets. 

The study has been third-party validated by benchmark experts Tonic.ai, who performed a prior assessment a few months ago in which CustomGPT.ai outperformed OpenAI, Google, Amazon, and Cohere.

Limiting or eliminating AI hallucinations, where AI generates information not grounded in reality or in the provided context, must be a priority for organizations adopting AI technologies. It is also the premise on which CustomGPT.ai was founded. We’re thrilled to see the results of this new comparison, which underline both CustomGPT.ai’s potential for serving companies that require high-precision AI solutions and the quality of the experience our 6,000+ existing customers receive.

New Anti-Hallucination Benchmark Cements CustomGPT.ai’s Potential 

The new study, performed by Atman Academy and validated by Tonic.ai, set a new standard for excellence by using nearly 1,000 questions from nine very different datasets. In addition to an expanded question set and more diverse text, the study used a much stricter evaluation metric that required 100% accuracy for a passing score. 

The comparison also pits CustomGPT.ai against the newest version of OpenAI’s business chatbot offering, Assistant API V2, which has advanced file search capabilities.

CustomGPT.ai’s performance was remarkable, demonstrating:

13% Higher Accuracy Rate: Compared to Assistant API V2, CustomGPT.ai chatbots and AI assistants deliver fewer inaccurate responses and more reliable information.

10% Lower Hallucination Rate: CustomGPT.ai’s advanced algorithms more effectively filter out irrelevant information, reducing the likelihood of hallucinations, where AI delivers false or ungrounded responses. 

34% Faster Average Response Time: These more accurate responses are delivered much faster, demonstrating improved efficiency without sacrificing the quality of AI’s answers. 

Adoption of an Anti-Hallucination First Focus is Vital

The deployment of AI solutions comes with a great responsibility to ensure the quality and accuracy of the responses delivered. CustomGPT.ai CEO Alden Do Rosario recommends:

“To reduce risk, entities should adequately vet foundational AI technology and use solutions that are proven.”

Do Rosario believes this latest study’s findings will especially resonate in industries where accuracy is paramount, such as the legal sector, finance, healthcare, and education. He says:

“In today’s AI race, companies must adopt an ‘anti-hallucination first’ focus.”

AI skeptics rightly challenge AI’s reliability, precision, and performance. For organizations unable to mitigate them, AI hallucinations can lead to misinformed decision-making, compliance issues, safety risks, erosion of trust in AI, severe reputational damage, and even legal exposure.

“Gone are the days of organizations needing to settle for chatbots that generate inaccurate responses, especially from short-sighted, underperforming, or overpriced AI vendors,” adds Do Rosario.

“The future is wide open for gen AI to responsibly deliver comprehensive and contextually accurate information in order to truly help organizations advance decision-making capabilities, improve operational efficiency, and increase revenues.”

Robust Evaluation of Retrieval-Augmented Generation (RAG) in Mitigating AI Errors

The study assesses the performance of Retrieval-Augmented Generation (RAG) technology, which is used by both CustomGPT.ai and OpenAI. RAG substantially enhances the capabilities of generative AI and large language models (LLMs). LLMs are a foundation for natural language processing and enable text generation and question answering, but they rely on large, often outdated data sets and can deliver inaccurate or inconsistent responses.

RAG combines an LLM with external knowledge sources. It first retrieves relevant information from content provided explicitly by a company or organization, then uses the LLM to enrich the response, producing accurate, contextually relevant answers grounded in real-world knowledge.
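The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a toy illustration only, not CustomGPT.ai’s or OpenAI’s implementation: the word-overlap retriever, the function names, and the sample documents are all assumptions made for the example, and the final call to an LLM is left out.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [word.strip(".,?!").lower() for word in text.split()]


def retrieve(question, documents, top_k=1):
    """Toy retriever (assumption): rank documents by shared word count."""
    question_words = Counter(tokenize(question))
    scored = []
    for doc in documents:
        overlap = sum((question_words & Counter(tokenize(doc))).values())
        scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the question.
    return [doc for overlap, doc in scored[:top_k] if overlap > 0]


def build_prompt(question, passages):
    """Ground the model by placing retrieved passages ahead of the question."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical company content, invented for this example.
documents = [
    "Our support desk is open Monday to Friday, 9am to 5pm.",
    "Refunds are processed within 14 days of a return.",
]
question = "When is the support desk open?"
passages = retrieve(question, documents)
prompt = build_prompt(question, passages)  # this string would be sent to an LLM
```

Production systems typically replace the word-overlap ranking with embedding-based search, but the shape is the same: retrieve first, then ask the model to answer from what was retrieved.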

Study Methodology

The objective of this latest study was to benchmark CustomGPT.ai against OpenAI’s Assistants, specifically assessing performance in the reduction of hallucinations. The goal was to position CustomGPT.ai as a superior AI solution for industries where precision is critical, such as legal, medical, and financial services.

The assessment used a systematic methodology to ensure accuracy and reliable results. GPT-4o was the assessor. In contrast to the previous study, which used just 55 questions, the new study used 945 questions on various topics. Tests were conducted in a controlled environment using the same hardware and software configurations. 

The full technical analysis of the CustomGPT.ai anti-hallucination answer consistency benchmark explains the complete methodology and approach. 

The “Answer Consistency Binary” Metric 

The “Answer Consistency Binary” metric was used to evaluate CustomGPT.ai and OpenAI Assistant performance. In essence, the metric leaves no room for ambiguity in the responses. If the response delivered to a question is entirely consistent with the provided context, it passes (scored as 1). Any inconsistency in the response is an immediate fail (scored as 0). 
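The all-or-nothing scoring rule can be sketched as follows. This is a minimal illustration, not the study’s or Tonic.ai’s actual code: it assumes the assessor has already judged each statement in an answer as supported or unsupported by the context, and the function names and sample verdicts are hypothetical.

```python
def answer_consistency_binary(statement_supported):
    """Score 1 only if every statement in the answer is supported by the context.

    `statement_supported` is a list of booleans, one per statement, as judged
    by an assessor (hypothetical input format for this sketch).
    """
    return int(all(statement_supported))


def pass_rate(scores):
    """Share of answers that scored 1."""
    return sum(scores) / len(scores)


# Hypothetical verdicts for three answers.
verdicts_per_answer = [
    [True, True, True],  # fully consistent: passes
    [True, False],       # one unsupported statement: fails
    [True],              # fully consistent: passes
]
scores = [answer_consistency_binary(v) for v in verdicts_per_answer]
print(scores)             # [1, 0, 1]
print(pass_rate(scores))  # two of three answers pass
```

Because a single unsupported statement drops the score to 0, there is no partial credit for a mostly correct answer.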

Tonic.ai, developers of Tonic Validate and industry leaders in RAG metrics and independent validation methodologies, validated the study approach, which closely follows Tonic.ai’s RAG evaluation approach to ensure a robust and replicable process. 

The study used the Answer Consistency Binary metric as defined by Tonic.ai. The metric matters because it delivers an unambiguous measurement and eliminates grey areas in evaluations. It is a highly effective approach for identifying hallucinations, and it provides a level playing field for comparing CustomGPT.ai against OpenAI Assistant API V2.

Analyzing Responses

The following is an example question from the benchmark testing, with the responses from CustomGPT.ai and OpenAI:

Question: “What is the significance of the Turing Test in the field of artificial intelligence?”

CustomGPT.ai Answer (Score: 1): “The Turing Test, proposed by Alan Turing in 1950, is a significant benchmark in artificial intelligence to determine whether a machine can exhibit intelligent behavior indistinguishable from that of a human. It has influenced the development and evaluation of AI systems, highlighting the importance of natural language processing and human-computer interaction.”

OpenAI Answer (Score: 0): “The Turing Test is used to see if a machine can act like a human. It’s important in AI.”

And the study’s analysis of this response:

Analysis: CustomGPT.ai offers a detailed explanation of the Turing Test’s significance, including its historical context and impact on AI development. OpenAI’s answer, while not incorrect, lacks depth and fails to capture the test’s full importance in the field.

Study Findings and Results

The quantitative findings from all 945 questions were as follows:

Inconsistent Responses (Binary Score 0):

OpenAI: 513 instances

CustomGPT.ai: 457 instances

Study interpretation: OpenAI had a higher number of inconsistent responses, indicating more frequent hallucinations.

Consistent Responses (Binary Score 1):

OpenAI: 432 instances

CustomGPT.ai: 488 instances

Study interpretation: CustomGPT.ai had a higher number of consistent responses, showcasing its superior ability to maintain context accuracy.

The study drew the following insights:

  • Accuracy and Consistency: CustomGPT.ai achieved a 13% higher Accuracy Rate compared to OpenAI, providing consistent answers with no extraneous information the majority of the time.
  • Response Time: CustomGPT.ai demonstrated a 34% faster average response time, indicating improved efficiency without sacrificing accuracy.
  • Hallucination Reduction: The 10% lower hallucination rate suggests that CustomGPT.ai’s advanced algorithms more effectively filter out irrelevant information, reducing the likelihood of generating unfounded content.
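The headline percentages can be checked against the raw counts reported above. The sketch below assumes the 13% and 10% figures are relative differences between the two systems’ counts; the source does not state the formula, so treat that reading as an assumption.

```python
TOTAL_QUESTIONS = 945

# Counts as reported in the study findings.
consistent = {"OpenAI": 432, "CustomGPT.ai": 488}
inconsistent = {"OpenAI": 513, "CustomGPT.ai": 457}

# Sanity check: every question is scored either 0 or 1.
for system in consistent:
    assert consistent[system] + inconsistent[system] == TOTAL_QUESTIONS

# Absolute accuracy rate for each system.
accuracy = {system: count / TOTAL_QUESTIONS for system, count in consistent.items()}

# Relative differences (assumed interpretation of the headline figures).
relative_accuracy_gain = consistent["CustomGPT.ai"] / consistent["OpenAI"] - 1
relative_hallucination_drop = 1 - inconsistent["CustomGPT.ai"] / inconsistent["OpenAI"]

for system, rate in accuracy.items():
    print(f"{system} accuracy: {rate:.1%}")
print(f"Relative accuracy gain: {relative_accuracy_gain:.1%}")
print(f"Relative hallucination reduction: {relative_hallucination_drop:.1%}")
```

Under this reading, the absolute accuracy rates are about 45.7% for OpenAI and 51.6% for CustomGPT.ai, the relative accuracy gain is about 13.0%, and the relative reduction in inconsistent responses is about 10.9%, in line with the reported 13% and 10% figures.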

The study also drew a number of technical insights, including that the lower hallucination rate implies CustomGPT.ai may have “enhanced capabilities in distinguishing between relevant and irrelevant information, possibly through advanced semantic understanding or improved knowledge base integration.” Also, the performance gap maintained in the large sample size of 945 questions “suggests that CustomGPT.ai’s improvements are likely to hold at scale.”

High-Precision and Contextual Integrity

The results of this comprehensive benchmark study position CustomGPT.ai as an AI solution for chatbots, AI agents, and AI assistants in industries where precision is vital. 

“The consistent performance across various questions and contexts demonstrates its robustness and adaptability, which is crucial for deploying AI in dynamic environments where context can vary significantly.”

The Atman Academy’s research team is available for a detailed breakdown of its methodology. 

CustomGPT.ai provides a business-grade, privacy-first, zero-code generative AI platform. SaaS technology makes it quick, easy, and affordable for anyone—regardless of technical expertise—to provide their own content and data to build custom AI chatbots and other GPT agents and confidently deploy these solutions. We leverage advanced large language models (including OpenAI’s GPT-4) to offer industry-leading accuracy and anti-hallucination protection.
