CustomGPT.ai Blog

Developer’s Toolkit: Custom GPT RAG API Test Harness for Measuring Answer Accuracy


CustomGPT.ai provides a variety of command line tools, each serving a different purpose. One such tool is an automated test script from the CustomGPT.ai cookbook that lets users evaluate chatbot performance directly from the command line interface.

This test script measures chatbot performance by calculating the accuracy and relevance of the responses, providing you with a framework for measuring answer accuracy.

In this article, we will explore the functionalities of this command line tool, demonstrating how it can enhance your interaction with CustomGPT.ai and improve the overall efficiency of managing chatbot conversations.

Let’s see how this automated test script measures chatbot performance.

How the Command Line Tool Measures Chatbot Performance

The “Harness for Asking Questions” command line tool from the CustomGPT.ai cookbook is designed to measure chatbot performance by evaluating the accuracy and relevance of the responses generated by the chatbot. This tool automates the testing process, making it easier for developers and businesses to ensure that their chatbots provide meaningful and contextually appropriate answers.

Measuring Chatbot Performance with the Automated Test Script

The automated test script from the CustomGPT.ai cookbook simplifies the process of evaluating chatbot performance. Here’s how it works:

Sending Queries

  • The script allows users to send a series of questions or prompts to the chatbot. These queries can be customized based on the specific use case or domain of the chatbot.

Receiving Real-time Responses

  • As the chatbot generates responses to the queries, the script captures these responses in real time. This ensures that the evaluation is based on the chatbot’s actual performance during live interactions.

Calculating Accuracy and Relevance

  • The core functionality of the script involves calculating the accuracy and relevance of the chatbot’s responses. It does this by comparing the words and phrases in the user’s query with those in the chatbot’s response.
  • The script calculates a relevancy score, which indicates how many words from the user’s query are present in the chatbot’s response. A higher relevancy score suggests that the chatbot’s response is more accurate and relevant to the query.

Assessing Performance Metrics

  • Based on the relevancy scores and the context of the responses, the script provides an overall assessment of the chatbot’s performance. This includes identifying areas where the chatbot excels and areas where improvements are needed.

Providing Insights

  • The script offers detailed insights into the chatbot’s performance, helping developers and businesses understand how well the chatbot is meeting user expectations. These insights can be used to fine-tune the chatbot’s training for better performance.

In summary, the “Harness for Asking Questions” command line tool from CustomGPT.ai is an essential resource for evaluating chatbot performance. By automating the testing process and providing detailed performance metrics, this tool ensures that chatbots deliver accurate, relevant, and contextually appropriate responses, enhancing the overall user experience.

CustomGPT.ai Command Line Tool “Harness.py”: Understanding the Script Functionality Programmatically

Let’s break down the functionality of the script and how it operates under the command line interface.

Import the Necessary Modules

The script starts by importing necessary modules: requests for sending HTTP requests, json for handling JSON data, SSEClient for managing Server-Sent Events (SSE) to receive real-time responses, and re for regular expression operations in the similarity calculation.
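These imports can be sketched as follows. Note that `SSEClient` lives in the third-party `sseclient-py` package; the real script imports it unconditionally, while the guard here simply flags a missing install:

```python
import re        # word extraction for the similarity calculation
import json      # build and parse JSON request payloads
import requests  # send HTTP requests to the RAG API

try:
    from sseclient import SSEClient  # Server-Sent Events client (pip install sseclient-py)
except ImportError:
    SSEClient = None  # streaming will not work until sseclient-py is installed
```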


The calculate_similarity function

The calculate_similarity function compares a user’s question and the chatbot’s answer by converting both to lowercase, extracting words using regular expressions, and finding common words between them. It returns the number of common words and the set of common words to assess response relevance.
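Based on that description, the function can be sketched like this (a reconstruction, not the verbatim code from Harness.py):

```python
import re

def calculate_similarity(question: str, answer: str):
    """Count the words a question and an answer have in common."""
    # Lowercase both strings and extract word tokens
    question_words = set(re.findall(r"\w+", question.lower()))
    answer_words = set(re.findall(r"\w+", answer.lower()))
    common_words = question_words & answer_words
    # The relevancy score is simply the number of shared words
    return len(common_words), common_words
```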


Main Function: Setting up RAG API

This part of the script defines the main function, which sets up the RAG API endpoint and prompts the user to enter their RAG API token and project ID for authentication. It then configures the headers for subsequent RAG API requests, including content type, authorization token, and event stream acceptance.
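A sketch of that setup is shown below; the endpoint URL and exact header values follow the description above, but the literal strings in Harness.py may differ:

```python
# Assumed base URL for the CustomGPT.ai RAG API
api_endpoint = "https://app.customgpt.ai/api/v1/"

def build_headers(api_token: str) -> dict:
    return {
        "Content-Type": "application/json",     # JSON request bodies
        "Authorization": f"Bearer {api_token}",  # RAG API token
        "Accept": "text/event-stream",           # accept streamed SSE responses
    }

def main():
    api_token = input("Enter your RAG API token: ")
    project_id = input("Enter your project ID: ")
    headers = build_headers(api_token)
    ...
```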


Fetch Project Information

Next, the script fetches project information by sending a GET request to the specified RAG API endpoint. It constructs the URL from the RAG API endpoint and the project ID, and the request carries authorization headers containing the user’s RAG API token. After receiving the response, it prints the project information, including any data returned by the RAG API call.
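That step can be sketched as follows; the `/projects/{project_id}` path is an assumption based on the description:

```python
import requests

def project_url(api_endpoint: str, project_id: str) -> str:
    # GET /projects/{project_id} returns the project's metadata
    return f"{api_endpoint}projects/{project_id}"

def fetch_project_info(api_endpoint: str, project_id: str, headers: dict) -> dict:
    response = requests.get(project_url(api_endpoint, project_id), headers=headers)
    print("Project info:", response.text)
    return response.json()
```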


Create Conversation within the Project

The script creates a new conversation by sending a POST request to the RAG API endpoint dedicated to conversations within the specified project. 

  • It prompts the user to enter a name for the conversation, which is then converted into a JSON payload. 
  • The request is made to the designated URL using the provided RAG API endpoint and project ID, along with the authorization headers. 
  • After receiving the response, it prints the conversation creation response, extracts the session ID from the returned data, and stores it for further interaction with the conversation.
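The steps above can be sketched like this; the conversations path and the `"data" -> "session_id"` response shape are assumptions based on the description:

```python
import json
import requests

def conversations_url(api_endpoint: str, project_id: str) -> str:
    # POST /projects/{project_id}/conversations creates a new conversation
    return f"{api_endpoint}projects/{project_id}/conversations"

def create_conversation(api_endpoint: str, project_id: str, headers: dict) -> str:
    name = input("Enter a name for the conversation: ")
    payload = json.dumps({"name": name})
    response = requests.post(conversations_url(api_endpoint, project_id),
                             headers=headers, data=payload)
    print("Conversation created:", response.text)
    # Keep the session id so later messages target this conversation
    return response.json()["data"]["session_id"]
```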

Continuous Conversation Loop: Asking a list of questions

In this part, the script initiates a continuous conversation loop where the user can input a list of questions separated by commas. If the user enters “exit,” the conversation loop ends. Otherwise, it splits the input into individual questions and iterates over them. 

  • For each question, it constructs a JSON payload containing the question prompt and a flag indicating it’s a stream request. 
  • This payload is sent as a POST request to the RAG API endpoint dedicated to sending messages within the specified project and session. 
  • The script then establishes a server-sent events (SSE) client to receive the response from the server. 
  • As the response streams in, it prints the received message (the chatbot’s response) and accumulates it into an “answer” string. The loop continues until the server signals the end of the conversation.
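The loop described above can be sketched as follows; the messages path and the `"prompt"`/`"stream"` field names are assumptions, and `SSEClient` requires the `sseclient-py` package:

```python
import json
import requests

def messages_url(api_endpoint: str, project_id: str, session_id: str) -> str:
    return f"{api_endpoint}projects/{project_id}/conversations/{session_id}/messages"

def message_payload(question: str) -> str:
    # Field names follow the description above; they are assumptions
    return json.dumps({"prompt": question, "stream": True})

def ask_questions(api_endpoint, project_id, session_id, headers):
    from sseclient import SSEClient  # pip install sseclient-py

    while True:
        user_input = input("Enter questions (comma-separated) or 'exit': ")
        if user_input.strip().lower() == "exit":
            break
        for question in user_input.split(","):
            response = requests.post(
                messages_url(api_endpoint, project_id, session_id),
                headers=headers, data=message_payload(question), stream=True)
            answer = ""
            for event in SSEClient(response).events():
                print(event.data)       # print each streamed chunk as it arrives
                answer += event.data
            # `answer` now holds the full response, ready for similarity scoring
```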

Call to the calculate_similarity function: Analyzing Chatbot Response Similarity

This portion of the script calculates the similarity score between the user’s question and the chatbot’s response. 

  • It calls the calculate_similarity function, passing the question and the answer as arguments. 
  • The calculate_similarity function computes the number of common words between the question and the answer and returns both the similarity score and the set of common words. 
  • The script then prints the chatbot’s response, the similarity score (which represents the count of common words), and the set of common words. 
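Putting that together, the scoring step can be sketched with sample strings (the printed labels are illustrative, and the example also shows the metric's word-level nature, since "offer" and "offers" do not match):

```python
import re

def calculate_similarity(question: str, answer: str):
    question_words = set(re.findall(r"\w+", question.lower()))
    answer_words = set(re.findall(r"\w+", answer.lower()))
    common_words = question_words & answer_words
    return len(common_words), common_words

question = "What plans does CustomGPT offer?"
answer = "CustomGPT offers a free trial and several paid plans."
score, common_words = calculate_similarity(question, answer)
print("Chatbot response:", answer)
print("Similarity score:", score)            # number of shared words
print("Common words:", sorted(common_words))
```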

Finally, the main() function is invoked if the script is run directly.

Testing script in the command line interface

To begin testing your chatbot’s performance using the command line interface (CLI) script, follow these steps:

Open the command line interface and download the script from the CustomGPT.ai cookbook by navigating to the script’s page and clicking the download button. Save the script to a directory of your choice on your local machine.


Once the script is downloaded, change into the directory where it is saved using the ‘cd’ command in the command line.


Follow the prompts in the command line interface to input your RAG API token and project ID. You can get both from your CustomGPT.ai account.


Next, enter a name for the conversation to start streaming real-time responses.


Now type the list of questions, separated by commas, to start the conversation with the chatbot, and press Enter.


The questions from the list are sent one by one, and each question and its response is displayed in the command line, starting with the response to the first question.


Each response is displayed as it streams in, along with its associated similarity score and the set of common words.

Next, let’s look at the response to the second question from the list and check its similarity score.


Finally, let’s check the response to the third and last question on the list.


To exit the ongoing conversation, simply type “exit” into the command line interface and press Enter. This action will terminate the conversation with the chatbot, allowing you to conclude the testing session and analyze the results obtained.


Conclusion

In conclusion, the command line interface (CLI) script provided by CustomGPT.ai offers a convenient and efficient way to test the performance of your chatbot in real time. With our intuitive CLI testing script and powerful AI capabilities, you can optimize your chatbot’s responses and provide exceptional user experiences. Don’t miss out on this opportunity to revolutionize your customer interactions and streamline your business processes. Sign up now and take your chatbot to the next level with CustomGPT.ai!
