List All Pages Belonging to an Agent Using CustomGPT.ai RAG API’s Python SDK – A How-To Guide

Written by: Priyansh Khodiyar

Learn how to fetch all pages indexed in your CustomGPT project via the CustomGPT RAG API’s Python SDK. In this guide, we show you how to retrieve a complete list of pages (whether they came from uploaded files or crawled URLs) that belong to a specific CustomGPT project. Listing all pages lets you verify that your bot has ingested all the content you provided and build features like a content index or source list for your chatbot.

As a prerequisite, check out Getting Started with CustomGPT.ai for New Developers to ensure you have your API key and environment ready to go.

Notebook Link: CustomGPT Cookbook – SDK_List_all_pages_belonging_to_a_project.ipynb

Introduction

In this tutorial, we’ll walk through listing all pages of a CustomGPT project using the Python SDK. Every CustomGPT project (chatbot) is backed by one or more knowledge sources – these can be documents you uploaded, web pages fetched via a sitemap, or other data sources. Each source gets broken down into “pages” in the CustomGPT system. For example, if you uploaded 3 PDF files and added 2 webpages, you’d have 5 pages in your project.

Being able to get a list of all these pages programmatically is useful. You might want to:

  • Audit which pages are included in your bot’s knowledge base.
  • Display the list of sources in a UI (perhaps to let users know what the bot knows or to show citations).
  • Check the status of each page (indexed successfully or not).
  • Remove or update specific pages via other API calls.

This guide will show how to use CustomGPT.Page.list() to retrieve all page entries for a given project.
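At a glance, the whole flow comes down to a single call. Here is a minimal sketch with placeholder values (each step is covered in detail below):

from customgpt_client import CustomGPT

CustomGPT.api_key = "YOUR_API_TOKEN"  # your CustomGPT API key
pages_response = CustomGPT.Page.list(project_id="YOUR_PROJECT_ID")  # the agent's project ID
print(pages_response)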

Prerequisites

  • CustomGPT API Key – Your API key for authentication, with access to the project whose pages you want to list.
  • CustomGPT Python SDK – Ensure customgpt-client is installed. We’ll use its Page listing function.
  • Project ID – The identifier of the project (agent) whose pages you want. If you don’t have one yet, we’ll create a sample project.
  • Python environment – Google Colab, Jupyter, or any Python environment to run the code.

Step-by-Step Guide

Let’s retrieve the list of all pages belonging to a project, step by step:

  1. Install and initialize the SDK. Start by installing the CustomGPT client and setting your API key:
!pip install customgpt-client

from customgpt_client import CustomGPT
CustomGPT.api_key = "YOUR_API_TOKEN"

Get API keys

To get your API key, there are two ways:

Method 1 – Via Agent

  1. Go to Agent > All Agents.
  2. Select your agent, go to Deploy, click on the API section, and create an API key.

Method 2 – Via Profile

  1. Go to your profile (top right corner of the screen).
  2. Click on My Profile.
  3. On the profile screen, click “Create API key”, give it a name, and copy the key.

Please save this secret key somewhere safe and accessible. For security reasons, you won’t be able to view it again through your CustomGPT.ai account. If you lose this secret key, you’ll need to generate a new one.

Replace YOUR_API_TOKEN with your actual key. This gives us access to the CustomGPT SDK functions.
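A common alternative to hardcoding the key is reading it from an environment variable. A minimal sketch, assuming you have exported a variable named CUSTOMGPT_API_KEY (the variable name is our choice, not something the SDK requires):

import os
from customgpt_client import CustomGPT

# CUSTOMGPT_API_KEY is an arbitrary variable name chosen for this example
CustomGPT.api_key = os.environ["CUSTOMGPT_API_KEY"]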
  2. Create or identify a project. You need a project with some pages to list. If you already have a project, note its project_id and skip to the next step. Otherwise, let’s create an example project and add content to it so we have pages to list. For demonstration, we will create a project by uploading a small text file:
from google.colab import files
from customgpt_client.types import File

# Create a new project seeded with an uploaded file
project_name = "Example ChatBot using File"
uploaded_file = files.upload()  # This will prompt to upload a file in Colab
file_content = next(iter(uploaded_file.values()))

create_project = CustomGPT.Project.create(
    project_name=project_name,
    file=File(payload=file_content, file_name='example.txt')
)
print(create_project)
In Colab, files.upload() lets us select a local file to upload (for example, a text or PDF file). We then call CustomGPT.Project.create with the file parameter to create a project and immediately add that file as a content source. The print(create_project) output will show details, including the new project_id. If not using Colab, you can read a file from disk and pass its bytes, as sketched below.
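Here is a minimal sketch of the non-Colab variant, reading the file’s bytes from disk (the local path example.txt is illustrative):

from customgpt_client import CustomGPT
from customgpt_client.types import File

# Read the file's bytes from disk instead of using Colab's files.upload()
with open("example.txt", "rb") as f:
    file_content = f.read()

create_project = CustomGPT.Project.create(
    project_name="Example ChatBot using File",
    file=File(payload=file_content, file_name="example.txt")
)
print(create_project)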
  3. Obtain the Project ID. Extract the project ID for the project we’ll list pages from:
project_id = create_project.parsed.data.id
If using an existing project, just assign its ID directly (e.g., project_id = "proj_123ABC").
  4. Fetch all pages for the project. Now use the SDK to list pages:
pages_response = CustomGPT.Page.list(project_id=project_id)
print(pages_response)
This calls the API endpoint to list pages. The print will output the raw response, which includes a list of page objects and some metadata. If your project has many pages and the API paginates results, the first call returns only the first batch (often the first 10 pages by default) along with info on the total.
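Before parsing, it’s worth checking that the request actually succeeded. A small sketch, assuming the SDK response exposes an HTTP status_code attribute alongside parsed (adjust if your SDK version differs):

# Guard against auth errors, wrong project IDs, etc. before parsing
if pages_response.status_code != 200:
    raise RuntimeError(f"Listing pages failed with HTTP {pages_response.status_code}")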
  5. Handle pagination (if needed). The CustomGPT API may paginate the list of pages if there are many. The response typically includes metadata such as the total count and a pagination structure. For a small number of pages, you’ll get them all in one go. To be safe, we can iterate if there’s pagination. Here’s a general approach:
# Parse the first response
data = pages_response.parsed.data.data          # list of pages in the first batch of results
total_pages = pages_response.parsed.data.total  # total number of result pages (or total results)

# If there are more pages of results, fetch them
for page_num in range(2, total_pages + 1):
    next_response = CustomGPT.Page.list(project_id=project_id, page=page_num)
    data.extend(next_response.parsed.data.data)
In this snippet, we assume the API returns total as the total number of result pages (a common approach). We loop from page 2 up to that total, fetching each batch and extending our data list. After this loop, data contains page objects for all pages in the project.

    Note: If the API instead returns a total count of items and a page size, adjust accordingly. For instance, if total_items and per_page are given, you’d compute the number of result pages as ceil(total_items / per_page), as sketched below.
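In that case, the fetch loop might look like the following sketch; the field names on the parsed response are illustrative here, so match them to whatever your response actually exposes:

import math

total_items = pages_response.parsed.data.total  # hypothetical: total number of page records
per_page = pages_response.parsed.data.per_page  # hypothetical: records per batch
num_result_pages = math.ceil(total_items / per_page)

for page_num in range(2, num_result_pages + 1):
    next_response = CustomGPT.Page.list(project_id=project_id, page=page_num)
    data.extend(next_response.parsed.data.data)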
  6. Inspect the list of pages. Now that we have all pages in data, let’s see what each page object contains:
for page in data:
    print(page.id, ":", page.url or page.file_name, "-", page.status)
This will print an identifier for each page along with some key info:
    • If the page came from a web source (sitemap/URL), it might have a url attribute.
    • If it’s from an uploaded file, it might have a file_name.
    • Each page likely has a status (e.g., “indexed”, “pending”, or “failed”) indicating whether it’s successfully processed.
    • There may also be other attributes like a hash, timestamps, etc.
For example, you might see output like:
page_1a2b3c : https://example.com/docs/intro.html - indexed
page_4d5e6f : CompanyProfile.pdf - indexed
This tells us the project has an indexed page from a URL and one from a PDF file, along with their identifiers. Since attribute availability can vary by source type, a defensive variant of the print loop is sketched below.
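A file-based page may have no url attribute and vice versa, so using getattr avoids AttributeError; the fallback strings here are our own:

for page in data:
    # Fall back gracefully when an attribute doesn't exist for this source type
    source = getattr(page, "url", None) or getattr(page, "file_name", "unknown source")
    status = getattr(page, "status", "unknown")
    print(page.id, ":", source, "-", status)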
  7. Use the pages list as needed. Now you have a Python list (data) of all page objects. You can use this information however you need:
    • Display the list of sources on a website or app to inform users.
    • Check if a particular URL or file has been indexed (by searching the list, as sketched after this list).
    • Feed the page IDs into other API calls, like reindexing a specific page or deleting a page (for content management).
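As an example of the second use case, a set makes the “is this source indexed?” check trivial; this sketch reuses the defensive getattr pattern from above:

# Build a set of all known sources for O(1) membership checks
indexed_sources = {
    getattr(p, "url", None) or getattr(p, "file_name", None)
    for p in data
}
print("https://example.com/docs/intro.html" in indexed_sources)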

And that’s it! You have successfully retrieved all pages of your CustomGPT project using the SDK. This confirms what content your chatbot is trained on and can be the basis for further content management tasks.

FAQs

What information does each “page” contain in the context of CustomGPT?

Each page object typically includes:

  • A unique page ID (identifier).
  • The source of the page: either a URL (url field) if the content came from crawling a webpage, or a file name (file_name field) if it came from an uploaded document.
  • A status indicating indexing state (e.g., “indexed” means it’s processed and ready, “processing” if it’s still being ingested, or “failed” if there was an issue).
  • Timestamps (when it was created/indexed).
  • Other metadata, such as file size or word count, may also be present.

This information is useful for auditing and ensuring all intended content is present.
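If you’re unsure which fields your SDK version actually returns, you can inspect a page object directly. A quick sketch (to_dict() is common in generated Python clients, but we guard for it since we haven’t confirmed it here):

# Dump every field the SDK model carries for the first page
first_page = data[0]
print(first_page.to_dict() if hasattr(first_page, "to_dict") else vars(first_page))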

Does this API call also return the content of each page?

No, the Page.list call returns metadata about the pages, not their full content. It’s like a directory of sources: it lists page IDs, names/URLs, statuses, and so on, but it won’t return the entire text of each page. To get actual content, you would use the message endpoints with citations, or a dedicated page-content endpoint if the API provides one. Usually, the actual content is stored in the vector index and isn’t dumped directly via the API, for efficiency reasons.

How many pages can a project have, and will they all be returned?

A project can have a lot of pages – it depends on your plan and use case (for example, if you uploaded a large number of documents or indexed a very large website). The API will return all pages, but it may paginate the results, so you might have to use the page parameter as we demonstrated to get beyond the first batch. The example we gave assumes you fetch subsequent result pages if total_pages (or a similar field) indicates more. Always check the API’s response metadata: it will tell you whether there are more results to fetch.
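If you’d rather not depend on how the total is reported, a defensive loop can simply keep requesting batches until one comes back empty. A sketch, assuming an empty batch signals the end of the results:

# Keep fetching result pages until a batch comes back empty
all_pages = []
page_num = 1
while True:
    resp = CustomGPT.Page.list(project_id=project_id, page=page_num)
    batch = resp.parsed.data.data
    if not batch:
        break
    all_pages.extend(batch)
    page_num += 1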

I noticed a page has status “failed”. What should I do?

A “failed” status means that particular page (e.g., a specific URL or file) could not be indexed. This could be due to an unsupported format, a crawling error (if the URL was unreachable or blocked), or size limits. If you see a failed page, consider re-adding it or checking whether the source is accessible.
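To find problem pages programmatically, you can filter the list from earlier; the reindex call in the comment is hypothetical, so check your SDK for the actual method before using it:

# Collect pages whose indexing failed so you can investigate or retry
failed_pages = [p for p in data if getattr(p, "status", None) == "failed"]
for p in failed_pages:
    print("Failed:", getattr(p, "url", None) or getattr(p, "file_name", None))
    # CustomGPT.Page.reindex(project_id=project_id, page_id=p.id)  # hypothetical call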
