Add a sitemap to an agent and ingest large websites by pointing your CustomGPT project at a sitemap URL. This guide covers:
- Installing and authenticating the CustomGPT Python SDK (customgpt-client)
- Pointing your project at a sitemap XML
- Monitoring bulk crawl & index progress
- Verifying in the dashboard
- Troubleshooting tips
New to CustomGPT? Sign up here and explore our API documentation.
Also check out the GitHub Cookbook example notebook.
Prerequisites
Before you begin, make sure you have:
- CustomGPT.ai Account & API Key
Create or sign in at CustomGPT and generate an API token under Settings → API Tokens.
- Python 3 Environment
Use Google Colab, Jupyter Notebook, or your local Python 3 setup.
- customgpt-client SDK installed
pip install customgpt-client
- Existing Project ID
You can create a new agent (project) or use an existing one; either way, note its project_id.
- Valid Sitemap URL
A link to a sitemap.xml, for example:
https://adorosario.github.io/small-sitemap.xml
With everything ready, let’s install and authenticate the SDK.
1. Install & Authenticate the SDK
First, install the Python SDK and configure it with your API key so all calls are properly authenticated.
# 1. Install the CustomGPT SDK from PyPI
!pip install customgpt-client
# 2. Import the SDK’s main interface
from customgpt_client import CustomGPT
# 3. Authenticate with your API token
CustomGPT.api_key = "YOUR_API_TOKEN" # ← Replace with your actual token
Explanation:
- Step 1: Installs the customgpt-client package, giving you access to high‑level methods.
- Step 2: Imports CustomGPT, the class that wraps all API endpoints in Python.
- Step 3: Sets your api_key so each SDK call includes the correct authorization header.
Get API keys
To get your API key, there are two ways:
Method 1 – Via Agent
- Agent > All Agents.
- Select your agent, open Deploy, then go to the API key section and create an API key.
Method 2 – Via Profile section.
- Go to profile (top right corner of your screen)
- Click on My Profile
- On the screen that opens, click “Create API key”, give it a name, and copy the key.
Save this secret key somewhere safe and accessible. For security reasons, you won’t be able to view it again through your CustomGPT.ai account. If you lose it, you’ll need to generate a new one.
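Since the key cannot be viewed again, one common way to keep it out of notebooks and version control is to read it from an environment variable. The sketch below is only an illustration: the variable name CUSTOMGPT_API_KEY and the helper load_api_token are hypothetical choices, not something the SDK requires.

```python
import os

# Hypothetical variable name; the SDK does not mandate any particular one.
TOKEN_ENV_VAR = "CUSTOMGPT_API_KEY"


def load_api_token(env=None):
    """Return the API token from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} before running this script")
    return token


# Usage with the SDK from this guide (commented out so the sketch runs without it):
# from customgpt_client import CustomGPT
# CustomGPT.api_key = load_api_token()
```

Failing early with a clear message tends to be easier to debug than an authorization error returned by a later API call.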
Now that your SDK is authenticated, we can point your project at a sitemap.
2. Point Your Project at a Sitemap
In this section, we’ll define the sitemap URL and call the add_sitemap endpoint to ingest it.
# 1. Specify the sitemap URL listing all pages to crawl
sitemap_url = "https://adorosario.github.io/small-sitemap.xml"
# 2. Attach the sitemap to your project for bulk indexing
sitemap_resp = CustomGPT.Project.add_sitemap(
    project_id="YOUR_PROJECT_ID",  # ← Replace with your project ID
    sitemap_path=sitemap_url       # ← Sitemap URL to ingest
)
# 3. Confirm the call succeeded
print("Add Sitemap Status:", sitemap_resp.status) # Expect "success"
Explanation:
- Define sitemap_url as the location of your site’s sitemap.xml.
- Project.add_sitemap sends a POST to /projects/{id}/sitemaps, triggering a bulk crawl.
- Check sitemap_resp.status to ensure the request was accepted before monitoring indexing.
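Before submitting a sitemap, it can help to know roughly how many pages it lists, so you have a number to compare against later. The following is a minimal sketch that parses sitemap XML you have already downloaded. The helper list_sitemap_urls is hypothetical and independent of the SDK, and it assumes a plain <urlset> sitemap rather than a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_text):
    """Return the page URLs listed in a <urlset> sitemap document."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        # Skip empty <loc> elements instead of returning blank entries.
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Tiny inline sample so the sketch runs without network access.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(len(list_sitemap_urls(sample)), "URLs listed")
```

If the count here is zero or the parse fails, the crawl is unlikely to find pages either, so this is a cheap check to run first.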
With the sitemap attached, the agent begins crawling—next, let’s monitor its progress.
3. Monitor Indexing Progress
Crawling can take time depending on the sitemap size. Here’s how to poll your project’s stats until indexing completes:
import time # For adding delays between polls
# Poll until indexed pages match total pages
while True:
    stats = CustomGPT.Project.stats(project_id="YOUR_PROJECT_ID").parsed.data
    print(f"Indexed {stats.pages_indexed}/{stats.pages_found} pages")
    if stats.pages_indexed >= stats.pages_found:
        break
    time.sleep(5)  # Wait 5 seconds before checking again
print("✅ All sitemap pages indexed!")
Explanation:
- Project.stats calls GET /projects/{id}/stats and returns metrics such as pages_found and pages_indexed.
- We loop and sleep to avoid rate limits, breaking once indexing is complete.
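The loop above runs until the counts match, which could mean forever if a crawl stalls. A slightly more defensive variant is sketched below. It adds a timeout and, on the assumption that pages_found may still be 0 before the crawler has read the sitemap, does not treat 0/0 as finished. The helper wait_for_indexing and its parameters are hypothetical, and it takes any callable that returns the two counts, so it does not depend on the SDK.

```python
import time


def wait_for_indexing(fetch_stats, interval=5.0, timeout=600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_stats() until every discovered page is indexed.

    fetch_stats is any callable returning (pages_indexed, pages_found).
    Returns True when indexing finished, False if the timeout elapsed first.
    """
    deadline = clock() + timeout
    while True:
        indexed, found = fetch_stats()
        # Assumption: 0 pages found means the crawl has not started yet,
        # so 0/0 is treated as "not done" rather than "complete".
        if found > 0 and indexed >= found:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Possible wiring to the SDK call used in this guide (commented out):
# def fetch_stats():
#     stats = CustomGPT.Project.stats(project_id="YOUR_PROJECT_ID").parsed.data
#     return stats.pages_indexed, stats.pages_found
#
# finished = wait_for_indexing(fetch_stats)
```

Passing sleep and clock as parameters keeps the helper easy to test without real delays.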
Now that crawling is done, let’s verify the results in your CustomGPT dashboard.
4. Verify in the CustomGPT Dashboard
It’s always good to cross‑check the SDK’s output with the web interface:
- Log in at CustomGPT.
- Navigate to Agents → Your Project.
- Review stats on the project page—ensure the indexed page count matches what you saw in code.
Seeing the same numbers in the UI and your script means everything lined up correctly!
5. Troubleshooting Common Issues
- Invalid Project ID or Token
- Double‑check that project_id and CustomGPT.api_key are correct and active.
- Empty or Partial Crawls
- If pages_found is zero or less than expected, confirm your sitemap URL is valid and publicly accessible.
- Rate Limiting
- If you hit rate limits while polling, increase the time.sleep() interval.
- Failed Pages
- The dashboard shows per‑page crawl status—fix or remove broken URLs in the sitemap.
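For the rate-limiting case, one option is to lengthen the wait after each poll instead of picking a single fixed interval. Here is a minimal sketch of a capped exponential backoff schedule. The generator backoff_delays and its default values are illustrative assumptions, not limits documented by CustomGPT.

```python
import itertools


def backoff_delays(base=5.0, factor=2.0, cap=60.0):
    """Yield an endless sequence of wait times: base, base*factor, ... up to cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First five waits with the defaults.
print(list(itertools.islice(backoff_delays(), 5)))  # [5.0, 10.0, 20.0, 40.0, 60.0]
```

In a polling loop, you could call next() on the generator to get the value to pass to time.sleep() on each iteration.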
Still stuck? The CustomGPT API docs and our GitHub Cookbook have more examples.
6. Conclusion
Congratulations! You now know how to:
- Install and authenticate the CustomGPT Python SDK.
- Attach a sitemap to an agent (project) for bulk ingestion.
- Monitor indexing progress programmatically.
- Verify results both in code and the web dashboard.
For further reading, check out our next tutorials on reindexing pages and listing all pages.
Happy coding!
Priyansh is a Developer Relations Advocate who loves technology, writes about it, and creates deeply researched content on it.