
Add a sitemap to an agent and ingest large websites by pointing your CustomGPT project at a sitemap URL. This guide covers:
- Installing & authenticating the CustomGPT RAG API Python SDK.
- Pointing your project at a sitemap XML
- Monitoring bulk crawl & index progress
- Verifying in the dashboard
- Troubleshooting tips
New to CustomGPT? Sign up here and explore our API documentation.
Also check out the GitHub Cookbook example notebook.
Prerequisites
Before you begin, make sure you have:
- CustomGPT.ai Account & API Key
Create or sign in at CustomGPT and generate an API token under Settings → API Tokens.
- Python 3 Environment
Use Google Colab, Jupyter Notebook, or your local Python 3 setup.
- customgpt-client SDK installed
pip install customgpt-client
- Existing Project ID
You can add the sitemap to a new agent (project) or an existing one—just note its project_id.
- Valid Sitemap URL
A link to a sitemap.xml, for example:
https://adorosario.github.io/small-sitemap.xml
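Before attaching a sitemap, you can optionally sanity-check that it is well-formed sitemap XML. Here is a minimal sketch using only the Python standard library; the `extract_urls` helper and the inline sample are illustrative and not part of the CustomGPT SDK:

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace, as defined by the Sitemaps protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_urls(sitemap_xml: str) -> list:
    """Return the page URLs listed in a sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.iter(SITEMAP_NS + "loc")]

# In practice you would fetch the XML first (e.g. with urllib.request);
# here we parse a small inline sample instead.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page-1</loc></url>
  <url><loc>https://example.com/page-2</loc></url>
</urlset>"""
print(extract_urls(sample))
```

If the parse fails or the list comes back empty, fix the sitemap before pointing a project at it.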
With everything ready, let’s install and authenticate the SDK.
1. Install & Authenticate the SDK
First, install the Python SDK and configure it with your API key so all calls are properly authenticated.
```python
# 1. Install the CustomGPT SDK from PyPI
!pip install customgpt-client

# 2. Import the SDK’s main interface
from customgpt_client import CustomGPT

# 3. Authenticate with your API token
CustomGPT.api_key = "YOUR_API_TOKEN"  # ← Replace with your actual token
```
Explanation:
- Step 1: Installs the customgpt-client package, giving you access to high‑level methods.
- Step 2: Imports CustomGPT, the class that wraps all API endpoints in Python.
- Step 3: Sets your api_key so each SDK call includes the correct authorization header.
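For scripts you check into version control, it is safer to load the token from the environment than to hardcode it. A minimal sketch of that pattern (the `CUSTOMGPT_API_KEY` variable name and `load_api_key` helper are arbitrary choices for this example; the explicit assignment is for illustration only):

```python
import os

def load_api_key(var: str = "CUSTOMGPT_API_KEY") -> str:
    """Fetch the API token from an environment variable instead of hardcoding it."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before running this script")
    return key

# Illustration only: normally the variable is set in your shell, not in code
os.environ["CUSTOMGPT_API_KEY"] = "demo-token"
print(load_api_key())
```

You can then set `CustomGPT.api_key = load_api_key()` instead of pasting the token into the script.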
Get API keys
To get your API key, there are two ways:
Method 1 – Via Agent
- Go to Agent > All Agents.
- Select your agent, go to Deploy, open the API key section, and create an API key.
Method 2 – Via Profile section.
- Go to profile (top right corner of your screen)
- Click on My Profile
- On the profile page, click “Create API key”, give it a name, and copy the key.
Please save this secret key somewhere safe and accessible for future use with CustomGPT.ai integrations. For security reasons, you won’t be able to view it again through your CustomGPT.ai account. If you lose this secret key, you’ll need to generate a new one.
Now that your SDK is authenticated, let’s point your project at a sitemap.
2. Point Your Project at a Sitemap
In this section, we’ll define the sitemap URL and call the add_sitemap endpoint to ingest it.
```python
# 1. Specify the sitemap URL listing all pages to crawl
sitemap_url = "https://adorosario.github.io/small-sitemap.xml"

# 2. Attach the sitemap to your project for bulk indexing
sitemap_resp = CustomGPT.Project.add_sitemap(
    project_id="YOUR_PROJECT_ID",  # ← Replace with your project ID
    sitemap_path=sitemap_url       # ← Sitemap URL to ingest
)

# 3. Confirm the call succeeded
print("Add Sitemap Status:", sitemap_resp.status)  # Expect "success"
```
Explanation:
- Define sitemap_url as the location of your site’s sitemap.xml.
- Project.add_sitemap sends a POST to /projects/{id}/sitemaps, triggering a bulk crawl.
- Check sitemap_resp.status to ensure the request was accepted before monitoring indexing.
With the sitemap attached, the agent begins crawling—next, let’s monitor its progress.
3. Monitor Indexing Progress
Crawling can take time depending on the sitemap size. Here’s how to poll your project’s stats until indexing completes:
```python
import time  # For adding delays between polls

# Poll until indexed pages match total pages
while True:
    stats = CustomGPT.Project.stats(project_id="YOUR_PROJECT_ID").parsed.data
    print(f"Indexed {stats.pages_indexed}/{stats.pages_found} pages")
    if stats.pages_indexed >= stats.pages_found:
        break
    time.sleep(5)  # Wait 5 seconds before checking again

print("✅ All sitemap pages indexed!")
```
Explanation:
- Project.stats calls GET /projects/{id}/stats and returns metrics such as pages_found and pages_indexed.
- We loop and sleep to avoid rate limits, breaking once indexing is complete.
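For longer-running crawls, you may want a timeout so the script cannot loop forever if a page never indexes. A hedged sketch of the same polling logic as a reusable function (the `wait_for_indexing` helper and `_Stats` fake are illustrative; `get_stats` stands in for a call like `CustomGPT.Project.stats(...).parsed.data`):

```python
import time

def wait_for_indexing(get_stats, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until every discovered page is indexed, or raise on timeout.

    `get_stats` must return an object with `pages_found` and
    `pages_indexed`, matching the fields used in the loop above.
    """
    deadline = time.monotonic() + timeout
    while True:
        stats = get_stats()
        if stats.pages_found and stats.pages_indexed >= stats.pages_found:
            return stats
        if time.monotonic() >= deadline:
            raise TimeoutError("Sitemap indexing did not finish in time")
        sleep(poll_interval)

# Quick check with a fake stats source (no network needed):
class _Stats:
    def __init__(self, indexed, found):
        self.pages_indexed, self.pages_found = indexed, found

seq = iter([_Stats(3, 10), _Stats(10, 10)])
result = wait_for_indexing(lambda: next(seq), sleep=lambda s: None)
print(result.pages_indexed)  # → 10
```

Against the real API you would pass `lambda: CustomGPT.Project.stats(project_id="YOUR_PROJECT_ID").parsed.data` as `get_stats`.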
Now that crawling is done, let’s verify the results in your CustomGPT dashboard.
4. Verify in the CustomGPT Dashboard
It’s always good to cross‑check the SDK’s output with the web interface:
- Log in at CustomGPT.
- Navigate to Agents → Your Project.
- Review stats on the project page—ensure the indexed page count matches what you saw in code.
Seeing the same numbers in the UI and your script means everything lined up correctly!
5. Troubleshooting Common Issues
- Invalid Project ID or Token
- Double‑check that project_id and CustomGPT.api_key are correct and active.
- Empty or Partial Crawls
- If pages_found is zero or less than expected, confirm your sitemap URL is valid and publicly accessible.
- Rate Limiting
- If polling too frequently, increase the time.sleep() interval.
- Failed Pages
- The dashboard shows per‑page crawl status—fix or remove broken URLs in the sitemap.
Still stuck? The CustomGPT API docs and our GitHub Cookbook have more examples.
6. Conclusion
Congratulations! You now know how to:
- Install and authenticate the CustomGPT Python SDK.
- Attach a sitemap to an agent (project) for bulk ingestion.
- Monitor indexing progress programmatically.
- Verify results both in code and the web dashboard.
For further reading, check out our next tutorials on reindexing pages and listing all pages.
Happy coding!
Related Resources
These guides expand on adding sitemap content and managing your CustomGPT.ai agent through the API.
- CustomGPT.ai Agent Stats — Learn how to retrieve usage and performance data from your CustomGPT.ai agent with the RAG API.
- Create Agent Via Sitemap — See how to build a new CustomGPT.ai conversational agent by supplying sitemap URLs through the API.
- SDK Agent Stats Guide — Follow this SDK-focused walkthrough to fetch CustomGPT.ai agent statistics programmatically.
Frequently Asked Questions
What does adding a sitemap to an AI agent actually do?
“Check out CustomGPT.ai where you can dump all your knowledge to automate proposals, customer inquiries and the knowledge base that exists in your head so your team can execute without you.” — Stephanie Warlick, Business Consultant. In practice, adding a sitemap lets you point an agent at a public sitemap XML so it can ingest many website pages into the project’s knowledge base in one step. That helps the agent ground answers in your site content rather than relying only on general model knowledge.
Can I use a custom sitemap URL instead of uploading pages one by one?
“I just discovered CustomGPT, and I am absolutely blown away by its capabilities and affordability! This powerful platform allows you to create custom GPT-4 chatbots using your own content, transforming customer service, engagement, and operational efficiency.” — Evan Weber, Digital Marketing Expert. Yes. You can use a valid public sitemap URL, such as a sitemap.xml, to ingest a whole site instead of adding pages individually. The main requirement is that the URL points to an actual XML sitemap the crawler can read.
How hard is it to automate sitemap ingestion with the Python SDK?
“We love CustomGPT.ai. It’s a fantastic Chat GPT tool kit that has allowed us to create a ‘lab’ for testing AI models. The results? High accuracy and efficiency leave people asking, ‘How did you do it?’ We’ve tested over 30 models with hundreds of iterations using CustomGPT.ai.” — Brendan McSheffrey, Managing Partner & Founder, The Kendall Project. For a basic automation flow, the steps are short: install the customgpt-client package, set your API key, and call Project.add_sitemap with your project_id and sitemap URL. After that, monitor the crawl and indexing progress until the job completes.
What should I verify in the dashboard after the sitemap finishes indexing?
After indexing finishes, confirm in the dashboard that the sitemap crawl completed for the correct project and that the website content you expected to ingest is present. A quick spot-check of a few important pages or URLs helps verify that the right source was added before you rely on the agent in production.
Why are some sitemap URLs missing or marked failed after ingestion?
Start with two checks: make sure the sitemap URL is valid and public, and make sure the indexing job has fully finished. If pages are still missing, confirm that those URLs are actually listed in the sitemap XML you submitted. When the source sitemap is incomplete or inaccessible, the agent cannot ingest every page you expected.
Do I need to re-add the sitemap every time my website changes?
Not necessarily. If the sitemap stays at the same public URL, you can keep using that same URL for future ingestion runs. If your site changes and you want the agent to reflect updated pages, run the sitemap ingestion again; you only need a different URL if the sitemap location itself changes.
Can I use this Python sitemap workflow if my app is in PHP or another language?
Yes. The sitemap walkthrough uses the Python SDK, but the platform also supports SDKs for PHP, Node.js, .NET, Java, Go, Ruby, and Swift, along with an OpenAI-compatible REST API at /v1/chat/completions. A common setup is to run sitemap ingestion from one supported environment and call the agent from your main application stack through the API.
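As a sketch of the language-agnostic path, you can build a plain HTTP request to the OpenAI-compatible endpoint from any stack. The example below only constructs the request in Python without sending it; the `BASE_URL` value is a hypothetical placeholder, so take the real base URL from the API documentation:

```python
import json
from urllib import request

# Hypothetical base URL for illustration only; check the API docs for the real one.
BASE_URL = "https://app.customgpt.ai/api/v1"

def build_chat_request(api_key: str, message: str) -> request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    body = json.dumps({"messages": [{"role": "user", "content": message}]}).encode()
    return request.Request(
        BASE_URL + "/chat/completions",
        data=body,
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("YOUR_API_TOKEN", "What pages did we just ingest?")
print(req.get_method(), req.full_url)
```

The same request shape translates directly to PHP, Node.js, or any other HTTP client.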

Priyansh is a Developer Relations Advocate who loves technology, writes about it, and creates deeply researched content about it.