Some of your most important information is stored in images, not text. Unlock all your knowledge, not just words with AI that can identify and analyze diagrams, charts, and illustrations, just like a human expert.
Understands complex visuals like charts, schematics, and handwritten notes
Displays relevant images directly alongside agent responses
Enhances comprehension with clear visuals and text together
Analyzes diagrams, charts, schematics, and photos to extract both meaning and context.
Goes beyond text recognition to understand relationships between visual elements.
Works automatically through the existing file upload system with a simple toggle.
Shows referenced images directly alongside responses for richer, more intuitive conversations.
"AI Vision was exactly what we needed, a way to share our visual data with ease."
AI Vision is available across CustomGPT.ai plans.
Standard:
50 images/month
Premium:
200 images/month
Enterprise:
Starting at 1,000 images/month
Vision Image Processing is a new feature that enables CustomGPT.ai agents to understand and process images uploaded to the platform. Unlike traditional OCR which only extracts text, this feature uses advanced AI vision capabilities to comprehend the full context of images, including diagrams, charts, schematics, and other visual content.
When you upload images to your CustomGPT.ai agent, you can enable Vision Image Processing with a simple toggle. The system will then analyze the images using OpenAI’s vision capabilities, understanding both the visual elements and any text within them. This processed information becomes part of your agent’s knowledge base, allowing it to reference and utilize this visual information when responding to user queries.
Vision Image Processing can handle virtually any type of visual content, including:
Currently, the feature supports JPEG, PNG, WEBP, and non-animated GIF formats.
Yes. Standard and Premium tier users can process images up to 1024×1024 pixels. Enterprise customers can request custom size limits that can be adjusted based on their specific needs.
Yes, the feature is available on all subscription tiers. However, the number of images you can process per month will depend on your subscription level.
Vision Image Processing is an entirely new capability that goes beyond traditional OCR. While OCR only extracts text from images, Vision Image Processing understands the full context of the image, including text, visual elements, relationships between objects, and the overall meaning of the visual content.
Currently, you can only delete images after they’ve been processed.
Processed images follow the same data retention policies as your other agent data on CustomGPT.ai.
Image Citations is a feature that automatically displays relevant images alongside your CustomGPT.ai agent’s responses. When your agent references information that was derived from an image in its knowledge base, that image will be displayed next to the relevant text, enhancing user understanding.
When your agent generates a response that references information from an image in its knowledge base, the system automatically identifies the relevant image and displays it alongside the text. This creates a more comprehensive and intuitive experience for users, particularly when dealing with technical or complex information.
The system automatically determines which images to display based on the information being referenced in the response.
Image Citations are designed to work seamlessly with minimal impact on performance. The system is optimized to display images efficiently without significantly affecting response times.
Image Citations significantly improve comprehension by providing visual context alongside text explanations. This is especially valuable for:
Yes, but they’re particularly valuable for agents trained on technical documentation, manuals, educational content, or any knowledge base where visual information enhances understanding.
No, both Vision Image Processing and Image Citations are included in your existing subscription plan at no additional cost. Usage limits will vary based on your subscription tier.
These features are particularly valuable for:
Initially, Vision Image Processing will integrate with the file upload system. You’ll see a new toggle option similar to the current OCR option when uploading files. Future updates may expand integration to other data sources.