Your agents were smart.
Now they have eyes.
With Vision & Image Citations, your CustomGPT.ai agent doesn’t just talk about things – it sees them, understands them, and shows them right in the conversation.
Upload any image.
Your agent gets it instantly. No setup. No coding.
Just drag, drop, done.
What can you do with this?
- Upload product photos – Your agent will show users the exact image while describing the product.
- Add technical diagrams – Your agent can now understand wiring schematics, assembly drawings, and flowcharts.
- Upload charts and graphs – Your agent reads financial reports, sales data, and performance metrics, understands them, and cites them with the visual right there.
- Include screenshots of your software – Your agent shows users exactly which buttons to click with the screenshot appearing alongside the instructions.
- Add infographics and educational materials – Complex concepts become simple when your agent can show the visual while explaining.
The best part?
Uploaded images are automatically used as citations!
A customer asks about a product? The photo appears instantly. A user describes a problem? Your agent shows them the exact image while explaining the fix.
Every photo you upload becomes a visual answer your agent can pull up the second someone needs it.
Plus, here’s where it gets really interesting:
Enable Document Analyst (beta), and your users can upload their own images during conversations.
Imagine a furniture store agent where customers upload a photo of their living room and ask “what sofa would fit here?” – your agent analyzes their space and recommends products that match.
The Possibilities Are Endless
With AI Vision and Image Citations, CustomGPT.ai agents can now see, understand, and communicate visually, bringing conversations to life like never before.
Every image becomes part of your agent’s intelligence. Every visual becomes an answer. And every chat becomes more intuitive, transparent, and human. Ready to make your agents truly visual? Enable AI Vision and Image Citations and start chatting with your images.
FAQs
What is Vision Image Processing?
Vision Image Processing is a new feature that enables CustomGPT.ai agents to understand and process images uploaded to the platform. Unlike traditional OCR, which only extracts text, this feature uses advanced AI vision capabilities to comprehend the full context of images, including diagrams, charts, schematics, and other visual content.
How does Vision Image Processing work?
When you upload images to your CustomGPT.ai agent, you can enable Vision Image Processing with a simple toggle. The system will then analyze the images using OpenAI’s vision capabilities, understanding both the visual elements and any text within them.
This processed information becomes part of your agent’s knowledge base, allowing it to reference and utilize this visual information when responding to user queries.
What types of images can be processed?
Vision Image Processing can handle virtually any type of visual content, including:
- Technical diagrams and schematics
- Charts and graphs
- Photographs
- Illustrations
- Handwritten text
- Screenshots
- Product images
- And more
What file formats are supported?
Currently, the feature supports JPEG, PNG, WEBP, and non-animated GIF formats.
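If you want to check files before uploading them, the format list above can be sketched as a small pre-upload check. This is a minimal sketch, not part of CustomGPT.ai: the function name `sniff_image_format` is hypothetical, and it identifies a file by its leading "magic bytes" rather than its extension. It does not detect whether a GIF is animated, so that check would still need to happen separately.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess the image format from the first bytes of a file.

    Returns 'jpeg', 'png', 'gif', or 'webp', or None if the bytes
    match none of those signatures. Hypothetical helper, for illustration only.
    """
    # JPEG files start with the SOI marker followed by another marker byte.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # PNG files start with a fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # GIF files start with one of two version strings.
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container: 'RIFF', 4 size bytes, then 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

In practice you might read the first 12 bytes of each file and skip anything for which the function returns None.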
Are there any limitations on image size?
Yes. Standard and Premium tier users can process images up to 1024×1024 pixels. Enterprise customers can request custom size limits based on their specific needs.
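To see what that limit means for a larger image, here is a minimal sketch of the arithmetic for scaling an image down while keeping its aspect ratio. It assumes the limit applies to each side independently (width and height both at most 1024), which the text above does not spell out, and the function name `fit_within` is hypothetical. How the platform itself treats oversized images is not covered here.

```python
from typing import Tuple

# Assumption: the 1024x1024 limit means neither side may exceed 1024 pixels.
MAX_SIDE = 1024


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Return (width, height) scaled down to fit inside max_side x max_side.

    Images that already fit are returned unchanged. Integer arithmetic is
    used so the longer side lands exactly on max_side.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width <= max_side and height <= max_side:
        return width, height
    if width >= height:
        # Width is the longer side: pin it to the limit and scale the height.
        return max_side, max(1, height * max_side // width)
    # Height is the longer side: pin it to the limit and scale the width.
    return max(1, width * max_side // height), max_side
```

For example, a 4000×3000 photo would come out as 1024×768, while an 800×600 image would be left as it is.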
Is Vision Image Processing available on all subscription tiers?
Yes, the feature is available on all subscription tiers. However, the number of images you can process per month will depend on your subscription level.
How does this differ from the existing OCR feature?
Vision Image Processing is an entirely new capability that goes beyond traditional OCR. While OCR only extracts text from images, Vision Image Processing understands the full context of the image, including text, visual elements, relationships between objects, and the overall meaning of the visual content.
Can I edit images after they’ve been processed?
No. Currently, processed images cannot be edited; they can only be deleted.
How long are processed images stored?
Processed images follow the same data retention policies as your other agent data on CustomGPT.ai.
What are Image Citations?
Image Citations is a feature that automatically displays relevant images alongside your CustomGPT.ai agent’s responses. When your agent references information that was derived from an image in its knowledge base, that image will be displayed next to the relevant text, enhancing user understanding.
How do Image Citations work?
When your agent generates a response that references information from an image in its knowledge base, the system automatically identifies the relevant image and displays it alongside the text. This creates a more comprehensive and intuitive experience for users, particularly when dealing with technical or complex information.
Can I control which images appear as citations?
The system chooses automatically: it determines which images to display based on the information being referenced in the response.
Will Image Citations work with all types of agents?
Yes, but they’re particularly valuable for agents trained on technical documentation, manuals, educational content, or any knowledge base where visual information enhances understanding.
How do Image Citations enhance the user experience?
Image Citations significantly improve comprehension by providing visual context alongside text explanations. This is especially valuable for:
- Technical troubleshooting where seeing a diagram is essential
- Educational content where visual aids enhance learning
- Product documentation where images clarify features or assembly
- Any scenario where “a picture is worth a thousand words”
Do Image Citations affect agent performance or speed?
Image Citations are designed to work seamlessly with minimal impact on performance. The system is optimized to display images efficiently without significantly affecting response times.
Is there an additional cost for these features?
No, both Vision Image Processing and Image Citations are included in your existing subscription plan at no additional cost. Usage limits will vary based on your subscription tier.
What are some ideal use cases for these features?
These features are particularly valuable for:
- Technical support agents that need to understand and reference diagrams
- Educational agents that benefit from visual aids
- Documentation agents for products with visual components
- Research assistants working with charts and graphs
- Any agent where visual information enhances understanding
How will these features integrate with the current data management system?
Initially, Vision Image Processing will integrate with the file upload system. You’ll see a new toggle option similar to the current OCR option when uploading files. Future updates may expand integration to other data sources.