AI Vision: Share and analyze images in chat

Bring visuals into every chat with Vision Image Processing and Image Citations. Analyze, understand, and display images in text conversations.

AI Vision and Image Citations

Trusted by teams at 10,000+ organizations


Why visual understanding matters

Some of your most important information is stored in images, not text. Unlock all of your knowledge, not just the words, with AI that can identify and analyze diagrams, charts, and illustrations the way a human expert would.

Smarter results, every time

Understands complex visuals like charts, schematics, and handwritten notes

Displays relevant images directly alongside agent responses

Enhances comprehension with clear visuals and text together

Support teams

Education & training teams

Product & documentation teams


Core features of AI Vision

Visual understanding

Analyzes diagrams, charts, schematics, and photos to extract both meaning and context.

Contextual interpretation

Goes beyond text recognition to understand relationships between visual elements.

Seamless integration

Works through the existing file upload system; just switch it on with a simple toggle.

Smart display

Shows referenced images directly alongside responses for richer, more intuitive conversations.

Why organizations choose CustomGPT.ai

"AI Vision was exactly what we needed, a way to share our visual data with ease."

Awarded Top 7 emerging leader in GenAI business solutions by GAI Insights

Enterprise-grade data security


Answers you trust


Plans & pricing

AI Vision is available across CustomGPT.ai plans.

  • Standard: 50 images/month
  • Premium: 200 images/month
  • Enterprise: starting at 1,000 images/month

Frequently asked questions

Vision Image Processing is a new feature that enables CustomGPT.ai agents to understand and process images uploaded to the platform. Unlike traditional OCR, which only extracts text, this feature uses advanced AI vision capabilities to comprehend the full context of images, including diagrams, charts, schematics, and other visual content.

When you upload images to your CustomGPT.ai agent, you can enable Vision Image Processing with a simple toggle. The system then analyzes the images using OpenAI’s vision capabilities, interpreting both the visual elements and any text within them. The processed information becomes part of your agent’s knowledge base, so the agent can reference and use it when responding to user queries.

Vision Image Processing can handle virtually any type of visual content, including:

  • Technical diagrams and schematics
  • Charts and graphs
  • Photographs
  • Illustrations
  • Handwritten text
  • Screenshots
  • Product images
  • And more

Currently, the feature supports JPEG, PNG, WEBP, and non-animated GIF formats.

Standard and Premium tier users can process images up to 1024×1024 pixels. Enterprise customers can request custom size limits tailored to their specific needs.
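If you prepare images in bulk, you may want to check them before uploading. The following is a minimal Python sketch of such a check, written under stated assumptions. It is not part of CustomGPT.ai, and the names `is_supported` and `fit_within` are hypothetical. The sketch tests a filename against the formats listed above. It also scales dimensions down so they fit the Standard/Premium limit, assuming that limit applies to width and height independently. The caller must supply whether a GIF is animated, for example by reading it with an image library.

```python
from pathlib import Path

# Formats listed on this page; GIFs are supported only if non-animated.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}

# Assumption: the 1024x1024 Standard/Premium limit caps each side independently.
# Enterprise users with a custom limit could pass a different max_side.
MAX_SIDE = 1024


def is_supported(filename: str, animated: bool = False) -> bool:
    """Hypothetical helper: True if the extension is a supported format.

    `animated` must come from the caller; animated GIFs are rejected.
    """
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return False
    if ext == ".gif" and animated:
        return False
    return True


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Hypothetical helper: shrink (width, height), keeping the aspect ratio,
    so that neither side exceeds max_side. Images are never upscaled."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already within the limit
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, `fit_within(2048, 1536)` returns `(1024, 768)`. A 800×600 image is returned unchanged. Whether the platform rejects oversized images or resizes them itself is not stated here, so treat this only as an optional pre-processing step.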

Vision Image Processing is available on all subscription tiers. The number of images you can process per month depends on your subscription level.

Vision Image Processing is an entirely new capability that goes beyond traditional OCR. While OCR only extracts text from images, Vision Image Processing understands the full context of the image, including text, visual elements, relationships between objects, and the overall meaning of the visual content.

Currently, you can only delete images after they’ve been processed.

Processed images follow the same data retention policies as your other agent data on CustomGPT.ai.

Image Citations is a feature that automatically displays relevant images alongside your CustomGPT.ai agent’s responses. When your agent references information that was derived from an image in its knowledge base, that image will be displayed next to the relevant text, enhancing user understanding.

When your agent generates a response that references information from an image in its knowledge base, the system automatically identifies the relevant image and displays it alongside the text. This creates a more comprehensive and intuitive experience for users, particularly when dealing with technical or complex information.

The system automatically determines which images to display based on the information being referenced in the response.

Image Citations are designed to have minimal impact on performance: images are displayed efficiently, without significantly affecting response times.

Image Citations significantly improve comprehension by providing visual context alongside text explanations. This is especially valuable for:

  • Technical troubleshooting where seeing a diagram is essential
  • Educational content where visual aids enhance learning
  • Product documentation where images clarify features or assembly
  • Any scenario where “a picture is worth a thousand words”

Image Citations work with any agent, but they’re particularly valuable for agents trained on technical documentation, manuals, educational content, or any knowledge base where visual information enhances understanding.

Both Vision Image Processing and Image Citations are included in your existing subscription plan at no additional cost. Usage limits vary by subscription tier.

These features are particularly valuable for:

  • Technical support agents that need to understand and reference diagrams
  • Educational agents that benefit from visual aids
  • Documentation agents for products with visual components
  • Research assistants working with charts and graphs
  • Any agent where visual information enhances understanding

Vision Image Processing currently integrates with the file upload system. When uploading files, you’ll see a toggle similar to the existing OCR option. Future updates may expand integration to other data sources.

Bring visual intelligence to every conversation