Vision Completions
Basic Image Analysis
Ask questions about images using the vision completion API:Multiple Images
You can analyze multiple images in a single request:Base64 Images
You can also use base64-encoded images:Media Analysis
Image Metadata Extraction
Extract structured metadata from images:Video Analysis
Analyze videos with automatic polling for results:Manual Status Checking
For more control over video analysis, you can check status manually:Advanced Vision Features
Conversational Vision
Build conversational interfaces around images:Vision with Custom Parameters
Fine-tune vision analysis with custom parameters:Supported Media Types
Image Formats
- JPEG/JPG - Standard photo format
- PNG - Images with transparency
- WebP - Modern web format
- GIF - Animated images (first frame analyzed)
- BMP - Bitmap images
Video Formats
- MP4 - Most common video format
- AVI - Audio Video Interleave
- MOV - QuickTime format
- WebM - Web-optimized format
- MKV - Matroska format
Size Limitations
- Images: Maximum 10MB per image
- Videos: Maximum 100MB per video
- Resolution: Up to 4K (3840x2160) for optimal performance