
Vision Completions

Basic Image Analysis

Ask questions about images using the vision completion API:
import { AnimusClient, MediaMessage } from 'animus-client';

const client = new AnimusClient({
  tokenProviderUrl: 'https://your-backend.com/api/get-animus-token',
  vision: {
    model: 'animuslabs/Qwen2-VL-NSFW-Vision-1.2',
    temperature: 0.2
  }
});

// Analyze an image with a question
const visionMessages: MediaMessage[] = [
  {
    role: 'user',
    content: [
      { type: 'text', text: 'What do you see in this image? Describe it in detail.' },
      { type: 'image_url', image_url: { url: 'https://example.com/image.jpg' } }
    ]
  }
];

const response = await client.media.completions({
  messages: visionMessages
});

console.log('Vision Analysis:', response.choices[0].message.content);

Multiple Images

You can analyze multiple images in a single request:
const multiImageMessages: MediaMessage[] = [
  {
    role: 'user',
    content: [
      { type: 'text', text: 'Compare these two images. What are the differences?' },
      { type: 'image_url', image_url: { url: 'https://example.com/image1.jpg' } },
      { type: 'image_url', image_url: { url: 'https://example.com/image2.jpg' } }
    ]
  }
];

const comparison = await client.media.completions({
  messages: multiImageMessages,
  temperature: 0.3
});

console.log('Image Comparison:', comparison.choices[0].message.content);

Base64 Images

You can also use base64-encoded images:
// Convert file to base64
function fileToBase64(file: File): Promise<string> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result as string);
    reader.onerror = reject;
    reader.readAsDataURL(file);
  });
}

// Use with vision API
const fileInput = document.getElementById('imageInput') as HTMLInputElement;
const file = fileInput.files?.[0];

if (file) {
  const base64Image = await fileToBase64(file);
  
  const messages: MediaMessage[] = [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Analyze this uploaded image.' },
        { type: 'image_url', image_url: { url: base64Image } }
      ]
    }
  ];

  const response = await client.media.completions({ messages });
}
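Because `readAsDataURL` returns a `data:` URL rather than raw base64, it can help to inspect that string before sending it. The following is a minimal sketch, not part of the SDK: `inspectDataUrl` and `DataUrlInfo` are hypothetical names, and the helper only handles base64-encoded data URLs.

```typescript
// Hypothetical helper (not part of animus-client): reads the MIME type and
// the decoded payload size from a base64 data URL.
interface DataUrlInfo {
  mimeType: string;
  decodedBytes: number;
}

function inspectDataUrl(dataUrl: string): DataUrlInfo | null {
  // Expected shape: data:<mime>[;param=value...];base64,<payload>
  const match = /^data:([^;,]+)(?:;[^;,]+)*;base64,([A-Za-z0-9+/]*={0,2})$/.exec(dataUrl);
  if (!match) return null;

  const mimeType = match[1];
  const payload = match[2];

  // Every 4 base64 characters encode 3 bytes; '=' padding marks missing bytes.
  const padding = payload.endsWith('==') ? 2 : payload.endsWith('=') ? 1 : 0;
  const decodedBytes = Math.floor((payload.length * 3) / 4) - padding;

  return { mimeType, decodedBytes };
}
```

A caller could, for example, compare `decodedBytes` against the image size limit before calling `client.media.completions`, and fall back to a hosted URL when the value is too large.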

Media Analysis

Image Metadata Extraction

Extract structured metadata from images:
// Analyze image for categories and tags
const imageAnalysis = await client.media.analyze({
  media_url: 'https://example.com/photo.jpg',
  metadata: ['categories', 'tags', 'objects', 'faces']
});

console.log('Categories:', imageAnalysis.metadata?.categories);
console.log('Tags:', imageAnalysis.metadata?.tags);
console.log('Objects detected:', imageAnalysis.metadata?.objects);
console.log('Faces detected:', imageAnalysis.metadata?.faces);

Video Analysis

Analyze videos with automatic polling for results:
// Start video analysis (this will poll until complete)
console.log('Starting video analysis...');

const videoAnalysis = await client.media.analyze({
  media_url: 'https://example.com/video.mp4',
  metadata: ['actions', 'scene', 'objects', 'categories']
});

console.log('Video analysis complete!');
console.log('Actions detected:', videoAnalysis.results?.[0]?.actions);
console.log('Scene analysis:', videoAnalysis.results?.[0]?.scene);
console.log('Objects in video:', videoAnalysis.results?.[0]?.objects);

Manual Status Checking

For more control over video analysis, you can check status manually:
// Start analysis without waiting
const analysisRequest = await client.media.analyze({
  media_url: 'https://example.com/long-video.mp4',
  metadata: ['actions', 'scene'],
  wait_for_completion: false // Don't wait automatically
});

const jobId = analysisRequest.job_id;

// Check status periodically
const checkStatus = async () => {
  // Loop instead of rescheduling with setTimeout so the caller receives the final result
  while (true) {
    const status = await client.media.getAnalysisStatus(jobId);

    console.log(`Job ${jobId}: ${status.status} (${status.percent_complete}% complete)`);

    if (status.status === 'completed') {
      console.log('Analysis results:', status.results);
      return status.results;
    }
    if (status.status === 'failed') {
      console.error('Analysis failed:', status.error);
      return null;
    }

    // Still processing: wait 5 seconds, then check again
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
};

const results = await checkStatus();
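For long videos, a fixed polling interval with no upper bound may not be ideal. The sketch below shows one way to poll with exponential backoff and an overall deadline. `pollWithBackoff`, `JobStatus`, and `PollOptions` are hypothetical names, the status shape is assumed from the fields used in the example above, and the default delays are arbitrary.

```typescript
// Assumed status shape, based on the fields used in the manual polling example.
interface JobStatus<T> {
  status: string;
  results?: T;
  error?: unknown;
}

interface PollOptions {
  initialDelayMs?: number;
  maxDelayMs?: number;
  timeoutMs?: number;
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Hypothetical helper: calls fetchStatus until the job completes, fails,
// or the overall deadline would be exceeded by the next wait.
async function pollWithBackoff<T>(
  fetchStatus: () => Promise<JobStatus<T>>,
  options: PollOptions = {}
): Promise<T | undefined> {
  const { initialDelayMs = 1000, maxDelayMs = 15000, timeoutMs = 600000 } = options;
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;

  while (true) {
    const status = await fetchStatus();
    if (status.status === 'completed') return status.results;
    if (status.status === 'failed') {
      throw new Error(`Analysis failed: ${String(status.error)}`);
    }
    if (Date.now() + delay > deadline) {
      throw new Error('Timed out waiting for analysis to finish');
    }
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs); // Double the wait, up to the cap
  }
}
```

With the job from the example above, this could be called as `pollWithBackoff(() => client.media.getAnalysisStatus(jobId))`. Throwing on failure instead of returning `null` is a design choice of this sketch, so that callers cannot mistake a failed job for an empty result.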

Advanced Vision Features

Conversational Vision

Build conversational interfaces around images:
// Initialize with vision configuration
const client = new AnimusClient({
  tokenProviderUrl: 'your-token-url',
  vision: {
    model: 'animuslabs/Qwen2-VL-NSFW-Vision-1.2'
  }
});

// Start a conversation about an image
let conversation: MediaMessage[] = [
  {
    role: 'user',
    content: [
      { type: 'text', text: 'What do you see in this image?' },
      { type: 'image_url', image_url: { url: 'https://example.com/artwork.jpg' } }
    ]
  }
];

const firstResponse = await client.media.completions({
  messages: conversation
});

// Add the response to conversation
conversation.push({
  role: 'assistant',
  content: firstResponse.choices[0].message.content
});

// Continue the conversation
conversation.push({
  role: 'user',
  content: [
    { type: 'text', text: 'What style of art is this? Who might have painted it?' }
  ]
});

const secondResponse = await client.media.completions({
  messages: conversation
});

console.log('Art analysis:', secondResponse.choices[0].message.content);

Vision with Custom Parameters

Fine-tune vision analysis with custom parameters:
const detailedAnalysis = await client.media.completions({
  messages: visionMessages,
  temperature: 0.1,        // More focused responses
  max_tokens: 1000,        // Longer descriptions
  top_p: 0.9              // Nucleus sampling
});

Supported Media Types

Image Formats

  • JPEG/JPG - Standard photo format
  • PNG - Images with transparency
  • WebP - Modern web format
  • GIF - Animated images (first frame analyzed)
  • BMP - Bitmap images

Video Formats

  • MP4 - Most common video format
  • AVI - Audio Video Interleave
  • MOV - QuickTime format
  • WebM - Web-optimized format
  • MKV - Matroska format

Size Limitations

  • Images: Maximum 10MB per image
  • Videos: Maximum 100MB per video
  • Resolution: Up to 4K (3840x2160) for optimal performance
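The formats and size limits above can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: `checkMedia` and `MediaCheck` are hypothetical names, the media type is guessed from the file extension only, and 1MB is taken to mean 1024 × 1024 bytes, which the limits above do not specify.

```typescript
// Assumption: 1MB = 1024 * 1024 bytes. The documented limits do not say which unit is meant.
const MB = 1024 * 1024;
const IMAGE_EXTENSIONS = new Set(['jpg', 'jpeg', 'png', 'webp', 'gif', 'bmp']);
const VIDEO_EXTENSIONS = new Set(['mp4', 'avi', 'mov', 'webm', 'mkv']);

type MediaCheck =
  | { ok: true; kind: 'image' | 'video' }
  | { ok: false; reason: string };

// Hypothetical pre-flight check based on file extension and size.
function checkMedia(fileName: string, sizeBytes: number): MediaCheck {
  const ext = (fileName.split('.').pop() ?? '').toLowerCase();

  let kind: 'image' | 'video' | null = null;
  if (IMAGE_EXTENSIONS.has(ext)) kind = 'image';
  else if (VIDEO_EXTENSIONS.has(ext)) kind = 'video';

  if (kind === null) {
    return { ok: false, reason: `Unsupported format: .${ext}` };
  }

  const limitMb = kind === 'image' ? 10 : 100;
  if (sizeBytes > limitMb * MB) {
    return { ok: false, reason: `${kind} exceeds the ${limitMb}MB limit` };
  }
  return { ok: true, kind };
}
```

In a browser, this could be called as `checkMedia(file.name, file.size)` on a `File` from an upload input. The server remains the authority on what it accepts, so this check only avoids obviously doomed uploads.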

Error Handling

Handle media-specific errors gracefully:
import { ApiError, AuthenticationError } from 'animus-client';

try {
  const analysis = await client.media.analyze({
    media_url: 'https://example.com/image.jpg',
    metadata: ['categories']
  });
} catch (error) {
  if (error instanceof ApiError) {
    if (error.status === 400) {
      console.error('Invalid media URL or format');
    } else if (error.status === 413) {
      console.error('Media file too large');
    } else if (error.status === 422) {
      console.error('Unsupported media format');
    } else {
      console.error(`API Error (${error.status}):`, error.message);
    }
  } else if (error instanceof AuthenticationError) {
    console.error('Authentication failed:', error.message);
  } else {
    console.error('Unexpected error:', error);
  }
}

Best Practices

Performance Optimization

// For better performance, resize large images before analysis
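function resizeImage(file: File, maxWidth: number = 1920): Promise<string> {
  return new Promise((resolve, reject) => {
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d')!;
    const img = new Image();
    const objectUrl = URL.createObjectURL(file);

    img.onload = () => {
      // Fit both sides within maxWidth; cap the ratio at 1 so small images are never upscaled
      const ratio = Math.min(1, maxWidth / img.width, maxWidth / img.height);
      canvas.width = Math.round(img.width * ratio);
      canvas.height = Math.round(img.height * ratio);

      ctx.drawImage(img, 0, 0, canvas.width, canvas.height);
      URL.revokeObjectURL(objectUrl); // Release the temporary object URL
      resolve(canvas.toDataURL('image/jpeg', 0.8));
    };

    // Reject instead of leaving the promise pending when the file is not a readable image
    img.onerror = () => {
      URL.revokeObjectURL(objectUrl);
      reject(new Error('Could not load image for resizing'));
    };

    img.src = objectUrl;
  });
}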

Batch Processing

// Process multiple images efficiently
async function batchAnalyzeImages(imageUrls: string[]) {
  const results = await Promise.allSettled(
    imageUrls.map(url => 
      client.media.analyze({
        media_url: url,
        metadata: ['categories', 'tags']
      })
    )
  );

  return results.map((result, index) => ({
    url: imageUrls[index],
    success: result.status === 'fulfilled',
    data: result.status === 'fulfilled' ? result.value : null,
    error: result.status === 'rejected' ? result.reason : null
  }));
}
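`Promise.allSettled` starts every request at once, which may be too aggressive for large batches. The sketch below caps the number of requests in flight while keeping results in input order. `mapWithConcurrency` and `Settled` are hypothetical names, and a suitable limit depends on rate limits that this page does not state.

```typescript
// Same shape as the entries returned by Promise.allSettled, declared locally
// so the sketch does not depend on a particular TypeScript lib target.
type Settled<R> =
  | { status: 'fulfilled'; value: R }
  | { status: 'rejected'; reason: unknown };

// Hypothetical helper: runs worker over items with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { status: 'fulfilled', value: await worker(items[index], index) };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Applied to the batch example above, the worker would be `url => client.media.analyze({ media_url: url, metadata: ['categories', 'tags'] })`, and the returned entries can be mapped exactly as the `Promise.allSettled` results are.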

Caching Results

// Cache analysis results to avoid re-processing
class MediaAnalysisCache {
  private cache = new Map<string, any>();

  async analyze(url: string, metadata: string[]) {
    const cacheKey = `${url}-${metadata.join(',')}`;
    
    if (this.cache.has(cacheKey)) {
      return this.cache.get(cacheKey);
    }

    const result = await client.media.analyze({
      media_url: url,
      metadata
    });

    this.cache.set(cacheKey, result);
    return result;
  }
}
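The cache above stores a result only after the request finishes, so two simultaneous calls for the same URL both reach the API, and the key depends on the order of the metadata fields. One possible refinement, sketched below, caches the pending promise and normalizes the key. `DedupingCache` is a hypothetical name, and the fetcher is passed in so the sketch does not assume anything about the SDK beyond the call shown earlier.

```typescript
// Hypothetical refinement: caches in-flight promises so concurrent calls
// for the same URL and metadata set share a single request.
class DedupingCache<T> {
  private entries = new Map<string, Promise<T>>();
  private fetcher: (url: string, metadata: string[]) => Promise<T>;

  constructor(fetcher: (url: string, metadata: string[]) => Promise<T>) {
    this.fetcher = fetcher;
  }

  analyze(url: string, metadata: string[]): Promise<T> {
    // Sort a copy so ['tags', 'categories'] and ['categories', 'tags'] share a key.
    const key = JSON.stringify([url, metadata.slice().sort()]);

    const cached = this.entries.get(key);
    if (cached) return cached;

    const pending = this.fetcher(url, metadata).catch((err: unknown) => {
      // Do not keep failures in the cache, so a later call can retry.
      this.entries.delete(key);
      throw err;
    });
    this.entries.set(key, pending);
    return pending;
  }
}
```

It could be constructed as `new DedupingCache((url, metadata) => client.media.analyze({ media_url: url, metadata }))`. Entries are never evicted in this sketch, so a long-running application would likely want a size or age limit as well.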

Integration Examples

Image Upload and Analysis

// Complete image upload and analysis workflow
async function handleImageUpload(file: File) {
  try {
    // Show loading state
    showLoadingIndicator();

    // Convert to base64 for analysis
    const base64Image = await fileToBase64(file);

    // Analyze the image
    const [visionResponse, metadataResponse] = await Promise.all([
      client.media.completions({
        messages: [{
          role: 'user',
          content: [
            { type: 'text', text: 'Describe this image in detail.' },
            { type: 'image_url', image_url: { url: base64Image } }
          ]
        }]
      }),
      client.media.analyze({
        media_url: base64Image,
        metadata: ['categories', 'tags', 'objects']
      })
    ]);

    // Display results
    displayAnalysisResults({
      description: visionResponse.choices[0].message.content,
      categories: metadataResponse.metadata?.categories,
      tags: metadataResponse.metadata?.tags,
      objects: metadataResponse.metadata?.objects
    });

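  } catch (error) {
    // The caught value is not guaranteed to be an Error, so narrow it before reading .message
    const message = error instanceof Error ? error.message : String(error);
    showErrorMessage('Failed to analyze image: ' + message);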
  } finally {
    hideLoadingIndicator();
  }
}

Next Steps

Authentication Setup

Set up a secure token provider for media analysis

Tool Calling

Combine vision with function calling capabilities

Image Generation

Generate images automatically from conversations

Event System

Handle media analysis events in your application