Media Completion

POST /media/completions
curl --request POST \
  --url https://api.animusai.co/v2/media/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "<string>"
          }
        }
      ]
    }
  ],
  "model": "animuslabs/Qwen2-VL-NSFW-Vision-1.1",
  "temperature": 0.1
}'
{
  "id": "<string>",
  "object": "<string>",
  "created": 123,
  "model": "<string>",
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "<string>",
        "content": "<string>",
        "tool_calls": [
          "<any>"
        ]
      },
      "logprobs": {},
      "finish_reason": "<string>",
      "stop_reason": "<string>"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "total_tokens": 123,
    "completion_tokens": 123
  },
  "prompt_logprobs": {}
}
The Media Completions endpoint allows you to generate detailed descriptions and responses based on images combined with text prompts. This endpoint uses vision-language models to analyze images and provide contextually relevant responses.

Key Features

  • Image Analysis: Process images from URLs or base64-encoded data
  • Vision-Language Models: Leverage advanced models like animuslabs/Qwen2-VL-NSFW-Vision-1.2
  • Flexible Input: Support for both remote image URLs and base64-encoded images
  • Contextual Responses: Generate detailed descriptions, answer questions about images, or perform image-based tasks

Request Examples

Using Image URL

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        },
        {
          "type": "text",
          "text": "Generate a detailed description for this image"
        }
      ]
    }
  ],
  "model": "animuslabs/Qwen2-VL-NSFW-Vision-1.2",
  "temperature": 0.1
}

Using Base64 Image

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."
          }
        }
      ]
    }
  ],
  "model": "animuslabs/Qwen2-VL-NSFW-Vision-1.2",
  "temperature": 0.1
}
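When sending a local image, the file bytes must be base64-encoded and wrapped in a data URL, as in the example above. A minimal sketch of that encoding step (the helper name and sample bytes are illustrative, not part of the API):

```python
import base64

def image_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL usable in the image_url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# In practice the bytes would come from a file:
#   with open("photo.jpg", "rb") as f:
#       url = image_to_data_url(f.read())
content_item = {
    "type": "image_url",
    "image_url": {"url": image_to_data_url(b"\xff\xd8\xff\xe0")},
}
```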

Response Format

The response follows the standard chat completion format:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "In this image, I can see...",
        "role": "assistant"
      }
    }
  ],
  "created": 1748436597,
  "usage": {
    "completion_tokens": 127,
    "prompt_tokens": 198,
    "total_tokens": 325
  }
}
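The fields above can be read with plain dictionary access once the JSON body is parsed. A sketch against the sample response shown (the `response` variable stands in for the parsed body):

```python
# Parsed JSON body from the endpoint (values copied from the sample above)
response = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": None,
            "message": {"content": "In this image, I can see...", "role": "assistant"},
        }
    ],
    "created": 1748436597,
    "usage": {"completion_tokens": 127, "prompt_tokens": 198, "total_tokens": 325},
}

# The generated description lives at choices[0].message.content
description = response["choices"][0]["message"]["content"]

# Token counts for monitoring usage
tokens_used = response["usage"]["total_tokens"]
```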

Parameters

Required Parameters

  • messages: Array of message objects containing image and text content
  • model: The vision-language model to use (e.g., animuslabs/Qwen2-VL-NSFW-Vision-1.2)

Optional Parameters

  • temperature: Controls randomness in the response (default: 0.1)

Content Types

The content array in each message can contain:
  1. Text Content:
    {
      "type": "text",
      "text": "Your text prompt here"
    }
    
  2. Image URL Content:
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.jpg"
      }
    }
    
  3. Base64 Image Content:
    {
      "type": "image_url",
      "image_url": {
        "url": "data:image/jpeg;base64,<base64_string>"
      }
    }
    
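A few small helpers can assemble these content items into a valid messages array. The function names here are illustrative conveniences, not part of the API:

```python
def text_item(text: str) -> dict:
    """Build a text content item."""
    return {"type": "text", "text": text}

def image_item(url: str) -> dict:
    """Build an image_url content item; works for both remote URLs and data URLs."""
    return {"type": "image_url", "image_url": {"url": url}}

def user_message(*items: dict) -> dict:
    """Wrap content items in a user-role message."""
    return {"role": "user", "content": list(items)}

messages = [
    user_message(
        image_item("https://example.com/image.jpg"),
        text_item("Generate a detailed description for this image"),
    )
]
```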

Use Cases

  • Image Description: Generate detailed descriptions of images
  • Visual Question Answering: Ask specific questions about image content
  • Content Analysis: Analyze images for specific elements or themes
  • Creative Writing: Use images as inspiration for creative content
  • Accessibility: Create alt-text descriptions for images

Best Practices

  1. Image Quality: Use high-quality images for better analysis results
  2. Clear Prompts: Provide specific and clear text prompts for better responses
  3. Temperature Setting: Use lower temperature values (0.1-0.3) for more consistent, factual descriptions
  4. Model Selection: Choose the appropriate vision model based on your specific use case

Authorizations

  • Authorization (string, header, required): Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

  • messages (object[], required): Array of message objects containing image and text content
  • model (string, required): The vision-language model to use. Example: "animuslabs/Qwen2-VL-NSFW-Vision-1.1"
  • temperature (number, default: 0.1): Controls randomness in the response

Response

200 - application/json

The generated response for the image and text input.

  • id (string)
  • object (string)
  • created (integer)
  • model (string)
  • choices (object[])
  • usage (object)
  • prompt_logprobs (object | null)