Generates a response based on provided images and text using a vision-language model.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
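As a minimal sketch of the header format described above, the following Python builds the Authorization header and attaches it to a request object without sending it. The endpoint URL and the helper names build_auth_headers and build_request are hypothetical placeholders, and the request body is left to the caller because the body schema is not described here.

```python
import urllib.request

# Hypothetical endpoint used only for illustration; substitute the real URL.
EXAMPLE_URL = "https://api.example.com/v1/vision"


def build_auth_headers(token: str) -> dict:
    """Return a header dict of the form 'Authorization: Bearer <token>'."""
    if not token or not token.strip():
        raise ValueError("token must be a non-empty string")
    return {"Authorization": f"Bearer {token.strip()}"}


def build_request(token: str, body: bytes) -> urllib.request.Request:
    """Create (but do not send) a POST request carrying the Bearer header.

    The body is passed through unchanged; its schema is an assumption
    left to the caller.
    """
    return urllib.request.Request(
        EXAMPLE_URL,
        data=body,
        headers=build_auth_headers(token),
        method="POST",
    )
```

Building the request separately from sending it makes it easy to inspect the header before any network call, for example with build_request(token, body).get_header("Authorization").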
The generated response for the image and text input. The response is of type object.