Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
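For example, a request authenticated with a token value of sk_example_123 (a placeholder, not a real token) would send the following header:

```
Authorization: Bearer sk_example_123
```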
Body
A chronological list of messages that compose the current conversation.
Adjusts randomness in the response generation, with lower values yielding more predictable responses.
Restricts sampling to the smallest set of tokens whose cumulative probability reaches this threshold (nucleus sampling), influencing diversity.
Number of alternate responses to generate.
Caps the number of tokens in the generated response. No default is set, so model-specific limits apply.
A set of strings that, when generated, signal the model to stop generating the response.
If set to true, the response is streamed back to the client as it's being generated.
Adjusts the likelihood of tokens based on whether they already appear in the text; positive values discourage repetition.
Penalizes tokens according to how frequently they already appear in the text, encouraging diversity.
Generates several completions server-side and returns the best. The definition of "best" depends on model and settings.
Limits sampling to the k most probable tokens; larger values allow more diverse outputs, smaller values make them more predictable.
Modifies the likelihood of repeating tokens based on their previous occurrence, counteracting the model's tendency to repeat itself.
Sets a minimum probability threshold for tokens to be considered for generation, further filtering the possible outputs.
Adjusts the impact of sequence length on selection, encouraging shorter or longer responses.
When true, checks whether the response contains any violations, such as inappropriate content.
The model to use for generating the response.
"animuslabs/Vivian-llama3.1-70b-1.0-fp8"
When true, enables reasoning/thinking content from the model. For non-streaming responses, adds a reasoning field. For streaming, thinking content appears in the stream.
⚠️ ALPHA FEATURE: When true, the AI analyzes its response and creates an image_prompt field if an image is requested or desired. This feature is in alpha state and not recommended for production use.
A list of tools the model may call. Currently, only functions are supported as a tool.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools.
none, auto
When true, splits the API response into individual conversational turns returned as an array called 'turns'.
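As a sketch of how these body parameters fit together, the following Python snippet sends a non-streaming request with a single function tool defined. The endpoint URL, the placeholder token, and the exact field names (messages, temperature, top_p, max_tokens, stop, stream, tools, tool_choice) are assumptions based on common OpenAI-compatible conventions, not confirmed names from this API; the get_weather function is purely illustrative.

```python
import requests

# Hypothetical endpoint URL and placeholder token; substitute your real values.
API_URL = "https://api.example.com/v1/chat/completions"
AUTH_TOKEN = "sk_example_123"

payload = {
    # Model identifier (example value from this reference).
    "model": "animuslabs/Vivian-llama3.1-70b-1.0-fp8",
    # Chronological list of messages composing the conversation.
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    # Sampling controls: lower temperature yields more predictable output.
    "temperature": 0.7,
    "top_p": 0.9,
    # Cap on generated tokens; omit to fall back to model-specific limits.
    "max_tokens": 256,
    # Strings that stop generation when produced.
    "stop": ["\n\n"],
    # Non-streaming response for this sketch.
    "stream": False,
    # Optional tool definitions; only functions are supported.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function for illustration
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Let the model decide whether to call a tool or reply directly.
    "tool_choice": "auto",
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())
```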
Response
The generated response along with relevant metadata.
Unique identifier for the chat completion provided by the system.
Type indicator for the object, typically 'chat.completion'.
Unix timestamp when the response was created.
Contains the generated responses, along with meta-information about each.
Identifier for the model used in generating the response.
List of detected content violations when compliance checking is enabled.
["drug_use"]