Important: Streaming is automatically disabled when autoTurn (Conversational Turns) is enabled. The SDK will use non-streaming mode to properly handle response splitting and natural conversation flow. Choose either streaming OR conversational turns based on your use case.

Basic Streaming

Using AsyncIterable Pattern

Stream responses using the modern AsyncIterable pattern:
import { AnimusClient } from 'animus-client';

const client = new AnimusClient({
  tokenProviderUrl: 'https://your-backend.com/api/get-animus-token',
  chat: {
    model: 'vivian-llama3.1-70b-1.0-fp8',
    systemMessage: 'You are a helpful assistant.',
    // Note: autoTurn must be false or undefined for streaming to work
    autoTurn: false
  }
});

try {
  // Enable streaming in the request
  const stream = await client.chat.completions({
    messages: [
      { role: 'user', content: 'Write a short story about a robot learning to paint.' }
    ],
    stream: true
  });

  let fullContent = '';

  // Process each chunk as it arrives
  for await (const chunk of stream) {
    const delta = chunk.choices?.[0]?.delta?.content || '';
    fullContent += delta;
    
    // Update UI incrementally
    updateChatDisplay(fullContent);
    console.log('Streaming:', delta);
  }

  console.log('Stream complete. Final content:', fullContent);
} catch (error) {
  console.error('Streaming error:', error);
}
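
Each chunk follows the OpenAI-style streaming shape used above. If you read the delta in several places, a small helper (hypothetical, not part of the SDK) keeps the optional chaining in one spot:

```typescript
// Minimal sketch of the chunk shape the loop above relies on;
// field names follow the OpenAI-style streaming format.
interface StreamChunk {
  choices?: { delta?: { content?: string } }[];
}

// Safely extract the text delta from a streaming chunk,
// returning '' when any level of the structure is absent.
function extractDelta(chunk: StreamChunk): string {
  return chunk.choices?.[0]?.delta?.content ?? '';
}
```

With this helper, the loop body becomes `const delta = extractDelta(chunk);`.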

Real-time UI Updates

Here’s how to implement streaming in a web application:
// HTML element to display the streaming response
// (non-null assertion: the element is assumed to exist in the page)
const responseElement = document.getElementById('ai-response')!;

async function streamResponse(userMessage: string) {
  try {
    const stream = await client.chat.completions({
      messages: [{ role: 'user', content: userMessage }],
      stream: true,
      temperature: 0.7
    });

    // Clear previous content
    responseElement.textContent = '';
    let accumulatedText = '';

    for await (const chunk of stream) {
      const delta = chunk.choices?.[0]?.delta?.content || '';
      
      if (delta) {
        accumulatedText += delta;
        responseElement.textContent = accumulatedText;
        
        // Auto-scroll to bottom
        responseElement.scrollTop = responseElement.scrollHeight;
      }
    }

    console.log('Streaming complete');
  } catch (error) {
    // `error` is `unknown` in TypeScript catch clauses, so narrow it first
    responseElement.textContent =
      'Error: ' + (error instanceof Error ? error.message : String(error));
  }
}

Advanced Streaming Features

Streaming with Reasoning

When reasoning is enabled, thinking content appears directly in the stream:
const stream = await client.chat.completions({
  messages: [{ role: 'user', content: 'Solve this math problem: 2x + 5 = 15' }],
  stream: true,
  reasoning: true  // Include model's thinking process
});

let thinkingContent = '';
let responseContent = '';
let inThinkingBlock = false;

for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta?.content || '';
  
  // Parse thinking blocks in real-time.
  // Note: this simple check assumes each tag arrives whole within a
  // single chunk; a tag split across two chunks will be missed.
  if (delta.includes('<think>')) {
    inThinkingBlock = true;
  }
  
  if (inThinkingBlock) {
    thinkingContent += delta;
    updateThinkingDisplay(thinkingContent);
  } else {
    responseContent += delta;
    updateResponseDisplay(responseContent);
  }
  
  if (delta.includes('</think>')) {
    inThinkingBlock = false;
  }
}
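
Because the model can split a `<think>` tag across chunk boundaries, a more robust approach is to buffer input until tags are unambiguous. The `ThinkTagParser` class below is a sketch of that idea (not part of the SDK):

```typescript
// Incremental parser that tolerates <think>/</think> tags split across
// chunk boundaries by holding back any buffer suffix that could be the
// start of the next tag.
class ThinkTagParser {
  private buffer = '';
  private inThinking = false;

  // Feed one streaming delta; returns the thinking and response text
  // that became unambiguous with this chunk.
  feed(delta: string): { thinking: string; response: string } {
    this.buffer += delta;
    let thinking = '';
    let response = '';

    while (this.buffer.length > 0) {
      const tag = this.inThinking ? '</think>' : '<think>';
      const idx = this.buffer.indexOf(tag);

      if (idx !== -1) {
        // Emit everything before the tag, then switch modes
        const segment = this.buffer.slice(0, idx);
        if (this.inThinking) thinking += segment;
        else response += segment;
        this.buffer = this.buffer.slice(idx + tag.length);
        this.inThinking = !this.inThinking;
      } else {
        // No complete tag: emit all but a possible partial tag suffix
        const safeLen = this.safeLength(tag);
        const segment = this.buffer.slice(0, safeLen);
        if (this.inThinking) thinking += segment;
        else response += segment;
        this.buffer = this.buffer.slice(safeLen);
        break;
      }
    }

    return { thinking, response };
  }

  // Length of the buffer that is safe to emit: everything except a
  // trailing fragment that could still turn out to be `tag`.
  private safeLength(tag: string): number {
    const max = Math.min(tag.length - 1, this.buffer.length);
    for (let k = max; k > 0; k--) {
      if (tag.startsWith(this.buffer.slice(this.buffer.length - k))) {
        return this.buffer.length - k;
      }
    }
    return this.buffer.length;
  }
}
```

In the streaming loop, call `parser.feed(delta)` and route the returned `thinking` and `response` strings to the two displays.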

Streaming with Custom Processing

You can implement custom processing for different types of content:
class StreamProcessor {
  private buffer = '';
  private onText: (text: string) => void;
  private onCode: (code: string, language: string) => void;

  constructor(onText: (text: string) => void, onCode: (code: string, language: string) => void) {
    this.onText = onText;
    this.onCode = onCode;
  }

  processChunk(delta: string) {
    this.buffer += delta;

    // Detect completed code blocks
    const codeBlockRegex = /```(\w+)?\n([\s\S]*?)```/g;
    let match;

    while ((match = codeBlockRegex.exec(this.buffer)) !== null) {
      const language = match[1] || 'text';
      const code = match[2];
      this.onCode(code, language);
    }

    // Remove the emitted code blocks from the buffer so they are not
    // reported again on the next chunk, then pass the accumulated
    // text so far to the text handler
    this.buffer = this.buffer.replace(codeBlockRegex, '');
    if (this.buffer.trim()) {
      this.onText(this.buffer);
    }
  }
}

// Usage
const processor = new StreamProcessor(
  (text) => updateTextDisplay(text),
  (code, lang) => updateCodeDisplay(code, lang)
);

for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta?.content || '';
  processor.processChunk(delta);
}

Streaming Limitations

Compatibility with Other Features

Streaming is NOT compatible with:
  • autoTurn (Conversational Turns) - The SDK automatically disables streaming when autoTurn is enabled
  • compliance checks - Content moderation is not available for streaming responses
Streaming IS compatible with:
  • reasoning - Thinking content appears in the stream
  • check_image_generation - Image prompts can be detected in streaming responses
  • All standard chat parameters (temperature, max_tokens, etc.)

Error Handling

Implement robust error handling for streaming:
async function safeStreaming(userMessage: string) {
  try {
    const stream = await client.chat.completions({
      messages: [{ role: 'user', content: userMessage }],
      stream: true
    });

    for await (const chunk of stream) {
      try {
        const delta = chunk.choices?.[0]?.delta?.content || '';
        
        // Process chunk safely
        if (delta) {
          processStreamChunk(delta);
        }
      } catch (chunkError) {
        console.error('Error processing chunk:', chunkError);
        // Continue processing other chunks
      }
    }
  } catch (streamError) {
    console.error('Stream initialization error:', streamError);
    // Fallback to non-streaming
    const response = await client.chat.completions({
      messages: [{ role: 'user', content: userMessage }],
      stream: false
    });
    displayFinalResponse(response.choices[0].message.content);
  }
}
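
Transient network failures are common with long-lived streams, so it can help to retry before falling back. The `withRetries` wrapper below is a hypothetical helper (not part of the SDK) that retries the stream-starting call with linear backoff:

```typescript
// Retry an async operation a few times with linear backoff before
// rethrowing the last error. `start` is any function that begins the
// stream (or any other async call).
async function withRetries<T>(
  start: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await start();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait a little longer after each failed attempt
        await new Promise((resolve) =>
          setTimeout(resolve, attempt * baseDelayMs)
        );
      }
    }
  }

  throw lastError;
}
```

Usage: `const stream = await withRetries(() => client.chat.completions({ messages, stream: true }));` — if all attempts fail, the existing non-streaming fallback still applies.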

Performance Optimization

Throttling Updates

For better performance, throttle UI updates:
class ThrottledDisplay {
  private updateQueue = '';
  private isUpdating = false;
  private element: HTMLElement;

  constructor(element: HTMLElement) {
    this.element = element;
  }

  addContent(delta: string) {
    this.updateQueue += delta;
    
    if (!this.isUpdating) {
      this.isUpdating = true;
      requestAnimationFrame(() => {
        this.element.textContent += this.updateQueue;
        this.updateQueue = '';
        this.isUpdating = false;
      });
    }
  }
}

// Usage
const display = new ThrottledDisplay(document.getElementById('response')!);

for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta?.content || '';
  display.addContent(delta);
}

Buffering Strategy

Implement smart buffering for smoother display:
class StreamBuffer {
  private buffer: string[] = [];
  private onDisplay: (text: string) => void;
  private displayInterval: number;

  constructor(onDisplay: (text: string) => void, intervalMs = 50) {
    this.onDisplay = onDisplay;
    this.displayInterval = window.setInterval(() => {
      if (this.buffer.length > 0) {
        this.onDisplay(this.buffer.shift()!);
      }
    }, intervalMs);
  }

  addChunk(delta: string) {
    this.buffer.push(delta);
  }

  finish() {
    // Flush any remaining chunks immediately, then stop the timer
    while (this.buffer.length > 0) {
      this.onDisplay(this.buffer.shift()!);
    }
    clearInterval(this.displayInterval);
  }
}
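
A related refinement is to flush only at word boundaries so words never render half-finished. This is a sketch with a hypothetical `splitAtWordBoundary` helper:

```typescript
// Split buffered text at the last whitespace character so that only
// whole words are displayed; the remainder stays buffered until the
// next chunk completes it.
function splitAtWordBoundary(buffered: string): [display: string, remainder: string] {
  const lastSpace = buffered.lastIndexOf(' ');
  if (lastSpace === -1) {
    return ['', buffered]; // no boundary yet; keep buffering
  }
  return [buffered.slice(0, lastSpace + 1), buffered.slice(lastSpace + 1)];
}
```

Inside the streaming loop, append each delta to a string buffer, call `splitAtWordBoundary`, display the first element, and carry the second element forward; flush the remainder when the stream ends.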

Best Practices

When to Use Streaming vs Conversational Turns

Use Streaming for:
  • Long-form content generation (stories, articles, explanations)
  • Real-time code generation
  • When you want immediate character-by-character display
  • Applications where you control the entire response flow
Use Conversational Turns for:
  • Natural chat interfaces that mimic human conversation
  • When you want automatic response splitting
  • Applications that benefit from realistic typing delays
  • Chat apps where the AI should feel more human-like

UI/UX Considerations

  • Visual feedback: Show a typing indicator or cursor
  • Graceful degradation: Fallback to non-streaming on errors
  • Performance: Throttle updates for smooth rendering
  • Accessibility: Announce streaming status to screen readers
// Example with typing indicator
function showTypingIndicator() {
  const indicator = document.createElement('span');
  indicator.textContent = '▋';
  indicator.className = 'typing-cursor';
  responseElement.appendChild(indicator);
}

function hideTypingIndicator() {
  const cursor = responseElement.querySelector('.typing-cursor');
  if (cursor) cursor.remove();
}

Next Steps