Important: Streaming is automatically disabled when autoTurn (Conversational Turns) is enabled. The SDK will use non-streaming mode to properly handle response splitting and natural conversation flow. Choose either streaming OR conversational turns based on your use case.

Basic Streaming

Using AsyncIterable Pattern

Stream responses using the modern AsyncIterable pattern:
import { AnimusClient } from 'animus-client';

const client = new AnimusClient({
  tokenProviderUrl: 'https://your-backend.com/api/get-animus-token',
  chat: {
    model: 'vivian-llama3.1-70b-1.0-fp8',
    systemMessage: 'You are a helpful assistant.',
    // Note: autoTurn must be false or undefined for streaming to work
    autoTurn: false
  }
});

try {
  // Enable streaming in the request
  const stream = await client.chat.completions({
    messages: [
      { role: 'user', content: 'Write a short story about a robot learning to paint.' }
    ],
    stream: true
  });

  let fullContent = '';

  // Process each chunk as it arrives
  for await (const chunk of stream) {
    const delta = chunk.choices?.[0]?.delta?.content || '';
    fullContent += delta;
    
    // Update UI incrementally
    updateChatDisplay(fullContent);
    console.log('Streaming:', delta);
  }

  console.log('Stream complete. Final content:', fullContent);
} catch (error) {
  console.error('Streaming error:', error);
}
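
Each chunk follows the OpenAI-style streaming shape used above. If you read the delta in several places, a small helper (hypothetical, not part of the SDK) keeps the optional chaining in one spot:

```typescript
// Minimal sketch of the chunk shape the loop above relies on;
// field names follow the OpenAI-style streaming format.
interface StreamChunk {
  choices?: { delta?: { content?: string } }[];
}

// Safely extract the text delta from a streaming chunk,
// returning '' when any level of the structure is absent.
function extractDelta(chunk: StreamChunk): string {
  return chunk.choices?.[0]?.delta?.content ?? '';
}
```

With this helper, the loop body becomes `const delta = extractDelta(chunk);`.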

Real-time UI Updates

Here’s how to implement streaming in a web application:
// HTML element to display the streaming response
// (non-null assertion: the element is assumed to exist in the page)
const responseElement = document.getElementById('ai-response')!;

async function streamResponse(userMessage: string) {
  try {
    const stream = await client.chat.completions({
      messages: [{ role: 'user', content: userMessage }],
      stream: true,
      temperature: 0.7
    });

    // Clear previous content
    responseElement.textContent = '';
    let accumulatedText = '';

    for await (const chunk of stream) {
      const delta = chunk.choices?.[0]?.delta?.content || '';
      
      if (delta) {
        accumulatedText += delta;
        responseElement.textContent = accumulatedText;
        
        // Auto-scroll to bottom
        responseElement.scrollTop = responseElement.scrollHeight;
      }
    }

    console.log('Streaming complete');
  } catch (error) {
    // `error` is `unknown` in TypeScript catch clauses, so narrow it first
    responseElement.textContent =
      'Error: ' + (error instanceof Error ? error.message : String(error));
  }
}

Advanced Streaming Features

Streaming with Reasoning

When reasoning is enabled, thinking content appears directly in the stream:
const stream = await client.chat.completions({
  messages: [{ role: 'user', content: 'Solve this math problem: 2x + 5 = 15' }],
  stream: true,
  reasoning: true  // Include model's thinking process
});

let thinkingContent = '';
let responseContent = '';
let inThinkingBlock = false;

for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta?.content || '';
  
  // Parse thinking blocks in real-time.
  // Note: this simple check assumes each tag arrives whole within a
  // single chunk; a tag split across two chunks will be missed.
  if (delta.includes('<think>')) {
    inThinkingBlock = true;
  }
  
  if (inThinkingBlock) {
    thinkingContent += delta;
    updateThinkingDisplay(thinkingContent);
  } else {
    responseContent += delta;
    updateResponseDisplay(responseContent);
  }
  
  if (delta.includes('</think>')) {
    inThinkingBlock = false;
  }
}
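
Because the model can split a `<think>` tag across chunk boundaries, a more robust approach is to buffer input until tags are unambiguous. The `ThinkTagParser` class below is a sketch of that idea (not part of the SDK):

```typescript
// Incremental parser that tolerates <think>/</think> tags split across
// chunk boundaries by holding back any buffer suffix that could be the
// start of the next tag.
class ThinkTagParser {
  private buffer = '';
  private inThinking = false;

  // Feed one streaming delta; returns the thinking and response text
  // that became unambiguous with this chunk.
  feed(delta: string): { thinking: string; response: string } {
    this.buffer += delta;
    let thinking = '';
    let response = '';

    while (this.buffer.length > 0) {
      const tag = this.inThinking ? '</think>' : '<think>';
      const idx = this.buffer.indexOf(tag);

      if (idx !== -1) {
        // Emit everything before the tag, then switch modes
        const segment = this.buffer.slice(0, idx);
        if (this.inThinking) thinking += segment;
        else response += segment;
        this.buffer = this.buffer.slice(idx + tag.length);
        this.inThinking = !this.inThinking;
      } else {
        // No complete tag: emit all but a possible partial tag suffix
        const safeLen = this.safeLength(tag);
        const segment = this.buffer.slice(0, safeLen);
        if (this.inThinking) thinking += segment;
        else response += segment;
        this.buffer = this.buffer.slice(safeLen);
        break;
      }
    }

    return { thinking, response };
  }

  // Length of the buffer that is safe to emit: everything except a
  // trailing fragment that could still turn out to be `tag`.
  private safeLength(tag: string): number {
    const max = Math.min(tag.length - 1, this.buffer.length);
    for (let k = max; k > 0; k--) {
      if (tag.startsWith(this.buffer.slice(this.buffer.length - k))) {
        return this.buffer.length - k;
      }
    }
    return this.buffer.length;
  }
}
```

In the streaming loop, call `parser.feed(delta)` and route the returned `thinking` and `response` strings to the two displays.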

Streaming with Custom Processing

You can implement custom processing for different types of content:
class StreamProcessor {
  private buffer = '';
  private onText: (text: string) => void;
  private onCode: (code: string, language: string) => void;

  constructor(onText: (text: string) => void, onCode: (code: string, language: string) => void) {
    this.onText = onText;
    this.onCode = onCode;
  }

  processChunk(delta: string) {
    this.buffer += delta;

    // Detect completed code blocks
    const codeBlockRegex = /```(\w+)?\n([\s\S]*?)```/g;
    let match;

    while ((match = codeBlockRegex.exec(this.buffer)) !== null) {
      const language = match[1] || 'text';
      const code = match[2];
      this.onCode(code, language);
    }

    // Remove the emitted code blocks from the buffer so they are not
    // reported again on the next chunk, then pass the accumulated
    // text so far to the text handler
    this.buffer = this.buffer.replace(codeBlockRegex, '');
    if (this.buffer.trim()) {
      this.onText(this.buffer);
    }
  }
}

// Usage
const processor = new StreamProcessor(
  (text) => updateTextDisplay(text),
  (code, lang) => updateCodeDisplay(code, lang)
);

for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta?.content || '';
  processor.processChunk(delta);
}

Streaming Limitations

Compatibility with Other Features

Streaming is NOT compatible with:
  • autoTurn (Conversational Turns) - The SDK automatically disables streaming when autoTurn is enabled
  • compliance checks - Content moderation is not available for streaming responses
Streaming IS compatible with:
  • reasoning - Thinking content appears in the stream
  • check_image_generation - Image prompts can be detected in streaming responses
  • All standard chat parameters (temperature, max_tokens, etc.)

Error Handling

Implement robust error handling for streaming:
async function safeStreaming(userMessage: string) {
  try {
    const stream = await client.chat.completions({
      messages: [{ role: 'user', content: userMessage }],
      stream: true
    });

    for await (const chunk of stream) {
      try {
        const delta = chunk.choices?.[0]?.delta?.content || '';
        
        // Process chunk safely
        if (delta) {
          processStreamChunk(delta);
        }
      } catch (chunkError) {
        console.error('Error processing chunk:', chunkError);
        // Continue processing other chunks
      }
    }
  } catch (streamError) {
    console.error('Stream initialization error:', streamError);
    // Fallback to non-streaming
    const response = await client.chat.completions({
      messages: [{ role: 'user', content: userMessage }],
      stream: false
    });
    displayFinalResponse(response.choices[0].message.content);
  }
}
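
Transient network failures are common with long-lived streams, so it can help to retry before falling back. The `withRetries` wrapper below is a hypothetical helper (not part of the SDK) that retries the stream-starting call with linear backoff:

```typescript
// Retry an async operation a few times with linear backoff before
// rethrowing the last error. `start` is any function that begins the
// stream (or any other async call).
async function withRetries<T>(
  start: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await start();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait a little longer after each failed attempt
        await new Promise((resolve) =>
          setTimeout(resolve, attempt * baseDelayMs)
        );
      }
    }
  }

  throw lastError;
}
```

Usage: `const stream = await withRetries(() => client.chat.completions({ messages, stream: true }));` — if all attempts fail, the existing non-streaming fallback still applies.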

Performance Optimization

Throttling Updates

For better performance, throttle UI updates:
class ThrottledDisplay {
  private updateQueue = '';
  private isUpdating = false;
  private element: HTMLElement;

  constructor(element: HTMLElement) {
    this.element = element;
  }

  addContent(delta: string) {
    this.updateQueue += delta;
    
    if (!this.isUpdating) {
      this.isUpdating = true;
      requestAnimationFrame(() => {
        this.element.textContent += this.updateQueue;
        this.updateQueue = '';
        this.isUpdating = false;
      });
    }
  }
}

// Usage
const display = new ThrottledDisplay(document.getElementById('response')!);

for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta?.content || '';
  display.addContent(delta);
}

Buffering Strategy

Implement smart buffering for smoother display:
class StreamBuffer {
  private buffer: string[] = [];
  private onDisplay: (text: string) => void;
  private displayInterval: number;

  constructor(onDisplay: (text: string) => void, intervalMs = 50) {
    this.onDisplay = onDisplay;
    this.displayInterval = window.setInterval(() => {
      if (this.buffer.length > 0) {
        this.onDisplay(this.buffer.shift()!);
      }
    }, intervalMs);
  }

  addChunk(delta: string) {
    this.buffer.push(delta);
  }

  finish() {
    // Flush any remaining chunks immediately, then stop the timer
    while (this.buffer.length > 0) {
      this.onDisplay(this.buffer.shift()!);
    }
    clearInterval(this.displayInterval);
  }
}
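
A related refinement is to flush only at word boundaries so words never render half-finished. This is a sketch with a hypothetical `splitAtWordBoundary` helper:

```typescript
// Split buffered text at the last whitespace character so that only
// whole words are displayed; the remainder stays buffered until the
// next chunk completes it.
function splitAtWordBoundary(buffered: string): [display: string, remainder: string] {
  const lastSpace = buffered.lastIndexOf(' ');
  if (lastSpace === -1) {
    return ['', buffered]; // no boundary yet; keep buffering
  }
  return [buffered.slice(0, lastSpace + 1), buffered.slice(lastSpace + 1)];
}
```

Inside the streaming loop, append each delta to a string buffer, call `splitAtWordBoundary`, display the first element, and carry the second element forward; flush the remainder when the stream ends.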

Best Practices

When to Use Streaming vs Conversational Turns

Use Streaming for:
  • Long-form content generation (stories, articles, explanations)
  • Real-time code generation
  • When you want immediate character-by-character display
  • Applications where you control the entire response flow
Use Conversational Turns for:
  • Natural chat interfaces that mimic human conversation
  • When you want automatic response splitting
  • Applications that benefit from realistic typing delays
  • Chat apps where the AI should feel more human-like

UI/UX Considerations

  • Visual feedback: Show a typing indicator or cursor
  • Graceful degradation: Fallback to non-streaming on errors
  • Performance: Throttle updates for smooth rendering
  • Accessibility: Announce streaming status to screen readers
// Example with typing indicator
function showTypingIndicator() {
  const indicator = document.createElement('span');
  indicator.textContent = '▋';
  indicator.className = 'typing-cursor';
  responseElement.appendChild(indicator);
}

function hideTypingIndicator() {
  const cursor = responseElement.querySelector('.typing-cursor');
  if (cursor) cursor.remove();
}

Next Steps