The SDK provides built-in content moderation capabilities that detect potentially harmful content and expose violation details through response data. How you handle these violations is entirely up to your application’s requirements and policies.

Quick Start

Enable content moderation by setting the compliance option to true:

import { AnimusClient } from 'animus-client';

const client = new AnimusClient({
  tokenProviderUrl: 'https://your-backend.com/api/get-animus-token',
  chat: {
    model: 'vivian-llama3.1-70b-1.0-fp8',
    systemMessage: 'You are a helpful assistant.',
    compliance: true  // Enable content moderation (default: true)
  }
});

// For event-driven chat, check violations in messageComplete event
client.on('messageComplete', (data) => {
  if (data.compliance_violations && data.compliance_violations.length > 0) {
    // Handle violations according to your application's needs
    console.log("Violations detected:", data.compliance_violations);
    // Your custom handling logic here
  } else {
    console.log("Content approved:", data.content);
  }
});

// Send message - compliance checking happens automatically
client.chat.send("User message to check");

// Or use direct API method for immediate response
const response = await client.chat.completions({
  messages: [{ role: 'user', content: 'User message to check' }],
  compliance: true
});

if (response.compliance_violations && response.compliance_violations.length > 0) {
  console.log("Violations detected:", response.compliance_violations);
} else {
  console.log("Content approved");
}

Performance Note: Enabling compliance checking adds a small amount of latency to each request while the content is analyzed for violations. The overhead is minimal, but factor it in when designing real-time applications.
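
If you want to gauge the overhead for your own workload, a rough timing sketch is shown below. It assumes the `client` from the Quick Start and measures a single request each way, so treat the numbers as indicative only:

// Rough, single-sample latency comparison (sketch)
async function measureComplianceOverhead(message: string): Promise<void> {
  const startWith = Date.now();
  await client.chat.completions({
    messages: [{ role: 'user', content: message }],
    compliance: true
  });
  const withCompliance = Date.now() - startWith;

  const startWithout = Date.now();
  await client.chat.completions({
    messages: [{ role: 'user', content: message }],
    compliance: false
  });
  const withoutCompliance = Date.now() - startWithout;

  console.log(`With compliance: ${withCompliance}ms, without: ${withoutCompliance}ms`);
}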

Configuration Options

Global Compliance Setting

Enable moderation for all chat interactions:

const client = new AnimusClient({
  tokenProviderUrl: 'https://your-backend.com/api/get-animus-token',
  chat: {
    model: 'vivian-llama3.1-70b-1.0-fp8',
    compliance: true  // All messages will be moderated
  }
});

Per-Message Compliance

Control moderation on a per-message basis:

const client = new AnimusClient({
  tokenProviderUrl: 'https://your-backend.com/api/get-animus-token',
  chat: {
    model: 'vivian-llama3.1-70b-1.0-fp8'
    // compliance not set globally
  }
});

// Enable moderation for specific messages using completions
const response = await client.chat.completions({
  messages: [{ role: 'user', content: 'Message to moderate' }],
  compliance: true
});

// Or disable for trusted content
const trustedResponse = await client.chat.completions({
  messages: [{ role: 'user', content: 'Safe admin message' }],
  compliance: false
});

Violation Detection

When compliance checking is enabled, the response includes violation information that your application can use:

const response = await client.chat.completions({
  messages: [{ role: 'user', content: 'User message' }],
  compliance: true
});

if (response.compliance_violations && response.compliance_violations.length > 0) {
  // Violations detected - handle according to your application's policy
  const violations = response.compliance_violations;
  
  // Examples of how you might handle violations:
  // - Block the message entirely
  // - Show a warning to the user
  // - Log for review
  // - Apply content filters
  // - Escalate to moderators
  
  handleViolations(violations, response.choices[0].message.content);
} else {
  // No violations detected - safe to display
  displayMessage(response.choices[0].message.content);
}

function handleViolations(violations: string[], content: string) {
  // Your application's violation handling logic
  console.log(`Content flagged for: ${violations.join(', ')}`);
  
  // Example handling strategies:
  if (violations.includes('drug_use')) {
    // Maybe just warn the user
    showWarning("Please avoid discussing drug use");
  }
  
  if (violations.some(v => ['pedophilia', 'rape', 'murder'].includes(v))) {
    // Maybe block completely and escalate
    blockContent();
    escalateToModerators(content, violations);
  }
}

Content Violation Categories

The system can detect and report the following categories of potentially harmful content:

Category      Description
pedophilia    Sexual content involving minors
beastiality   Content involving sexual acts with animals
murder        Content that promotes or glorifies murder
rape          Content related to sexual assault
incest        Content involving sexual relations between family members
gore          Explicit and graphic violent content
prostitution  Content promoting or soliciting prostitution
drug_use      Content promoting or describing drug use
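
If you want compile-time help when branching on these categories, one option is to mirror them in a union type. This is a minimal sketch and assumes the identifiers above are returned verbatim as strings in compliance_violations:

const VIOLATION_CATEGORIES = [
  'pedophilia',
  'beastiality',
  'murder',
  'rape',
  'incest',
  'gore',
  'prostitution',
  'drug_use'
] as const;

type ViolationCategory = typeof VIOLATION_CATEGORIES[number];

// Narrow an arbitrary string from the API to a known category
function isKnownCategory(value: string): value is ViolationCategory {
  return (VIOLATION_CATEGORIES as readonly string[]).includes(value);
}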

Streaming with Moderation

Content moderation works with both streaming and non-streaming responses. For streaming responses, compliance violations are available in the chunk data:

// Streaming with compliance checking
const stream = await client.chat.completions({
  messages: [{ role: 'user', content: 'User message' }],
  compliance: true,
  stream: true
});

let fullContent = '';
let hasViolations = false;
let violations: string[] = [];

for await (const chunk of stream) {
  // Check for compliance violations in the chunk
  if (chunk.compliance_violations && chunk.compliance_violations.length > 0) {
    hasViolations = true;
    violations = chunk.compliance_violations;
    console.log('Compliance violations detected:', violations);
    
    // You can choose to:
    // - Stop streaming immediately
    // - Continue streaming but flag the content
    // - Handle violations after stream completes
    break; // Example: stop streaming on violation
  }
  
  // Process content delta
  const delta = chunk.choices?.[0]?.delta?.content || '';
  if (delta) {
    fullContent += delta;
    displayStreamingContent(delta);
  }
}

// Handle violations after streaming
if (hasViolations) {
  handleViolations(violations, fullContent);
} else {
  console.log('Content approved');
}

Non-Streaming with Moderation

For non-streaming responses, compliance violations are available in the response object:

// Non-streaming with compliance
const response = await client.chat.completions({
  messages: [{ role: 'user', content: 'User message' }],
  compliance: true,
  stream: false
});

if (response.compliance_violations && response.compliance_violations.length > 0) {
  console.log('Violations detected:', response.compliance_violations);
  handleViolations(response.compliance_violations, response.choices[0].message.content);
} else {
  console.log('Content approved');
  displayContent(response.choices[0].message.content);
}

Event-Driven Chat with Moderation

When using chat.send(), compliance violations are available in the messageComplete event:

client.on('messageComplete', (data) => {
  if (data.compliance_violations && data.compliance_violations.length > 0) {
    console.log('Violations detected:', data.compliance_violations);
    handleViolations(data.compliance_violations, data.content);
  } else {
    console.log('Content approved:', data.content);
    displayMessage(data.content);
  }
});

client.chat.send("User message to check");

Application Examples

Flexible Violation Handling

class ContentPolicy {
  // Define your application's content policy
  handleViolations(violations: string[], content: string, userId: string): PolicyAction {
    // Severe violations - immediate action
    if (violations.some(v => ['pedophilia', 'rape', 'murder'].includes(v))) {
      return {
        action: 'block',
        reason: 'Severe policy violation',
        escalate: true
      };
    }
    
    // Moderate violations - warning
    if (violations.includes('drug_use') || violations.includes('gore')) {
      return {
        action: 'warn',
        reason: 'Content may violate community guidelines',
        allowWithWarning: true
      };
    }
    
    // Minor violations - log only
    return {
      action: 'log',
      reason: 'Minor policy concern',
      allowContent: true
    };
  }
}

interface PolicyAction {
  action: 'block' | 'warn' | 'log' | 'allow';
  reason: string;
  escalate?: boolean;
  allowWithWarning?: boolean;
  allowContent?: boolean;
}

// Usage
const policy = new ContentPolicy();
const response = await client.chat.completions({
  messages: [{ role: 'user', content: userMessage }],
  compliance: true
});

if (response.compliance_violations && response.compliance_violations.length > 0) {
  const action = policy.handleViolations(
    response.compliance_violations, 
    response.choices[0].message.content, 
    userId
  );
  
  switch (action.action) {
    case 'block':
      blockMessage(action.reason);
      if (action.escalate) escalateToModerators();
      break;
    case 'warn':
      if (action.allowWithWarning) {
        showWarningAndDisplay(response.choices[0].message.content, action.reason);
      }
      break;
    case 'log':
      logViolation(response.compliance_violations);
      if (action.allowContent) displayMessage(response.choices[0].message.content);
      break;
  }
}

Educational Approach

// Example: Educational response to violations
function handleEducationalResponse(violations: string[]) {
  const educationalMessages: Record<string, string> = {
    'drug_use': 'We encourage discussions about health and wellness instead.',
    'gore': 'We prefer conversations that promote positive interactions.',
    'murder': 'Let\'s focus on constructive and peaceful topics.',
    // Add more educational responses
  };
  
  violations.forEach(violation => {
    const message = educationalMessages[violation];
    if (message) {
      showEducationalMessage(message);
    }
  });
}

Moderation Queue

// Example: Send flagged content to moderation queue
class ModerationQueue {
  // The backing service and notification helpers are application-specific;
  // the shape used here is only a placeholder.
  constructor(
    private moderationService: { queue(entry: Record<string, unknown>): Promise<void> }
  ) {}

  async addToQueue(content: string, violations: string[], userId: string) {
    // Add to your moderation system
    await this.moderationService.queue({
      content,
      violations,
      userId,
      timestamp: new Date(),
      status: 'pending_review'
    });
    
    // Notify user
    this.notifyUser(userId, 'Your message is under review');
  }
  
  async handleModeratorDecision(messageId: string, decision: 'approve' | 'reject') {
    // Handle moderator's decision
    if (decision === 'approve') {
      this.releaseMessage(messageId);
    } else {
      this.notifyUserOfRejection(messageId);
    }
  }

  // Stubs for application-specific behavior
  private notifyUser(userId: string, message: string) { /* ... */ }
  private releaseMessage(messageId: string) { /* ... */ }
  private notifyUserOfRejection(messageId: string) { /* ... */ }
}
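
One way to wire the queue into the completions flow is sketched below. It assumes a constructed moderationQueue instance (for example, new ModerationQueue(yourModerationService)) and that userMessage, userId, and displayMessage exist in your application:

const response = await client.chat.completions({
  messages: [{ role: 'user', content: userMessage }],
  compliance: true
});

if (response.compliance_violations && response.compliance_violations.length > 0) {
  // Hold the reply for review instead of blocking outright
  await moderationQueue.addToQueue(
    response.choices[0].message.content,
    response.compliance_violations,
    userId
  );
} else {
  displayMessage(response.choices[0].message.content);
}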

Error Handling

Handle moderation failures gracefully:

async function safeModeration(message: string): Promise<ModerationResult> {
  try {
    const response = await client.chat.completions({
      messages: [{ role: 'user', content: message }],
      compliance: true
    });

    return {
      success: true,
      violations: response.compliance_violations || [],
      content: response.choices[0].message.content
    };

  } catch (error) {
    console.error('Moderation check failed:', error);
    
    // Decide how to handle moderation failures
    // Options:
    // - Block content (fail-safe)
    // - Allow content (fail-open)
    // - Queue for manual review
    
    return {
      success: false,
      violations: ['moderation_error'],
      error: 'Unable to check content safety'
    };
  }
}

interface ModerationResult {
  success: boolean;
  violations: string[];
  content?: string;
  error?: string;
}
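
A possible fail-safe usage of safeModeration is sketched below, blocking when the check itself fails; the helper names are placeholders carried over from the earlier examples:

const result = await safeModeration(userMessage);

if (!result.success) {
  // Fail-safe: treat an unverifiable message as blocked
  blockMessage('Content could not be verified. Please try again later.');
} else if (result.violations.length > 0) {
  handleViolations(result.violations, result.content ?? '');
} else if (result.content) {
  displayMessage(result.content);
}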

Key Points

  • Violation Detection: The system detects and reports violations, but doesn’t automatically block content
  • Application Control: Your application decides how to handle each type of violation
  • Flexible Policies: Implement content policies that match your application’s needs
  • User Experience: Choose between blocking, warning, educating, or queuing for review
  • Escalation: Decide which violations require immediate action vs. review
  • Logging: Track violations for analysis and policy improvement

Next Steps