API REFERENCE

Content Moderation AI

Analyze text content for policy violations using Nova's advanced content moderation model. Get structured safety assessments with confidence scores, severity levels, and actionable recommendations.

POST https://api.lumyx-ai.site/v1/moderation
Dedicated Endpoint
This endpoint uses a specialized content moderation model that is not available through the regular /v1/chat/completions endpoint. It is optimized for fast, accurate safety assessments.

Overview

The Content Moderation API evaluates text content against a comprehensive set of safety policies. It returns structured verdicts that include:

  • Status - Whether content is ALLOWED or DISALLOWED
  • Category - Classification (SAFE, PROFANITY, SPAM, HARASSMENT, PII, SEXUAL, HATE_SPEECH, SELF_HARM, VIOLENCE, ILLEGAL, EXTREMISM, CHILD_SAFETY)
  • Severity - Risk level (LOW, MEDIUM, HIGH, CRITICAL)
  • Action - Recommended action (ALLOW, BLOCK, REDACT, ESCALATE, LOG_ONLY, BANNED)
  • Confidence - Model confidence score (0.00 - 1.00)
  • Detected Signals - Specific policy triggers found

Request Format

Headers

Authorization required string
Bearer token with your API key: Bearer YOUR_API_KEY
Content-Type required string
application/json

Body Parameters

content required string
The text content to analyze for policy violations. Maximum 50,000 characters.
policy optional string
Custom moderation policies to apply. These are appended to the model's base rules. Use format like #policy 1: description. Maximum 10,000 characters.
instructions optional string
Additional instructions for this request (e.g., "max ban 10 days", "be lenient with warnings"). Maximum 2,000 characters.
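
The documented length limits can be enforced client-side before sending a request. As a sketch (the helper and LIMITS mapping below are illustrative assumptions, not part of the official API):

```python
# Illustrative client-side validation of the documented length limits.
# The helper and LIMITS mapping are assumptions for this sketch,
# not part of the official API.

LIMITS = {"content": 50_000, "policy": 10_000, "instructions": 2_000}

def build_moderation_payload(content, policy=None, instructions=None):
    """Build a request body dict, raising if a documented limit is exceeded."""
    payload = {"content": content}
    if policy is not None:
        payload["policy"] = policy
    if instructions is not None:
        payload["instructions"] = instructions
    for field, value in payload.items():
        if not isinstance(value, str):
            raise TypeError(f"{field} must be a string")
        if len(value) > LIMITS[field]:
            raise ValueError(f"{field} exceeds {LIMITS[field]:,} characters")
    return payload
```

Failing fast on oversized input avoids burning a request that the API would reject with a 400 anyway.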

Response Format

The API returns a JSON object with the moderation assessment:

Field Type Description
status string Overall verdict: ALLOWED or DISALLOWED
category string Content category: SAFE, PROFANITY, SPAM, HARASSMENT, PII, SEXUAL, HATE_SPEECH, SELF_HARM, VIOLENCE, ILLEGAL, EXTREMISM, CHILD_SAFETY
severity string Risk level: LOW, MEDIUM, HIGH, CRITICAL
action string Recommended action: ALLOW, BLOCK, REDACT, ESCALATE, LOG_ONLY, BANNED
banned_days integer|null Suggested ban duration if applicable (e.g., 365 for zero-tolerance)
reason string Human-readable explanation of the decision
confidence float Model confidence score (0.00 - 1.00)
uncertainty_flag boolean True if the model is uncertain about the assessment
ambiguity_reason string Reason for uncertainty: NONE, CONTEXT_MISSING, SATIRE, QUOTED_CONTENT, MIXED_SIGNALS
escalation_required boolean True if human review is strongly recommended
auto_fail boolean True if content triggered zero-tolerance policy
detected_signals array List of specific policy triggers found (e.g. THREAT, EXTREMISM)
policy_version string Version of the policy used for assessment
timestamp string ISO 8601 timestamp of the assessment
usage object Token usage: prompt_tokens, completion_tokens, total_tokens
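
For type-safe handling in Python, the fields above can be modeled with a TypedDict. This is a sketch derived from the table; depending on the verdict, fields such as banned_days may be null:

```python
from typing import List, Optional, TypedDict

class Usage(TypedDict):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class ModerationResult(TypedDict):
    status: str                  # "ALLOWED" | "DISALLOWED"
    category: str                # e.g. "SAFE", "HARASSMENT", ...
    severity: str                # "LOW" | "MEDIUM" | "HIGH" | "CRITICAL"
    action: str                  # "ALLOW" | "BLOCK" | "REDACT" | ...
    banned_days: Optional[int]   # null when no ban is suggested
    reason: str
    confidence: float            # 0.00 - 1.00
    uncertainty_flag: bool
    ambiguity_reason: str        # "NONE", "SATIRE", ...
    escalation_required: bool
    auto_fail: bool
    detected_signals: List[str]
    policy_version: str
    timestamp: str               # ISO 8601
    usage: Usage
```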

Escalation Rules

The model applies strict rules to determine severity and actions. Understanding these rules helps interpret the results.

1. Zero-Tolerance Escalation (AutoFail)
  • Child Safety: Any sexual exploitation of minors triggers CRITICAL severity, BANNED action, and 365-day ban.
  • Terrorism / Extremism: Credible threats or detailed violence planning trigger CRITICAL severity and immediate ban.
2. Cumulative Signals

If 3 or more minor risk signals (e.g., mild threats, spam, profanity) appear together:

  • Severity escalates by one level (e.g., LOW → MEDIUM).
  • If severity reaches CRITICAL, the action defaults to BANNED.
3. Confidence Logic
  • Zero-Tolerance: Confidence is always 1.00.
  • Ambiguity: If conflicting signals exist, uncertainty_flag is set to true and confidence is lowered.
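
The cumulative-signal rule above can be sketched in Python. This is a simplified interpretation of rule 2, not the model's actual implementation, and the set of "minor" signals is an assumption for illustration:

```python
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Assumed set of minor risk signals, for illustration only.
MINOR_SIGNALS = {"MILD_THREAT", "SPAM", "PROFANITY"}

def apply_cumulative_escalation(severity, detected_signals):
    """Escalate severity one level when 3+ minor signals co-occur (rule 2)."""
    minor_count = sum(1 for s in detected_signals if s in MINOR_SIGNALS)
    idx = SEVERITY_ORDER.index(severity)
    if minor_count >= 3 and idx < len(SEVERITY_ORDER) - 1:
        idx += 1
    severity = SEVERITY_ORDER[idx]
    # Per rule 2, reaching CRITICAL defaults the action to BANNED.
    action = "BANNED" if severity == "CRITICAL" else None
    return severity, action
```

For example, three minor signals on LOW-severity content yield MEDIUM, while the same signals on HIGH-severity content push it to CRITICAL with a BANNED action.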

Examples

Basic Request (Content Only)

Simple moderation using the model's default policies:

cURL - Basic
curl -X POST https://api.lumyx-ai.site/v1/moderation \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Hello, how are you doing today?"
  }'
Response (Safe Content)
{
  "status": "ALLOWED",
  "category": "SAFE",
  "severity": "LOW",
  "action": "ALLOW",
  "banned_days": null,
  "reason": "No policy violation detected",
  "confidence": 1.00,
  "uncertainty_flag": false,
  "escalation_required": false,
  "detected_signals": []
}

Custom Policies & Instructions

Add your own moderation rules and instructions per-request:

cURL - With Custom Policies
curl -X POST https://api.lumyx-ai.site/v1/moderation \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "You are so stupid!",
    "policy": "#policy 1: block all insults and name-calling\n#policy 2: ban 7 days for slurs\n#policy 3: warn for mild rudeness",
    "instructions": "max ban should be 10 days, be strict with insults"
  }'
Response (Blocked by Custom Policy)
{
  "status": "DISALLOWED",
  "category": "HARASSMENT",
  "severity": "LOW",
  "action": "BLOCK",
  "banned_days": null,
  "reason": "Violates custom policy to block insults",
  "confidence": 1.00,
  "uncertainty_flag": false,
  "escalation_required": false,
  "detected_signals": ["HARASSMENT"]
}

Using Our Built-in Policies & Instructions

Nova's default moderation model detects harmful content using its built-in policies alone; you can still pass the optional instructions key to guide enforcement:

cURL - Built-in Detection
curl -X POST https://api.lumyx-ai.site/v1/moderation \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "I will hurt you badly",
    "instructions": "prioritize safety over freedom of speech"
  }'
Response (Detected by Built-in Policies)
{
  "status": "DISALLOWED",
  "category": "VIOLENCE",
  "severity": "HIGH",
  "action": "BLOCK",
  "banned_days": 7,
  "reason": "Threatening violence against another person",
  "confidence": 0.98,
  "uncertainty_flag": false,
  "escalation_required": false,
  "auto_fail": false,
  "detected_signals": ["VIOLENCE", "THREAT"]
}

Python

Python
import requests

url = "https://api.lumyx-ai.site/v1/moderation"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Example with custom policies
data = {
    "content": "Text to moderate goes here...",
    "policy": "#policy 1: block all spam\n#policy 2: be strict with harassment",
    "instructions": "prioritize safety over freedom of speech"
}

response = requests.post(url, headers=headers, json=data)
result = response.json()

if result.get("status") == "ALLOWED":
    print(f"✅ Content is safe (Confidence: {result.get('confidence')})")
else:
    print(f"❌ Content blocked!")
    print(f"Reason: {result.get('reason')}")
    print(f"Category: {result.get('category')}")
    print(f"Action: {result.get('action')}")

JavaScript / Node.js

JavaScript
const moderateContent = async () => {
  const response = await fetch('https://api.lumyx-ai.site/v1/moderation', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      content: 'Text to moderate goes here...',
      policy: '#policy 1: block all spam\n#policy 2: be strict with harassment',
      instructions: 'prioritize safety over freedom of speech'
    })
  });

  const result = await response.json();

  if (result.status === 'ALLOWED') {
    console.log(`✅ Content is safe (Confidence: ${result.confidence})`);
  } else {
    console.log(`❌ Content blocked!`);
    console.log(`Reason: ${result.reason}`);
    console.log(`Category: ${result.category}`);
    console.log(`Action: ${result.action}`);
  }
};

moderateContent();

Integration & Bots

Looking for a drop-in solution? Check out our official open-source bots built on top of the Lumyx AI Content Moderation API. These reference implementations are ready to deploy for Discord and Telegram.

Use Cases

🗨️ Chat & Messaging Apps
Screen user messages in real-time before displaying to prevent harassment, hate speech, and harmful content in chat rooms and DMs.
🤖 AI Chatbot Safety
Filter both user inputs and AI-generated responses to ensure your chatbot doesn't process or produce harmful content.
📝 User-Generated Content
Review posts, comments, reviews, and profiles before publishing to maintain community standards on forums and social platforms.
🎮 Gaming Platforms
Moderate in-game chat, usernames, clan names, and custom content to keep gaming communities safe and inclusive.
📧 Email & Support Systems
Screen incoming support tickets and emails for abusive language before routing to human agents.
📱 Dating & Social Apps
Review profile bios, messages, and user introductions to reduce harassment and inappropriate content.
🏢 Enterprise Compliance
Monitor internal communications for policy violations, harassment, or sensitive data exposure with audit logging.
🛒 E-commerce Reviews
Filter product reviews and Q&A sections for spam, fake reviews, and inappropriate content before publishing.
👶 Kids & Education Apps
Apply strict content filtering with custom child-safety policies for educational platforms and apps for minors.
📊 Analytics & Reporting
Track moderation trends, flag rates, and policy violations over time for content strategy optimization.

Error Handling

400 Bad Request invalid_request_error
Missing or invalid content parameter
401 Unauthorized authentication_error
Invalid or missing API key
503 Service Unavailable service_error
Content moderation model not available
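
A small helper can translate these documented status codes into Python exceptions right after a request. The helper itself and the suggestion to retry on 503 are application-side choices, not documented API behavior:

```python
def raise_for_moderation_error(status_code):
    """Translate documented moderation API error codes into exceptions.

    Intended to be called right after an HTTP request, e.g.:
        response = requests.post(url, headers=headers, json=data)
        raise_for_moderation_error(response.status_code)
    """
    if status_code == 400:
        raise ValueError("invalid_request_error: missing or invalid content parameter")
    if status_code == 401:
        raise PermissionError("authentication_error: invalid or missing API key")
    if status_code == 503:
        # The moderation model is temporarily unavailable; retrying with
        # backoff is a reasonable client-side choice (an assumption, not
        # documented API behavior).
        raise RuntimeError("service_error: content moderation model not available")
```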

Best Practices

Always check the uncertainty_flag and escalation_required fields. When either is true, consider routing the content for human review rather than making an automated decision.
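
Concretely, that check might look like the sketch below; the routing function and the "HUMAN_REVIEW" outcome are application-side conventions, not part of the API:

```python
def route_verdict(result):
    """Decide how to act on a moderation result, honoring the review flags."""
    if result.get("uncertainty_flag") or result.get("escalation_required"):
        # Defer to a human rather than auto-enforcing the suggested action.
        return "HUMAN_REVIEW"
    if result.get("status") == "ALLOWED":
        return "PUBLISH"
    # Otherwise fall back to the API's recommended action.
    return result.get("action", "BLOCK")
```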