API REFERENCE

Content Moderation AI

Analyze text content for policy violations using Nova's advanced content moderation model. Get structured safety assessments with confidence scores, severity levels, and actionable recommendations.

POST https://api.lumyx-ai.site/v1/moderation
Dedicated Endpoint
This endpoint uses a specialized content moderation model that is not available through the regular /v1/chat/completions endpoint. It is optimized for fast, accurate safety assessments.

Overview

The Content Moderation API evaluates text content against a comprehensive set of safety policies. It returns structured verdicts that include:

  • Status - Whether content is ALLOWED or DISALLOWED
  • Category - Classification (SAFE, PROFANITY, SPAM, HARASSMENT, PII, SEXUAL, HATE_SPEECH, SELF_HARM, VIOLENCE, ILLEGAL, EXTREMISM, CHILD_SAFETY)
  • Severity - Risk level (LOW, MEDIUM, HIGH, CRITICAL)
  • Action - Recommended action (ALLOW, BLOCK, REDACT, ESCALATE, LOG_ONLY, BANNED)
  • Confidence - Model confidence score (0.00 - 1.00)
  • Detected Signals - Specific policy triggers found

Request Format

Headers

Authorization required string
Bearer token with your API key: Bearer YOUR_API_KEY
Content-Type required string
application/json

Body Parameters

content required string
The text content to analyze for policy violations. Maximum 50,000 characters.
policy optional string
Custom moderation policies to apply. These are appended to the model's base rules. Use format like #policy 1: description. Maximum 10,000 characters.
instructions optional string
Additional instructions for this request (e.g., "max ban 10 days", "be lenient with warnings"). Maximum 2,000 characters.
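
The documented length limits can be enforced client-side before sending a request. As a sketch (the helper and LIMITS mapping below are illustrative assumptions, not part of the official API):

```python
# Illustrative client-side validation of the documented length limits.
# The helper and LIMITS mapping are assumptions for this sketch,
# not part of the official API.

LIMITS = {"content": 50_000, "policy": 10_000, "instructions": 2_000}

def build_moderation_payload(content, policy=None, instructions=None):
    """Build a request body dict, raising if a documented limit is exceeded."""
    payload = {"content": content}
    if policy is not None:
        payload["policy"] = policy
    if instructions is not None:
        payload["instructions"] = instructions
    for field, value in payload.items():
        if not isinstance(value, str):
            raise TypeError(f"{field} must be a string")
        if len(value) > LIMITS[field]:
            raise ValueError(f"{field} exceeds {LIMITS[field]:,} characters")
    return payload
```

Failing fast on oversized input avoids burning a request that the API would reject with a 400 anyway.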

Response Format

The API returns a JSON object with the moderation assessment:

Field Type Description
status string Overall verdict: ALLOWED or DISALLOWED
category string Content category: SAFE, PROFANITY, SPAM, HARASSMENT, PII, SEXUAL, HATE_SPEECH, SELF_HARM, VIOLENCE, ILLEGAL, EXTREMISM, CHILD_SAFETY
severity string Risk level: LOW, MEDIUM, HIGH, CRITICAL
action string Recommended action: ALLOW, BLOCK, REDACT, ESCALATE, LOG_ONLY, BANNED
banned_days integer|null Suggested ban duration if applicable (e.g., 365 for zero-tolerance)
reason string Human-readable explanation of the decision
confidence float Model confidence score (0.00 - 1.00)
uncertainty_flag boolean True if the model is uncertain about the assessment
ambiguity_reason string Reason for uncertainty: NONE, CONTEXT_MISSING, SATIRE, QUOTED_CONTENT, MIXED_SIGNALS
escalation_required boolean True if human review is strongly recommended
auto_fail boolean True if content triggered zero-tolerance policy
detected_signals array List of specific policy triggers found (e.g. THREAT, EXTREMISM)
policy_version string Version of the policy used for assessment
timestamp string ISO 8601 timestamp of the assessment
usage object Token usage: prompt_tokens, completion_tokens, total_tokens
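
For type-safe handling in Python, the fields above can be modeled with a TypedDict. This is a sketch derived from the table; depending on the verdict, fields such as banned_days may be null:

```python
from typing import List, Optional, TypedDict

class Usage(TypedDict):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class ModerationResult(TypedDict):
    status: str                  # "ALLOWED" | "DISALLOWED"
    category: str                # e.g. "SAFE", "HARASSMENT", ...
    severity: str                # "LOW" | "MEDIUM" | "HIGH" | "CRITICAL"
    action: str                  # "ALLOW" | "BLOCK" | "REDACT" | ...
    banned_days: Optional[int]   # null when no ban is suggested
    reason: str
    confidence: float            # 0.00 - 1.00
    uncertainty_flag: bool
    ambiguity_reason: str        # "NONE", "SATIRE", ...
    escalation_required: bool
    auto_fail: bool
    detected_signals: List[str]
    policy_version: str
    timestamp: str               # ISO 8601
    usage: Usage
```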

Escalation Rules

The model applies strict rules to determine severity and actions. Understanding these rules helps interpret the results.

1. Zero-Tolerance Escalation (AutoFail)
  • Child Safety: Any sexual exploitation of minors triggers CRITICAL severity, BANNED action, and 365-day ban.
  • Terrorism / Extremism: Credible threats or detailed violence planning trigger CRITICAL severity and immediate ban.
2. Cumulative Signals

If 3 or more minor risk signals (e.g., mild threats, spam, profanity) appear together:

  • Severity escalates by one level (e.g., LOW → MEDIUM).
  • If severity reaches CRITICAL, the action defaults to BANNED.
3. Confidence Logic
  • Zero-Tolerance: Confidence is always 1.00.
  • Ambiguity: If conflicting signals exist, uncertainty_flag is set to true and confidence is lowered.
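
The cumulative-signal rule above can be sketched in Python. This is a simplified interpretation of rule 2, not the model's actual implementation, and the set of "minor" signals is an assumption for illustration:

```python
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Assumed set of minor risk signals, for illustration only.
MINOR_SIGNALS = {"MILD_THREAT", "SPAM", "PROFANITY"}

def apply_cumulative_escalation(severity, detected_signals):
    """Escalate severity one level when 3+ minor signals co-occur (rule 2)."""
    minor_count = sum(1 for s in detected_signals if s in MINOR_SIGNALS)
    idx = SEVERITY_ORDER.index(severity)
    if minor_count >= 3 and idx < len(SEVERITY_ORDER) - 1:
        idx += 1
    severity = SEVERITY_ORDER[idx]
    # Per rule 2, reaching CRITICAL defaults the action to BANNED.
    action = "BANNED" if severity == "CRITICAL" else None
    return severity, action
```

For example, three minor signals on LOW-severity content yield MEDIUM, while the same signals on HIGH-severity content push it to CRITICAL with a BANNED action.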

Examples

Basic Request (Content Only)

Simple moderation using the model's default policies:

cURL - Basic
curl -X POST https://api.lumyx-ai.site/v1/moderation \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Hello, how are you doing today?"
  }'
Response (Safe Content)
{
  "status": "ALLOWED",
  "category": "SAFE",
  "severity": "LOW",
  "action": "ALLOW",
  "banned_days": null,
  "reason": "No policy violation detected",
  "confidence": 1.00,
  "uncertainty_flag": false,
  "escalation_required": false,
  "detected_signals": []
}

Custom Policies & Instructions

Add your own moderation rules and instructions per-request:

cURL - With Custom Policies
curl -X POST https://api.lumyx-ai.site/v1/moderation \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "You are so stupid!",
    "policy": "#policy 1: block all insults and name-calling\n#policy 2: ban 7 days for slurs\n#policy 3: warn for mild rudeness",
    "instructions": "max ban should be 10 days, be strict with insults"
  }'
Response (Blocked by Custom Policy)
{
  "status": "DISALLOWED",
  "category": "HARASSMENT",
  "severity": "LOW",
  "action": "BLOCK",
  "banned_days": null,
  "reason": "Violates custom policy to block insults",
  "confidence": 1.00,
  "uncertainty_flag": false,
  "escalation_required": false,
  "detected_signals": ["HARASSMENT"]
}

Using Our Built-in Policies & Instructions

Nova's default moderation model detects harmful content using its built-in policies alone; you can still pass the optional instructions key to guide enforcement:

cURL - Built-in Detection
curl -X POST https://api.lumyx-ai.site/v1/moderation \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "I will hurt you badly",
    "instructions": "prioritize safety over freedom of speech"
  }'
Response (Detected by Built-in Policies)
{
  "status": "DISALLOWED",
  "category": "VIOLENCE",
  "severity": "HIGH",
  "action": "BLOCK",
  "banned_days": 7,
  "reason": "Threatening violence against another person",
  "confidence": 0.98,
  "uncertainty_flag": false,
  "escalation_required": false,
  "auto_fail": false,
  "detected_signals": ["VIOLENCE", "THREAT"]
}

Python

Python
import requests

url = "https://api.lumyx-ai.site/v1/moderation"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Example with custom policies
data = {
    "content": "Text to moderate goes here...",
    "policy": "#policy 1: block all spam\n#policy 2: be strict with harassment",
    "instructions": "prioritize safety over freedom of speech"
}

response = requests.post(url, headers=headers, json=data)
result = response.json()

if result.get("status") == "ALLOWED":
    print(f"✅ Content is safe (Confidence: {result.get('confidence')})")
else:
    print(f"❌ Content blocked!")
    print(f"Reason: {result.get('reason')}")
    print(f"Category: {result.get('category')}")
    print(f"Action: {result.get('action')}")

JavaScript / Node.js

JavaScript
const moderateContent = async () => {
  const response = await fetch('https://api.lumyx-ai.site/v1/moderation', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      content: 'Text to moderate goes here...',
      policy: '#policy 1: block all spam\n#policy 2: be strict with harassment',
      instructions: 'prioritize safety over freedom of speech'
    })
  });

  const result = await response.json();

  if (result.status === 'ALLOWED') {
    console.log(`✅ Content is safe (Confidence: ${result.confidence})`);
  } else {
    console.log(`❌ Content blocked!`);
    console.log(`Reason: ${result.reason}`);
    console.log(`Category: ${result.category}`);
    console.log(`Action: ${result.action}`);
  }
};

moderateContent();

Integration & Bots

Looking for a drop-in solution? Check out our official open-source bots built on top of the Lumyx AI Content Moderation API. These reference implementations are ready to deploy for Discord and Telegram.

Use Cases

🗨️ Chat & Messaging Apps
Screen user messages in real-time before displaying to prevent harassment, hate speech, and harmful content in chat rooms and DMs.
🤖 AI Chatbot Safety
Filter both user inputs and AI-generated responses to ensure your chatbot doesn't process or produce harmful content.
📝 User-Generated Content
Review posts, comments, reviews, and profiles before publishing to maintain community standards on forums and social platforms.
🎮 Gaming Platforms
Moderate in-game chat, usernames, clan names, and custom content to keep gaming communities safe and inclusive.
📧 Email & Support Systems
Screen incoming support tickets and emails for abusive language before routing to human agents.
📱 Dating & Social Apps
Review profile bios, messages, and user introductions to reduce harassment and inappropriate content.
🏢 Enterprise Compliance
Monitor internal communications for policy violations, harassment, or sensitive data exposure with audit logging.
🛒 E-commerce Reviews
Filter product reviews and Q&A sections for spam, fake reviews, and inappropriate content before publishing.
👶 Kids & Education Apps
Apply strict content filtering with custom child-safety policies for educational platforms and apps for minors.
📊 Analytics & Reporting
Track moderation trends, flag rates, and policy violations over time for content strategy optimization.

Error Handling

400 Bad Request invalid_request_error
Missing or invalid content parameter
401 Unauthorized authentication_error
Invalid or missing API key
503 Service Unavailable service_error
Content moderation model not available
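
A small helper can translate these documented status codes into Python exceptions right after a request. The helper itself and the suggestion to retry on 503 are application-side choices, not documented API behavior:

```python
def raise_for_moderation_error(status_code):
    """Translate documented moderation API error codes into exceptions.

    Intended to be called right after an HTTP request, e.g.:
        response = requests.post(url, headers=headers, json=data)
        raise_for_moderation_error(response.status_code)
    """
    if status_code == 400:
        raise ValueError("invalid_request_error: missing or invalid content parameter")
    if status_code == 401:
        raise PermissionError("authentication_error: invalid or missing API key")
    if status_code == 503:
        # The moderation model is temporarily unavailable; retrying with
        # backoff is a reasonable client-side choice (an assumption, not
        # documented API behavior).
        raise RuntimeError("service_error: content moderation model not available")
```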

Best Practices

Always check the uncertainty_flag and escalation_required fields. When either is true, consider routing the content for human review rather than making an automated decision.
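
Concretely, that check might look like the sketch below; the routing function and the "HUMAN_REVIEW" outcome are application-side conventions, not part of the API:

```python
def route_verdict(result):
    """Decide how to act on a moderation result, honoring the review flags."""
    if result.get("uncertainty_flag") or result.get("escalation_required"):
        # Defer to a human rather than auto-enforcing the suggested action.
        return "HUMAN_REVIEW"
    if result.get("status") == "ALLOWED":
        return "PUBLISH"
    # Otherwise fall back to the API's recommended action.
    return result.get("action", "BLOCK")
```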