API REFERENCE
Content Moderation AI
Analyze text content for policy violations using Nova's advanced content moderation model. Get structured safety assessments with confidence scores, severity levels, and actionable recommendations.
The model is served via the /v1/moderation endpoint and is optimized for fast, accurate safety assessments.
Overview
The Content Moderation API evaluates text content against a comprehensive set of safety policies. It returns structured verdicts that include:
- Status - Whether content is ALLOWED or DISALLOWED
- Category - Classification (SAFE, PROFANITY, SPAM, HARASSMENT, PII, SEXUAL, HATE_SPEECH, SELF_HARM, VIOLENCE, ILLEGAL, EXTREMISM, CHILD_SAFETY)
- Severity - Risk level (LOW, MEDIUM, HIGH, CRITICAL)
- Action - Recommended action (ALLOW, BLOCK, REDACT, ESCALATE, LOG_ONLY, BANNED)
- Confidence - Model confidence score (0.00 - 1.00)
- Detected Signals - Specific policy triggers found
Request Format
Headers

| Header | Value |
|---|---|
| Authorization | Bearer YOUR_API_KEY |
| Content-Type | application/json |

Body Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | The text to analyze. Maximum 10,000 characters. |
| policy | string | No | Custom moderation rules, one per line, in the format #policy 1: description. |
| instructions | string | No | Free-form per-request guidance for the model (e.g., ban limits, strictness). |
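For illustration, a policy string with several rules can be composed programmatically. A minimal Python sketch, assuming your own rule texts:

# Build a multi-rule policy string in the documented "#policy N: description" format.
rules = [
    "block all insults and name-calling",
    "ban 7 days for slurs",
    "warn for mild rudeness",
]
policy = "\n".join(f"#policy {i}: {rule}" for i, rule in enumerate(rules, start=1))
# -> "#policy 1: block all insults and name-calling\n#policy 2: ban 7 days for slurs\n..."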
Response Format
The API returns a JSON object with the moderation assessment:
| Field | Type | Description |
|---|---|---|
| status | string | Overall verdict: ALLOWED or DISALLOWED |
| category | string | Content category: SAFE, PROFANITY, SPAM, HARASSMENT, PII, SEXUAL, HATE_SPEECH, SELF_HARM, VIOLENCE, ILLEGAL, EXTREMISM, CHILD_SAFETY |
| severity | string | Risk level: LOW, MEDIUM, HIGH, CRITICAL |
| action | string | Recommended action: ALLOW, BLOCK, REDACT, ESCALATE, LOG_ONLY, BANNED |
| banned_days | integer \| null | Suggested ban duration if applicable (e.g., 365 for zero-tolerance) |
| reason | string | Human-readable explanation of the decision |
| confidence | float | Model confidence score (0.00 - 1.00) |
| uncertainty_flag | boolean | True if the model is uncertain about the assessment |
| ambiguity_reason | string | Reason for uncertainty: NONE, CONTEXT_MISSING, SATIRE, QUOTED_CONTENT, MIXED_SIGNALS |
| escalation_required | boolean | True if human review is strongly recommended |
| auto_fail | boolean | True if content triggered a zero-tolerance policy |
| detected_signals | array | List of specific policy triggers found (e.g., THREAT, EXTREMISM) |
| policy_version | string | Version of the policy used for assessment |
| timestamp | string | ISO 8601 timestamp of the assessment |
| usage | object | Token usage: prompt_tokens, completion_tokens, total_tokens |
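For typed handling of this response in Python, a minimal sketch is shown below. The field names mirror the table above, but the type definitions are an assumption for illustration, not an official SDK:

from typing import List, Optional, TypedDict

class TokenUsage(TypedDict):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class ModerationVerdict(TypedDict):
    status: str                  # "ALLOWED" or "DISALLOWED"
    category: str                # e.g. "SAFE", "HARASSMENT", "VIOLENCE", ...
    severity: str                # "LOW", "MEDIUM", "HIGH", or "CRITICAL"
    action: str                  # "ALLOW", "BLOCK", "REDACT", "ESCALATE", "LOG_ONLY", "BANNED"
    banned_days: Optional[int]   # e.g. 365 for zero-tolerance, else None
    reason: str
    confidence: float            # 0.00 - 1.00
    uncertainty_flag: bool
    ambiguity_reason: str        # "NONE", "CONTEXT_MISSING", "SATIRE", ...
    escalation_required: bool
    auto_fail: bool
    detected_signals: List[str]  # e.g. ["VIOLENCE", "THREAT"]
    policy_version: str
    timestamp: str               # ISO 8601
    usage: TokenUsage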
Escalation Rules
The model applies strict rules to determine severity and actions; understanding them helps you interpret the results.
- Child Safety: Any sexual exploitation of minors triggers CRITICAL severity, the BANNED action, and a 365-day ban.
- Terrorism / Extremism: Credible threats or detailed violence planning trigger CRITICAL severity and an immediate ban.
- Compounding: If 3 or more minor risk signals (e.g., mild threats, spam, profanity) appear together, severity escalates by one level (e.g., LOW → MEDIUM); if severity reaches CRITICAL, the action defaults to BANNED. See the sketch after this list.
- Zero-Tolerance: Confidence is always 1.00.
- Ambiguity: If conflicting signals exist, uncertainty_flag is set to true and confidence is lowered.
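A minimal Python sketch of the compounding rule as a client-side re-implementation (the API applies this logic server-side; the minor-signal names below are hypothetical examples, not a documented list):

from typing import List, Optional, Tuple

# Severity ladder and illustrative minor signals; signal names are assumptions.
SEVERITY_LADDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]
MINOR_SIGNALS = {"MILD_THREAT", "SPAM", "PROFANITY"}

def apply_compounding(severity: str, detected_signals: List[str]) -> Tuple[str, Optional[str]]:
    """Escalate severity one level when 3+ minor signals co-occur."""
    minor_count = sum(1 for s in detected_signals if s in MINOR_SIGNALS)
    if minor_count >= 3:
        idx = min(SEVERITY_LADDER.index(severity) + 1, len(SEVERITY_LADDER) - 1)
        severity = SEVERITY_LADDER[idx]
    # At CRITICAL, the recommended action defaults to BANNED.
    action_override = "BANNED" if severity == "CRITICAL" else None
    return severity, action_override

# Example: three minor signals together push LOW up to MEDIUM.
print(apply_compounding("LOW", ["SPAM", "PROFANITY", "MILD_THREAT"]))  # ('MEDIUM', None)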
Examples
Basic Request (Content Only)
Simple moderation using the model's default policies:
curl -X POST https://api.lumyx-ai.site/v1/moderation \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"content": "Hello, how are you doing today?"
}'
Response:
{
"status": "ALLOWED",
"category": "SAFE",
"severity": "LOW",
"action": "ALLOW",
"banned_days": null,
"reason": "No policy violation detected",
"confidence": 1.00,
"uncertainty_flag": false,
"escalation_required": false,
"detected_signals": []
}
Custom Policies & Instructions
Add your own moderation rules and instructions per-request:
curl -X POST https://api.lumyx-ai.site/v1/moderation \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"content": "You are so stupid!",
"policy": "#policy 1: block all insults and name-calling\n#policy 2: ban 7 days for slurs\n#policy 3: warn for mild rudeness",
"instructions": "max ban should be 10 days, be strict with insults"
}'
Response:
{
"status": "DISALLOWED",
"category": "HARASSMENT",
"severity": "LOW",
"action": "BLOCK",
"banned_days": null,
"reason": "Violates custom policy to block insults",
"confidence": 1.00,
"uncertainty_flag": false,
"escalation_required": false,
"detected_signals": ["HARASSMENT"]
}
Using Our Built-in Policies & Instructions
When you omit the policy field, Nova's default moderation model falls back on its built-in policies; you can still steer its behavior with the instructions key:
curl -X POST https://api.lumyx-ai.site/v1/moderation \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"content": "I will hurt you badly",
"instructions": "prioritize safety over freedom of speech"
}'
Response:
{
"status": "DISALLOWED",
"category": "VIOLENCE",
"severity": "HIGH",
"action": "BLOCK",
"banned_days": 7,
"reason": "Threatening violence against another person",
"confidence": 0.98,
"uncertainty_flag": false,
"escalation_required": false,
"auto_fail": true,
"detected_signals": ["VIOLENCE", "THREAT"]
}
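In an application, the action, banned_days, and escalation fields map naturally onto enforcement hooks. A minimal sketch of that glue in Python (ban_user, block_content, and queue_for_review are hypothetical placeholders for your own system, not part of this API):

# Hypothetical enforcement hooks -- replace with your application's own.
def queue_for_review(user_id: str, verdict: dict) -> None:
    print(f"review queue: {user_id} ({verdict.get('ambiguity_reason')})")

def ban_user(user_id: str, days: int) -> None:
    print(f"banned {user_id} for {days} days")

def block_content(user_id: str, reason: str) -> None:
    print(f"blocked content from {user_id}: {reason}")

def enforce(user_id: str, verdict: dict) -> None:
    # Humans first: uncertain or escalated verdicts skip automation entirely.
    if verdict.get("uncertainty_flag") or verdict.get("escalation_required"):
        queue_for_review(user_id, verdict)
    elif verdict.get("action") == "BANNED":
        ban_user(user_id, days=verdict.get("banned_days") or 365)
    elif verdict.get("status") == "DISALLOWED":
        block_content(user_id, reason=verdict.get("reason", ""))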
Python
import requests

url = "https://api.lumyx-ai.site/v1/moderation"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

# Example with custom policies
data = {
    "content": "Text to moderate goes here...",
    "policy": "#policy 1: block all spam\n#policy 2: be strict with harassment",
    "instructions": "prioritize safety over freedom of speech",
}

response = requests.post(url, headers=headers, json=data)
result = response.json()

if result.get("status") == "ALLOWED":
    print(f"✅ Content is safe (Confidence: {result.get('confidence')})")
else:
    print("❌ Content blocked!")
    print(f"Reason: {result.get('reason')}")
    print(f"Category: {result.get('category')}")
    print(f"Action: {result.get('action')}")
JavaScript / Node.js
const moderateContent = async () => {
  const response = await fetch('https://api.lumyx-ai.site/v1/moderation', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      content: 'Text to moderate goes here...',
      policy: '#policy 1: block all spam\n#policy 2: be strict with harassment',
      instructions: 'prioritize safety over freedom of speech'
    })
  });

  const result = await response.json();

  if (result.status === 'ALLOWED') {
    console.log(`✅ Content is safe (Confidence: ${result.confidence})`);
  } else {
    console.log('❌ Content blocked!');
    console.log(`Reason: ${result.reason}`);
    console.log(`Category: ${result.category}`);
    console.log(`Action: ${result.action}`);
  }
};

moderateContent();
Integration & Bots
Looking for a drop-in solution? Check out our official open-source bots built on top of the Lumyx AI Content Moderation API. These reference implementations are ready to deploy for Discord and Telegram.
Error Handling
Requests with a missing or over-length content parameter are rejected, so validate input before sending (the documented limit is 10,000 characters). In successful responses, pay particular attention to the uncertainty_flag and escalation_required fields.
When either is true, consider routing the content for human review rather than making an automated decision.
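A minimal Python sketch of that defensive pattern, combining HTTP error checks with the human-review routing described above (the endpoint and response fields are as documented; the review handling here is a placeholder):

import requests

def moderate(content: str, api_key: str) -> dict:
    """Call the moderation endpoint and fail loudly on HTTP errors."""
    response = requests.post(
        "https://api.lumyx-ai.site/v1/moderation",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json={"content": content[:10_000]},  # stay under the documented limit
        timeout=10,
    )
    response.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
    return response.json()

result = moderate("Text to moderate goes here...", "YOUR_API_KEY")

if result.get("uncertainty_flag") or result.get("escalation_required"):
    # Placeholder: hand off to your own review queue instead of acting automatically.
    print("Routing to human review:", result.get("ambiguity_reason"))
elif result.get("status") == "DISALLOWED":
    print("Blocked:", result.get("reason"))
else:
    print("Allowed")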