Content Policy Rewriter

Opt-in content policy guardrail for AgentOS. It detects policy violations in agent output and either blocks them or has an LLM judge rewrite them into compliant versions.

Agents are uncensored by default. This extension only activates when explicitly configured.

Features

  • Two-layer hybrid pipeline: keyword pre-filter on streaming chunks (zero-cost) + LLM judge/rewriter on final response
  • 8 configurable categories: illegal_harmful, adult, profanity, violence, self_harm, hate_speech, illegal_activity, custom
  • 4 presets: uncensored, uncensored-safe, family-friendly, enterprise
  • Fully configurable: every category can be enabled/disabled, action set to block or sanitize
  • No hardcoded restrictions: all policies are user-controlled
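
The two-layer pipeline described above can be pictured roughly as follows. This is a minimal sketch, not the extension's actual implementation: every name here (`chunkIsSuspicious`, `screenStream`, `resolveOutput`, `JudgeVerdict`, the block message) is hypothetical, and it assumes the LLM judge is only consulted when the keyword pre-filter has flagged at least one chunk. The judge call itself is left out; its result is passed in as a plain object.

```typescript
// Hypothetical types; the real extension may model these differently.
type Action = 'block' | 'sanitize';

interface JudgeVerdict {
  violation: boolean;
  rewritten?: string; // compliant rewrite proposed by the judge, if any
}

const BLOCKED_MESSAGE = '[blocked by content policy]'; // placeholder text

// Layer 1: cheap, case-insensitive keyword check on one streaming chunk.
function chunkIsSuspicious(chunk: string, keywords: string[]): boolean {
  const lower = chunk.toLowerCase();
  return keywords.some((k) => lower.indexOf(k.toLowerCase()) !== -1);
}

// Runs layer 1 over every chunk and assembles the final response text.
function screenStream(
  chunks: string[],
  keywords: string[],
): { text: string; flagged: boolean } {
  let flagged = false;
  for (const chunk of chunks) {
    if (chunkIsSuspicious(chunk, keywords)) flagged = true;
  }
  return { text: chunks.join(''), flagged };
}

// Layer 2 (after the judge has run): apply the configured action.
function resolveOutput(text: string, verdict: JudgeVerdict, action: Action): string {
  if (!verdict.violation) return text;
  if (action === 'sanitize' && verdict.rewritten !== undefined) {
    return verdict.rewritten;
  }
  return BLOCKED_MESSAGE;
}
```

One limitation of a per-chunk check like this is that a keyword split across two chunks is missed, which is one reason to keep a second layer that sees the full response.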

Quick Start

import { createContentPolicyRewriter } from '@framers/agentos-ext-content-policy-rewriter';

// Minimal: blocks illegal_harmful content only (the default)
const minimalPack = createContentPolicyRewriter({});

// Family-friendly preset
const familyPack = createContentPolicyRewriter('family-friendly');

// Custom configuration
const customPack = createContentPolicyRewriter({
  categories: {
    adult: { enabled: true, action: 'sanitize' },
    profanity: { enabled: true, action: 'sanitize' },
    violence: { enabled: true, action: 'block' },
  },
  customRules: 'Never mention competitor products by name.',
});

// Truly uncensored: zero filtering
const uncensoredPack = createContentPolicyRewriter('uncensored');

agent.config.json

{
  "guardrails": {
    "contentPolicy": {
      "enabled": true,
      "categories": {
        "illegal_harmful": { "enabled": true, "action": "block" },
        "adult": { "enabled": true, "action": "sanitize" },
        "profanity": { "enabled": true, "action": "sanitize" }
      }
    }
  }
}

Or shorthand:

{
  "guardrails": {
    "contentPolicy": "uncensored-safe"
  }
}

Categories

| Category | Description | Default |
|---|---|---|
| illegal_harmful | CSAM, sexual assault, bestiality, exploitation | enabled, block |
| adult | Consensual sexually explicit content | disabled |
| profanity | Slurs, vulgar language | disabled |
| violence | Graphic violence, gore | disabled |
| self_harm | Self-harm, suicide instructions | disabled |
| hate_speech | Discriminatory, bigoted content | disabled |
| illegal_activity | Drug synthesis, weapons manufacturing | disabled |
| custom | User-defined policy rules | disabled |
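
To illustrate how a partial `categories` object might combine with the defaults listed above, here is a minimal sketch. The names `DEFAULT_CATEGORIES` and `mergeCategories` are hypothetical and are not exported by the package. The table does not state an action for disabled categories, so `'block'` is used below purely as a placeholder assumption.

```typescript
type Action = 'block' | 'sanitize';

interface CategoryRule {
  enabled: boolean;
  action: Action;
}

type Category =
  | 'illegal_harmful'
  | 'adult'
  | 'profanity'
  | 'violence'
  | 'self_harm'
  | 'hate_speech'
  | 'illegal_activity'
  | 'custom';

type CategoryMap = Record<Category, CategoryRule>;

// Defaults taken from the Categories table; the action on disabled
// categories is an assumption, since the table does not specify one.
const DEFAULT_CATEGORIES: CategoryMap = {
  illegal_harmful: { enabled: true, action: 'block' },
  adult: { enabled: false, action: 'block' },
  profanity: { enabled: false, action: 'block' },
  violence: { enabled: false, action: 'block' },
  self_harm: { enabled: false, action: 'block' },
  hate_speech: { enabled: false, action: 'block' },
  illegal_activity: { enabled: false, action: 'block' },
  custom: { enabled: false, action: 'block' },
};

// Overlays user-supplied rules on the defaults, category by category.
function mergeCategories(
  overrides: Partial<Record<Category, Partial<CategoryRule>>>,
): CategoryMap {
  const merged = {} as CategoryMap;
  for (const key of Object.keys(DEFAULT_CATEGORIES) as Category[]) {
    merged[key] = { ...DEFAULT_CATEGORIES[key], ...overrides[key] };
  }
  return merged;
}
```

Under this sketch, passing `{ adult: { enabled: true, action: 'sanitize' } }` turns on rewriting for adult content while leaving `illegal_harmful` enabled with `block`.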

Presets

| Preset | Effect |
|---|---|
| uncensored | All categories disabled; zero filtering |
| uncensored-safe | Only illegal_harmful enabled |
| family-friendly | All categories enabled (sanitize where possible) |
| enterprise | All categories enabled + custom rules |
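
As a rough illustration of how the preset names above could expand into per-category rules, consider the sketch below. `expandPreset` is a hypothetical helper, not part of the package. The source does not say which action each preset assigns to each category, so the sketch assumes `illegal_harmful` is always `block` and every other category is `sanitize`; it also does not model the custom rules text that `enterprise` adds.

```typescript
type Action = 'block' | 'sanitize';

interface CategoryRule {
  enabled: boolean;
  action: Action;
}

const CATEGORIES = [
  'illegal_harmful',
  'adult',
  'profanity',
  'violence',
  'self_harm',
  'hate_speech',
  'illegal_activity',
  'custom',
] as const;

type Category = (typeof CATEGORIES)[number];
type Preset = 'uncensored' | 'uncensored-safe' | 'family-friendly' | 'enterprise';

// Expands a preset name into a full category map (actions are assumed).
function expandPreset(preset: Preset): Record<Category, CategoryRule> {
  const allEnabled = preset === 'family-friendly' || preset === 'enterprise';
  const result = {} as Record<Category, CategoryRule>;
  for (const category of CATEGORIES) {
    const isCore = category === 'illegal_harmful';
    result[category] = {
      // 'uncensored' enables nothing; 'uncensored-safe' enables only the core category.
      enabled: allEnabled || (isCore && preset === 'uncensored-safe'),
      // Assumption: the core category is blocked, everything else is rewritten.
      action: isCore ? 'block' : 'sanitize',
    };
  }
  return result;
}
```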

License

MIT