Character Consistency
Generate images that maintain a consistent character identity across multiple outputs using reference images and face embeddings.
Overview
Character consistency lets you anchor generated images to a reference face or character, ensuring the same person appears across portraits, expressions, full-body shots, and scene illustrations. AgentOS supports three levels of consistency via the consistencyMode parameter:
| Mode | Strength | Use Case |
|---|---|---|
'strict' | 0.85–0.9 | Avatar expression sheets, emotion variants. Face must match exactly. |
'balanced' | 0.6 | Full-body shots, different angles. Recognizable but allows natural variation. |
'loose' | 0.3 | "Inspired by" generations. Style/mood carries over, face may drift. |
Provider Support
| Provider | Mechanism | Models |
|---|---|---|
| Replicate | Pulid (strict), Flux image input (balanced/loose) | zsxkib/pulid, black-forest-labs/flux-dev |
| Fal | IP-Adapter | fal-ai/flux/dev |
| SD-Local | ControlNet + IP-Adapter extension | Any SD 1.5 / SDXL checkpoint |
| OpenAI | Not supported (graceful ignore) | — |
| Stability | Not supported (graceful ignore) | — |
Basic Usage
import { generateImage } from '@framers/agentos';
// Generate a consistent expression variant
const result = await generateImage({
provider: 'replicate',
prompt: 'Portrait of the character smiling warmly, soft lighting',
referenceImageUrl: 'https://storage.example.com/character-neutral.png',
consistencyMode: 'strict',
});
When consistencyMode is 'strict' and no model is explicitly set, Replicate auto-selects zsxkib/pulid for maximum face consistency.
Fields Reference
referenceImageUrl
URL or base64 data URI of the reference character image. Each provider maps this to its native mechanism:
- Replicate (Pulid):
main_face_imageinput - Replicate (standard Flux):
imageinput withimage_strength - Fal:
ip_adapter_imagebody field - SD-Local: ControlNet
input_imagewith IP-Adapter preprocessor
faceEmbedding
Optional 512-dimensional vector from InsightFace or equivalent. Used by the AvatarPipeline for drift detection — after generating each image, the pipeline extracts the face embedding from the output and compares it to this anchor via cosine similarity. Images that drift below the threshold (default 0.6) are regenerated.
consistencyMode
Controls how aggressively the provider preserves the reference identity:
// Strict — for expression sheets where faces must match
await generateImage({
prompt: 'Character looking angry, dramatic lighting',
referenceImageUrl: neutralPortrait,
consistencyMode: 'strict', // Pulid auto-selected on Replicate
});
// Balanced — for full-body shots
await generateImage({
prompt: 'Full body shot of the character walking through a market',
referenceImageUrl: neutralPortrait,
consistencyMode: 'balanced',
});
// Loose — for "inspired by" mood pieces
await generateImage({
prompt: 'Abstract portrait in the style of the character',
referenceImageUrl: neutralPortrait,
consistencyMode: 'loose',
});
AvatarPipeline Integration
The AvatarPipeline uses consistency modes per stage:
| Stage | Mode | Rationale |
|---|---|---|
neutral_portrait | none | This IS the anchor — no reference exists yet |
face_embedding | none | Extraction, not generation |
expression_sheet | 'strict' | Facial identity must match across all emotions |
animated_emotes | 'strict' | Same character in motion |
full_body | 'balanced' | Body proportions can vary; face should be recognizable |
additional_angles | 'balanced' | 3/4 and profile views naturally differ from frontal |
import { AvatarPipeline } from '@framers/agentos/media/avatar';
const pipeline = new AvatarPipeline(faceService, imageGenerator);
const result = await pipeline.generate({
characterId: 'hero_001',
identity: {
displayName: 'Kael Stormwind',
ageBand: 'young_adult',
faceDescriptor: 'sharp jawline, green eyes, short dark hair, small scar above left eyebrow',
},
generationConfig: {
baseModel: 'black-forest-labs/flux-dev',
provider: 'replicate',
},
stages: ['neutral_portrait', 'face_embedding', 'expression_sheet', 'full_body'],
});
Choosing the Right Mode
- Avatars and expression sheets: Always
'strict'. The face is the product. - Scene illustrations with known characters:
'balanced'. Character should be recognizable but the scene composition matters more. - Style exploration and mood boards:
'loose'. The reference influences the vibe, not the pixels. - No reference at all: Omit
referenceImageUrlentirely. The fields are fully optional.
Related
- Image Generation — Provider-agnostic generation API
- Style Transfer — Transfer visual aesthetics between images
- Image Editing — Img2img, inpainting, upscaling