Image Generation
Generate images from text prompts across five providers with a single unified API.
Image Editing (Img2Img, Inpainting, Upscaling)
Edit, upscale, and create variations of existing images across multiple providers with a unified API.
Character Consistency
Generate images that maintain a consistent character identity across multiple outputs using reference images and face embeddings.
Style Transfer
Apply the visual style of one image to another using transferStyle(), backed by Flux Redux and cross-provider img2img.
Vision Pipeline (OCR & Image Understanding)
A 3-tier progressive enhancement pipeline for extracting text, understanding images, and generating visual embeddings.
Audio Generation
AgentOS provides provider-agnostic APIs for generating music and sound effects from text prompts. Two high-level functions cover the full audio generation pipeline.
Provider Preferences
The ProviderPreferences system gives callers fine-grained control over which media providers are used and in what order. It applies to image generation, video generation, and audio generation (music and SFX separately).
Video Pipeline
AgentOS provides a provider-agnostic video pipeline covering generation (text-to-video, image-to-video), analysis (scene detection, transcription, summarisation), and RAG-ready indexing. Three high-level API functions expose the full pipeline.