Audio Generation

AgentOS provides provider-agnostic APIs for generating music and sound effects from text prompts. Two high-level functions cover the full audio generation pipeline:

| Function | Purpose |
| --- | --- |
| generateMusic() | Full-length musical compositions from text prompts |
| generateSFX() | Short sound effects from text descriptions |

Both APIs support automatic provider detection, fallback chains via FallbackAudioProxy, progress callbacks, and per-call provider preference overrides.

Providers

Music providers

| Provider | Env Var | ID | Notes |
| --- | --- | --- | --- |
| Suno | SUNO_API_KEY | suno | Up to ~240s, highest quality |
| Udio | UDIO_API_KEY | udio | Cloud music generation |
| Stable Audio | STABILITY_API_KEY | stable-audio | Up to ~47s |
| Replicate | REPLICATE_API_TOKEN | replicate-audio | Various music models |
| Fal | FAL_API_KEY | fal-audio | Various music models |
| MusicGen Local | (none) | musicgen-local | Local via HuggingFace Transformers.js |
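
The duration limits in the Notes column matter when a call pins a specific provider. As a minimal sketch, a requested duration can be clamped before it is sent. The `MAX_DURATION_SEC` map and the `clampDuration` helper are hypothetical and not part of AgentOS, and the limits are only the approximate figures from the table above:

```typescript
// Hypothetical helper, not part of AgentOS. The limits are the approximate
// figures from the provider table; providers without a documented limit are
// left unconstrained.
const MAX_DURATION_SEC: Record<string, number> = {
  suno: 240,
  'stable-audio': 47,
};

function clampDuration(providerId: string, durationSec: number): number {
  const limit = MAX_DURATION_SEC[providerId];
  // No documented limit for this provider: pass the request through unchanged.
  if (limit === undefined) return durationSec;
  return Math.min(durationSec, limit);
}
```

For example, `clampDuration('stable-audio', 60)` yields 47, while a 60-second request to `suno` is left as is.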

SFX providers

| Provider | Env Var | ID | Notes |
| --- | --- | --- | --- |
| ElevenLabs | ELEVENLABS_API_KEY | elevenlabs-sfx | Highest quality SFX |
| Stable Audio | STABILITY_API_KEY | stable-audio | Also supports SFX |
| Replicate | REPLICATE_API_TOKEN | replicate-audio | Various SFX models |
| Fal | FAL_API_KEY | fal-audio | Various SFX models |
| AudioGen Local | (none) | audiogen-local | Local via HuggingFace Transformers.js |

Providers are resolved in priority order, with the top row of each table having the highest priority. When more than one provider is configured, a FallbackAudioProxy wraps the chain so that a failed call automatically fails over to the next provider.
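
To make the priority order concrete, here is a minimal sketch of how a chain could be derived from environment variables. AgentOS performs this resolution internally and its actual implementation may differ; `ProviderEntry`, `MUSIC_PROVIDERS`, and `resolveChain` are hypothetical names used only for illustration, and the sketch assumes the local provider is always eligible:

```typescript
// Illustrative only: not the AgentOS implementation. The order mirrors the
// music provider table (first entry = highest priority).
interface ProviderEntry {
  id: string;
  envVar?: string; // undefined for local providers that need no API key
}

const MUSIC_PROVIDERS: ProviderEntry[] = [
  { id: 'suno', envVar: 'SUNO_API_KEY' },
  { id: 'udio', envVar: 'UDIO_API_KEY' },
  { id: 'stable-audio', envVar: 'STABILITY_API_KEY' },
  { id: 'replicate-audio', envVar: 'REPLICATE_API_TOKEN' },
  { id: 'fal-audio', envVar: 'FAL_API_KEY' },
  { id: 'musicgen-local' },
];

// Keep providers whose key is set (or that need none), preserving table order.
function resolveChain(
  env: Record<string, string | undefined>,
  providers: ProviderEntry[] = MUSIC_PROVIDERS,
): string[] {
  return providers
    .filter((p) => p.envVar === undefined || Boolean(env[p.envVar]))
    .map((p) => p.id);
}
```

With only `SUNO_API_KEY` and `STABILITY_API_KEY` set, this sketch produces the chain `suno`, `stable-audio`, `musicgen-local`.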

generateMusic()

Generate a musical composition from a text prompt.

import { generateMusic } from '@framers/agentos';

const result = await generateMusic({
  prompt: 'Upbeat lo-fi hip hop beat with vinyl crackle and mellow piano',
  durationSec: 60,
});
console.log(result.audio[0].url);
console.log(`Provider: ${result.provider}, Model: ${result.model}`);

With provider preferences

const result = await generateMusic({
  prompt: 'Ambient electronic soundscape with reverb pads',
  provider: 'stable-audio',
  model: 'stable-audio-open-1.0',
  durationSec: 30,
  outputFormat: 'wav',
  // progress and message are optional, so fall back to placeholders
  onProgress: (event) => {
    console.log(`[${event.status}] ${event.progress ?? '?'}% - ${event.message ?? ''}`);
  },
});

GenerateMusicOptions

| Option | Type | Description |
| --- | --- | --- |
| prompt | string | Text prompt describing the desired composition (required) |
| provider | string | Provider ID ("suno", "udio", "stable-audio", etc.) |
| model | string | Model override within the provider |
| durationSec | number | Desired output duration in seconds |
| negativePrompt | string | Musical elements to avoid |
| outputFormat | AudioOutputFormat | "mp3" / "wav" / "flac" / "ogg" / "aac" |
| seed | number | Seed for reproducible generation |
| timeoutMs | number | Maximum wait time in milliseconds |
| n | number | Number of clips to generate (default: 1) |
| onProgress | (event) => void | Progress callback with AudioProgressEvent |
| providerPreferences | MediaProviderPreference | Reorder or filter the fallback chain |
| apiKey | string | Override the API key |

generateSFX()

Generate a short sound effect from a text description.

import { generateSFX } from '@framers/agentos';

const result = await generateSFX({
  prompt: 'Thunder crack followed by heavy rain on a tin roof',
  durationSec: 5,
});
console.log(result.audio[0].url);

GenerateSFXOptions

| Option | Type | Description |
| --- | --- | --- |
| prompt | string | Text prompt describing the desired sound effect (required) |
| provider | string | Provider ID ("elevenlabs-sfx", "stable-audio", etc.) |
| model | string | Model override within the provider |
| durationSec | number | Desired output duration in seconds (SFX: typically 1-15s) |
| outputFormat | AudioOutputFormat | "mp3" / "wav" / "flac" / "ogg" / "aac" |
| seed | number | Seed for reproducible generation |
| timeoutMs | number | Maximum wait time in milliseconds |
| n | number | Number of clips to generate (default: 1) |
| onProgress | (event) => void | Progress callback with AudioProgressEvent |
| providerPreferences | MediaProviderPreference | Reorder or filter the fallback chain |
| apiKey | string | Override the API key |

Result types

generateMusic() and generateSFX() return similar result envelopes. The music variant, GenerateMusicResult, is shown here:

interface GenerateMusicResult {
  model: string; // e.g. "suno-v3.5"
  provider: string; // e.g. "suno"
  created: number; // Unix timestamp (seconds)
  audio: GeneratedAudio[];
  usage?: AudioProviderUsage;
}

Each GeneratedAudio object contains:

interface GeneratedAudio {
  url?: string; // Public download URL
  base64?: string; // Base64-encoded audio data
  mimeType?: string; // e.g. "audio/mpeg"
  durationSec?: number; // Clip duration
  sampleRate?: number; // e.g. 44100
}
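
Because every field on GeneratedAudio is optional, calling code needs to check which payload a provider actually returned. The following is a minimal sketch of that check. `ClipSource`, `clipSource`, and `extensionFor` are hypothetical helpers, not AgentOS exports, and preferring `url` over `base64` when both are present is an assumption made for this example:

```typescript
// Mirrors the GeneratedAudio interface above; the helpers are hypothetical.
interface GeneratedAudio {
  url?: string;
  base64?: string;
  mimeType?: string;
  durationSec?: number;
  sampleRate?: number;
}

type ClipSource =
  | { kind: 'url'; url: string }
  | { kind: 'base64'; data: string };

// Decide where the audio bytes live. Assumption: url wins if both are set.
function clipSource(clip: GeneratedAudio): ClipSource {
  if (clip.url) return { kind: 'url', url: clip.url };
  if (clip.base64) return { kind: 'base64', data: clip.base64 };
  throw new Error('Clip has neither a url nor base64 data');
}

// Map common audio MIME types to a file extension, defaulting to "bin".
function extensionFor(clip: GeneratedAudio): string {
  const byMime: Record<string, string> = {
    'audio/mpeg': 'mp3',
    'audio/wav': 'wav',
    'audio/flac': 'flac',
    'audio/ogg': 'ogg',
    'audio/aac': 'aac',
  };
  return byMime[clip.mimeType ?? ''] ?? 'bin';
}
```

A caller could then switch on `clipSource(result.audio[0]).kind` to either download the URL or decode the base64 payload.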

AudioProgressEvent

interface AudioProgressEvent {
  status: 'queued' | 'processing' | 'downloading' | 'complete' | 'failed';
  progress?: number; // 0-100
  estimatedRemainingMs?: number;
  message?: string;
}

Synchronous providers (Stable Audio, ElevenLabs) may jump directly from processing to complete.
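
Since only `status` is guaranteed on each event, a progress handler should tolerate missing fields. One possible formatter is sketched below. `formatProgress` is a hypothetical helper rather than part of AgentOS, and the interface is repeated so the snippet stands alone:

```typescript
// Mirrors the AudioProgressEvent interface above.
interface AudioProgressEvent {
  status: 'queued' | 'processing' | 'downloading' | 'complete' | 'failed';
  progress?: number;
  estimatedRemainingMs?: number;
  message?: string;
}

// Hypothetical formatter: skips optional fields that are absent instead of
// printing "undefined".
function formatProgress(event: AudioProgressEvent): string {
  const parts: string[] = [`[${event.status}]`];
  if (event.progress !== undefined) {
    parts.push(`${Math.round(event.progress)}%`);
  }
  if (event.estimatedRemainingMs !== undefined) {
    parts.push(`~${Math.ceil(event.estimatedRemainingMs / 1000)}s left`);
  }
  if (event.message) parts.push(event.message);
  return parts.join(' ');
}
```

It can be passed to either function as `onProgress: (event) => console.log(formatProgress(event))`.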

Local generation

The musicgen-local and audiogen-local providers both run entirely on the local machine via HuggingFace Transformers.js, so no API key is required. They sit at the lowest priority in the provider chain, which keeps audio generation available even when no cloud credentials are configured.
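
The role of a local provider as the last link in the chain can be illustrated with a simple failover loop. This is a sketch of the general pattern only: FallbackAudioProxy's real behavior may differ, and `AudioProvider` and `generateWithFallback` are hypothetical names introduced for the example:

```typescript
// Illustrative failover loop; not the FallbackAudioProxy implementation.
interface AudioProvider {
  id: string;
  generate(prompt: string): Promise<string>; // resolves to a clip URL
}

// Try each provider in order and return the first successful result.
async function generateWithFallback(
  chain: AudioProvider[],
  prompt: string,
): Promise<{ provider: string; url: string }> {
  const errors: string[] = [];
  for (const provider of chain) {
    try {
      const url = await provider.generate(prompt);
      return { provider: provider.id, url };
    } catch (err) {
      // Record the failure and move on to the next provider in the chain.
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.id}: ${reason}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}
```

Placing the local provider last means it is only reached after every cloud provider ahead of it has failed or been skipped.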

Observability

All audio API calls emit OpenTelemetry spans (agentos.api.generate_music, agentos.api.generate_sfx) and record usage metrics to the durable usage ledger when configured.