
Speech Providers

This document describes the provider resolver system in packages/agentos/src/speech/, which auto-discovers and manages speech-to-text (STT), text-to-speech (TTS), voice activity detection (VAD), and wake-word providers.


Overview

SpeechProviderResolver is the central registry for speech providers. It:

  • Registers core providers from a static catalog at startup based on present environment variables.
  • Discovers extension providers from an optional ExtensionManager (priority 200, lower than core).
  • Resolves the best provider for a given kind, respecting streaming/local/feature requirements.
  • Optionally wraps multiple candidates in a fallback proxy that automatically tries the next provider when one fails.
  • Emits events (provider_registered, provider_fallback) for observability.

Quick Start

Set the environment variables for the providers you want, then call refresh():

import { SpeechProviderResolver } from '@agentos/agentos/speech';

const resolver = new SpeechProviderResolver();
await resolver.refresh();

const stt = resolver.resolveSTT(); // best configured STT provider
const tts = resolver.resolveTTS(); // best configured TTS provider
const vad = resolver.resolveVAD(); // always returns AgentOS Adaptive VAD
const wakeWord = resolver.resolveWakeWord(); // null if none configured

Providers are detected automatically — no explicit registration required for core providers.
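As a rough illustration of how environment-based detection could work, the sketch below marks a catalog entry as configured when all of its listed environment variables are set. The `CatalogEntry` shape, the `isConfigured` and `configuredIds` helpers, and the "all variables required" rule are assumptions made for this example, not the resolver's actual internals.

```typescript
// Hypothetical, simplified catalog entry: only `id` and `envVars` matter here.
interface CatalogEntry {
  id: string;
  envVars: string[];
}

type Env = Record<string, string | undefined>;

// Assumption: a provider is configured when every listed env var is non-empty.
// Entries with no env vars (for example local providers) pass trivially.
function isConfigured(entry: CatalogEntry, env: Env): boolean {
  return entry.envVars.every((name) => (env[name] ?? '').length > 0);
}

function configuredIds(catalog: CatalogEntry[], env: Env): string[] {
  return catalog.filter((entry) => isConfigured(entry, env)).map((entry) => entry.id);
}

const sampleCatalog: CatalogEntry[] = [
  { id: 'openai-whisper', envVars: ['OPENAI_API_KEY'] },
  { id: 'azure-speech-stt', envVars: ['AZURE_SPEECH_KEY', 'AZURE_SPEECH_REGION'] },
  { id: 'whisper-local', envVars: [] },
];

// AZURE_SPEECH_REGION is missing, so the Azure entry is skipped.
const detected = configuredIds(sampleCatalog, {
  OPENAI_API_KEY: 'sk-example',
  AZURE_SPEECH_KEY: 'only-the-key',
});
console.log(detected); // [ 'openai-whisper', 'whisper-local' ]
```

Passing the environment in as a plain object, rather than reading a global, keeps the check easy to test.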


Configuration via agent.config.json

Add a speech section to your agent configuration to control provider preference order and fallback behavior:

{
  "speech": {
    "stt": {
      "preferred": ["assemblyai", "deepgram-batch", "openai-whisper"],
      "fallback": true
    },
    "tts": {
      "preferred": ["elevenlabs", "openai-tts"],
      "fallback": true
    }
  }
}

Fields

| Field | Type | Description |
| --- | --- | --- |
| stt.preferred | string[] | Provider ids in priority order. Overrides the default catalog priority. |
| stt.fallback | boolean | When true, wraps multiple candidates in a FallbackSTTProxy. |
| tts.preferred | string[] | Provider ids in priority order for TTS. |
| tts.fallback | boolean | When true, wraps multiple candidates in a FallbackTTSProxy. |

Preferred providers receive priorities 50, 51, 52, … (lower number = higher priority). All other core providers default to priority 100 and extension providers default to 200.
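The priority rule above can be sketched as a small function. The numbers 50, 100, and 200 come from this document; the `assignPriority` helper, the `ProviderSource` type, and the `my-ext-stt` id are hypothetical names used only for illustration.

```typescript
type ProviderSource = 'core' | 'extension';

// Preferred ids get 50 + their position in the list; everything else falls
// back to the per-source default (100 for core, 200 for extensions).
function assignPriority(id: string, source: ProviderSource, preferred: string[]): number {
  const index = preferred.indexOf(id);
  if (index >= 0) return 50 + index;
  return source === 'core' ? 100 : 200;
}

const preferredStt = ['assemblyai', 'deepgram-batch', 'openai-whisper'];

const candidates: { id: string; source: ProviderSource }[] = [
  { id: 'vosk', source: 'core' },
  { id: 'my-ext-stt', source: 'extension' }, // hypothetical extension provider
  { id: 'openai-whisper', source: 'core' },
  { id: 'assemblyai', source: 'core' },
];

// Lower number = higher priority, so sort ascending.
const ranked = candidates
  .map((c) => ({ id: c.id, priority: assignPriority(c.id, c.source, preferredStt) }))
  .sort((a, b) => a.priority - b.priority);

console.log(ranked.map((r) => `${r.id}:${r.priority}`).join(', '));
// assemblyai:50, openai-whisper:52, vosk:100, my-ext-stt:200
```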


Provider Table

Speech-to-Text (STT)

| ID | Label | Env Vars | Local | Streaming | Features |
| --- | --- | --- | --- | --- | --- |
| openai-whisper | OpenAI Whisper | OPENAI_API_KEY | No | No | cloud, timestamps, transcription |
| deepgram-batch | Deepgram Batch | DEEPGRAM_API_KEY | No | No | cloud, diarization, timestamps |
| deepgram | Deepgram | DEEPGRAM_API_KEY | No | Yes | cloud, streaming |
| deepgram-streaming | Deepgram Streaming | DEEPGRAM_API_KEY | No | Yes | streaming, interim-results, diarization, punctuation, endpointing |
| assemblyai | AssemblyAI | ASSEMBLYAI_API_KEY | No | Yes | cloud, streaming, diarization |
| google-cloud-stt | Google Cloud STT | GOOGLE_STT_CREDENTIALS | No | Yes | cloud, streaming |
| azure-speech-stt | Azure Speech STT | AZURE_SPEECH_KEY, AZURE_SPEECH_REGION | No | No | cloud, streaming |
| whisper-chunked | Whisper Chunked Streaming | OPENAI_API_KEY | No | Yes | streaming, interim-results |
| whisper-local | Whisper.cpp | | Yes | No | local, offline |
| vosk | Vosk | | Yes | Yes | local, offline, streaming |
| nvidia-nemo | NVIDIA NeMo | | Yes | No | local, offline (unavailable; not yet integrated) |

Text-to-Speech (TTS)

| ID | Label | Env Vars | Local | Streaming | Features |
| --- | --- | --- | --- | --- | --- |
| openai-tts | OpenAI TTS | OPENAI_API_KEY | No | Yes | cloud, tts |
| openai-streaming-tts | OpenAI Streaming TTS | OPENAI_API_KEY | No | Yes | streaming, sentence-chunking |
| elevenlabs | ElevenLabs | ELEVENLABS_API_KEY | No | Yes | cloud, tts, voice-cloning |
| elevenlabs-streaming-tts | ElevenLabs Streaming TTS | ELEVENLABS_API_KEY | No | Yes | streaming, websocket, continuation-hints |
| google-cloud-tts | Google Cloud TTS | GOOGLE_TTS_CREDENTIALS | No | No | cloud, tts |
| amazon-polly | Amazon Polly | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY | No | Yes | cloud, tts |
| azure-speech-tts | Azure Speech TTS | AZURE_SPEECH_KEY, AZURE_SPEECH_REGION | No | Yes | cloud, tts |
| piper | Piper | | Yes | No | local, offline, tts |
| coqui | Coqui XTTS | | Yes | Yes | local, tts, voice-cloning (unavailable; not yet integrated) |
| bark | Bark | | Yes | No | local, tts (unavailable; not yet integrated) |
| styletts2 | StyleTTS2 | | Yes | No | local, tts (unavailable; not yet integrated) |

VAD

| ID | Label | Env Vars | Local | Features |
| --- | --- | --- | --- | --- |
| agentos-adaptive-vad | AgentOS Adaptive VAD | | Yes | local, vad, adaptive |

Wake-Word

| ID | Label | Env Vars | Local | Features |
| --- | --- | --- | --- | --- |
| porcupine | Porcupine | PICOVOICE_ACCESS_KEY | Yes | local, wake-word |
| openwakeword | OpenWakeWord | | Yes | local, wake-word |

Telephony

| ID | Label | Env Vars | Extension |
| --- | --- | --- | --- |
| twilio | Twilio | TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN | voice-twilio |
| telnyx | Telnyx | TELNYX_API_KEY, TELNYX_CONNECTION_ID | voice-telnyx |
| plivo | Plivo | PLIVO_AUTH_ID, PLIVO_AUTH_TOKEN | voice-plivo |

Fallback Behavior

When fallback: true is set for an STT or TTS kind, resolveSTT() / resolveTTS() returns a proxy wrapping all configured candidates sorted by priority. On each call:

  1. The proxy invokes the first provider.
  2. If it throws or rejects, the proxy emits a provider_fallback event on the resolver with { from, to, error }.
  3. The next provider in the chain is tried.
  4. If all providers fail, the original error from the first provider is re-thrown.

This lets you configure OPENAI_API_KEY and DEEPGRAM_API_KEY together with fallback: true, so that an outage at a single provider does not interrupt voice sessions.

// Log every fallback so provider outages are visible in your logs.
resolver.on('provider_fallback', ({ from, to, error }) => {
  console.warn(`STT fallback: ${from} -> ${to} (${error.message})`);
});
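The fallback steps listed above can be sketched as a small proxy. This is a minimal illustration under stated assumptions, not the actual FallbackSTTProxy: the `SttLike` interface, the `createFallbackStt` helper, and the callback-based event delivery are hypothetical stand-ins for the real provider interface and the resolver's event emitter.

```typescript
// Hypothetical minimal provider shape for this sketch.
interface SttLike {
  id: string;
  transcribe(audio: string): Promise<string>;
}

interface FallbackEvent {
  from: string;
  to: string;
  error: Error;
}

function createFallbackStt(chain: SttLike[], onFallback: (e: FallbackEvent) => void): SttLike {
  if (chain.length === 0) throw new Error('fallback chain is empty');
  return {
    id: 'fallback-stt',
    async transcribe(audio: string): Promise<string> {
      let firstError: Error | undefined;
      for (let i = 0; i < chain.length; i++) {
        const current = chain[i];
        try {
          return await current.transcribe(audio);
        } catch (err) {
          const error = err instanceof Error ? err : new Error(String(err));
          if (!firstError) firstError = error; // remember the original failure
          const next = chain[i + 1];
          // Only report a fallback when there is a next provider to try.
          if (next) onFallback({ from: current.id, to: next.id, error });
        }
      }
      // Every provider failed: re-throw the first provider's error.
      throw firstError ?? new Error('no provider attempted');
    },
  };
}

// Demo with one failing and one healthy stub provider.
const fallbackLog: string[] = [];
const flaky: SttLike = {
  id: 'openai-whisper',
  transcribe: async () => { throw new Error('503 from upstream'); },
};
const healthy: SttLike = {
  id: 'deepgram-batch',
  transcribe: async (audio) => `transcript of ${audio}`,
};
const proxy = createFallbackStt([flaky, healthy], (e) =>
  fallbackLog.push(`${e.from} -> ${e.to}: ${e.error.message}`),
);
const demoTranscript = proxy.transcribe('hello.wav');
demoTranscript.then((text) => console.log(text, fallbackLog));
```

Keeping the first error rather than the last means the thrown error reflects the provider you preferred most, which matches step 4.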

Resolution Requirements

Both resolveSTT() and resolveTTS() accept an optional ProviderRequirements object:

interface ProviderRequirements {
  /** Only match providers whose catalog entry declares streaming === true/false. */
  streaming?: boolean;
  /** Only match providers whose catalog entry declares local === true/false. */
  local?: boolean;
  /** Only match providers that declare all listed features. */
  features?: string[];
  /** Return only these provider ids, in this order. */
  preferredIds?: string[];
}

Example — require a streaming, cloud-based STT provider:

const stt = resolver.resolveSTT({ streaming: true, local: false });

If no configured provider matches the requirements, resolveSTT() / resolveTTS() throw with an error message describing the mismatch.
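To make the matching semantics concrete, here is a hedged sketch of how requirements could be applied to a candidate list. The `Candidate` shape and the `matchProviders` helper are hypothetical, and the sample entries are copied from the STT table with an assumed default priority of 100; the real resolver may filter and order candidates differently.

```typescript
interface ProviderRequirements {
  streaming?: boolean;
  local?: boolean;
  features?: string[];
  preferredIds?: string[];
}

// Hypothetical flattened view of a registered provider.
interface Candidate {
  id: string;
  streaming: boolean;
  local: boolean;
  features: string[];
  priority: number;
}

function matchProviders(candidates: Candidate[], req: ProviderRequirements = {}): Candidate[] {
  const wanted = req.features ?? [];
  let matches = candidates.filter(
    (c) =>
      (req.streaming === undefined || c.streaming === req.streaming) &&
      (req.local === undefined || c.local === req.local) &&
      wanted.every((f) => c.features.indexOf(f) !== -1),
  );
  if (req.preferredIds) {
    // Keep only the listed ids, in the order the caller gave them.
    const filtered = matches;
    matches = req.preferredIds
      .map((id) => filtered.find((c) => c.id === id))
      .filter((c): c is Candidate => c !== undefined);
  } else {
    matches = [...matches].sort((a, b) => a.priority - b.priority);
  }
  if (matches.length === 0) {
    throw new Error(`No provider matches requirements ${JSON.stringify(req)}`);
  }
  return matches;
}

const sampleStt: Candidate[] = [
  { id: 'openai-whisper', streaming: false, local: false, features: ['cloud', 'timestamps', 'transcription'], priority: 100 },
  { id: 'assemblyai', streaming: true, local: false, features: ['cloud', 'streaming', 'diarization'], priority: 100 },
  { id: 'vosk', streaming: true, local: true, features: ['local', 'offline', 'streaming'], priority: 100 },
];

const streamingCloud = matchProviders(sampleStt, { streaming: true, local: false }).map((c) => c.id);
console.log(streamingCloud); // [ 'assemblyai' ]
```

A request such as `{ local: true, features: ['diarization'] }` matches nothing in this sample and therefore throws.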


Installing Extension Providers

Extension providers ship as npm packages under the @framers/agentos-ext-* namespace and expose their provider implementation via an ExtensionPack. Install the package and pass your ExtensionManager to refresh():

npm install @framers/agentos-ext-voice-synthesis

import { ExtensionManager } from '@agentos/agentos';
import { SpeechProviderResolver } from '@agentos/agentos/speech';

const em = new ExtensionManager();
await em.loadPack('@framers/agentos-ext-voice-synthesis');

const resolver = new SpeechProviderResolver();
await resolver.refresh(em);

// ElevenLabs and any other TTS providers from the pack are now available
const tts = resolver.resolveTTS();

Extension providers default to priority 200. Add them to tts.preferred to promote them above core providers.


Adding a Custom Provider

1. Implement the interface

import type { SpeechToTextProvider, TranscribeInput, TranscribeResult } from '@agentos/agentos/speech';

export class MyCustomSTT implements SpeechToTextProvider {
  readonly id = 'my-custom-stt';
  readonly supportsStreaming = false;

  getProviderName(): string { return 'my-company'; }

  async transcribe(input: TranscribeInput): Promise<TranscribeResult> {
    // Call your API here
    return { text: '...', cost: 0 };
  }
}

For TTS implement TextToSpeechProvider; for VAD implement SpeechVadProvider; for wake-word implement WakeWordProvider.

2. Register it directly

import { findSpeechProviderCatalogEntry } from '@agentos/agentos/speech';

resolver.register({
  id: 'my-custom-stt',
  kind: 'stt',
  provider: new MyCustomSTT(),
  catalogEntry: {
    id: 'my-custom-stt',
    kind: 'stt',
    label: 'My Custom STT',
    envVars: ['MY_STT_API_KEY'],
    local: false,
    description: 'Custom STT via internal API',
    features: ['cloud'],
  },
  isConfigured: Boolean(process.env.MY_STT_API_KEY),
  priority: 50, // set low to prefer over core providers
  source: 'core',
});

3. Or expose via an ExtensionPack

Bundle your provider inside an extension pack and publish it so others can install it via npm install:

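// my-stt-extension/index.ts
// Assumption: ExtensionPack is exported from the same entry point as ExtensionManager.
import type { ExtensionPack } from '@agentos/agentos';
// Hypothetical local path to the class from step 1.
import { MyCustomSTT } from './MyCustomSTT';

export function createExtensionPack(): ExtensionPack {
  return {
    descriptors: [
      {
        id: 'my-custom-stt',
        kind: 'stt-provider',
        payload: new MyCustomSTT(),
      },
    ],
  };
}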

See RFC_EXTENSION_STANDARDS.md for the full extension pack specification.