Speech Providers

This document describes the provider resolver system in packages/agentos/src/speech/, which auto-discovers and manages speech-to-text (STT), text-to-speech (TTS), voice activity detection (VAD), and wake-word providers.

Overview

SpeechProviderResolver is the central registry for speech providers. It:

1. Registers core providers from a static catalog at startup, based on which environment variables are present.
2. Discovers extension providers from an optional ExtensionManager (priority 200, which ranks below core providers).
3. Resolves the best provider for a given kind, respecting streaming, local, and feature requirements.
4. Optionally wraps multiple candidates in a fallback proxy that automatically tries the next provider when one fails.
5. Emits events (provider_registered, provider_fallback) for observability.

Quick Start

Set the environment variables for the providers you want, then call refresh():

import { SpeechProviderResolver } from '@agentos/agentos/speech';

const resolver = new SpeechProviderResolver();
await resolver.refresh();

const stt = resolver.resolveSTT();   // best configured STT provider
const tts = resolver.resolveTTS();   // best configured TTS provider
const vad = resolver.resolveVAD();   // always returns AgentOS Adaptive VAD
const wakeWord = resolver.resolveWakeWord(); // null if none configured

Providers are detected automatically — no explicit registration required for core providers.
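
As a rough illustration of environment-based detection, the sketch below checks a catalog entry's variables against an environment map. The `CatalogEntrySketch` type and `isConfigured` helper are hypothetical names, and the rules that every listed variable must be set and that entries without variables count as configured are assumptions, not confirmed resolver behavior.

```typescript
// Hypothetical, simplified shape of a catalog entry; the real type may differ.
interface CatalogEntrySketch {
  id: string;
  envVars: string[];
}

// Assumption: a provider counts as configured when every listed variable is
// set to a non-empty value. Entries with no variables always pass this check.
function isConfigured(
  entry: CatalogEntrySketch,
  env: Record<string, string | undefined>,
): boolean {
  return entry.envVars.every((name) => (env[name] ?? '').trim() !== '');
}

// Example environment: AZURE_SPEECH_REGION is deliberately missing.
const env: Record<string, string | undefined> = {
  OPENAI_API_KEY: 'sk-example',
  AZURE_SPEECH_KEY: 'abc',
};

const whisper: CatalogEntrySketch = { id: 'openai-whisper', envVars: ['OPENAI_API_KEY'] };
const azure: CatalogEntrySketch = {
  id: 'azure-speech-stt',
  envVars: ['AZURE_SPEECH_KEY', 'AZURE_SPEECH_REGION'],
};
const vosk: CatalogEntrySketch = { id: 'vosk', envVars: [] };

console.log(isConfigured(whisper, env)); // true
console.log(isConfigured(azure, env));   // false (region is not set)
console.log(isConfigured(vosk, env));    // true (no variables required)
```

In a real process the second argument would typically be `process.env`.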

Configuration via `agent.config.json`

Add a speech section to your agent configuration to control provider preference order and fallback behavior:

{
  "speech": {
    "stt": {
      "preferred": ["assemblyai", "deepgram-batch", "openai-whisper"],
      "fallback": true
    },
    "tts": {
      "preferred": ["elevenlabs", "openai-tts"],
      "fallback": true
    }
  }
}

Fields

Field	Type	Description
`stt.preferred`	`string[]`	Provider ids in priority order. Overrides the default catalog priority.
`stt.fallback`	`boolean`	When `true`, wraps multiple candidates in a `FallbackSTTProxy`.
`tts.preferred`	`string[]`	Provider ids in priority order for TTS.
`tts.fallback`	`boolean`	When `true`, wraps multiple candidates in a `FallbackTTSProxy`.

Preferred providers receive priorities 50, 51, 52, … (lower number = higher priority). All other core providers default to priority 100 and extension providers default to 200.
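
The priority rule above can be sketched as a small function. The `assignPriority` helper and `ProviderSource` type are illustrative names that do not come from the package; only the numbers 50, 100, and 200 are taken from this document.

```typescript
type ProviderSource = 'core' | 'extension';

// Values stated in this document: preferred ids start at 50, other core
// providers default to 100, and extension providers default to 200.
const PREFERRED_BASE = 50;
const CORE_DEFAULT = 100;
const EXTENSION_DEFAULT = 200;

// Returns the priority for a provider id (lower number = higher priority).
function assignPriority(id: string, source: ProviderSource, preferred: string[]): number {
  const index = preferred.indexOf(id);
  if (index >= 0) return PREFERRED_BASE + index;
  return source === 'core' ? CORE_DEFAULT : EXTENSION_DEFAULT;
}

const preferred = ['assemblyai', 'deepgram-batch', 'openai-whisper'];

console.log(assignPriority('assemblyai', 'core', preferred));      // 50
console.log(assignPriority('openai-whisper', 'core', preferred));  // 52
console.log(assignPriority('vosk', 'core', preferred));            // 100
console.log(assignPriority('my-ext-stt', 'extension', preferred)); // 200
```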

Provider Table

Speech-to-Text (STT)

ID	Label	Env Vars	Local	Streaming	Features
`openai-whisper`	OpenAI Whisper	`OPENAI_API_KEY`	No	No	cloud, timestamps, transcription
`deepgram-batch`	Deepgram Batch	`DEEPGRAM_API_KEY`	No	No	cloud, diarization, timestamps
`deepgram`	Deepgram	`DEEPGRAM_API_KEY`	No	Yes	cloud, streaming
`deepgram-streaming`	Deepgram Streaming	`DEEPGRAM_API_KEY`	No	Yes	streaming, interim-results, diarization, punctuation, endpointing
`assemblyai`	AssemblyAI	`ASSEMBLYAI_API_KEY`	No	Yes	cloud, streaming, diarization
`google-cloud-stt`	Google Cloud STT	`GOOGLE_STT_CREDENTIALS`	No	Yes	cloud, streaming
`azure-speech-stt`	Azure Speech STT	`AZURE_SPEECH_KEY`, `AZURE_SPEECH_REGION`	No	No	cloud, streaming
`whisper-chunked`	Whisper Chunked Streaming	`OPENAI_API_KEY`	No	Yes	streaming, interim-results
`whisper-local`	Whisper.cpp	—	Yes	No	local, offline
`vosk`	Vosk	—	Yes	Yes	local, offline, streaming
`nvidia-nemo`	NVIDIA NeMo	—	Yes	No	local, offline (unavailable — not yet integrated)

Text-to-Speech (TTS)

ID	Label	Env Vars	Local	Streaming	Features
`openai-tts`	OpenAI TTS	`OPENAI_API_KEY`	No	Yes	cloud, tts
`openai-streaming-tts`	OpenAI Streaming TTS	`OPENAI_API_KEY`	No	Yes	streaming, sentence-chunking
`elevenlabs`	ElevenLabs	`ELEVENLABS_API_KEY`	No	Yes	cloud, tts, voice-cloning
`elevenlabs-streaming-tts`	ElevenLabs Streaming TTS	`ELEVENLABS_API_KEY`	No	Yes	streaming, websocket, continuation-hints
`google-cloud-tts`	Google Cloud TTS	`GOOGLE_TTS_CREDENTIALS`	No	No	cloud, tts
`amazon-polly`	Amazon Polly	`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`	No	Yes	cloud, tts
`azure-speech-tts`	Azure Speech TTS	`AZURE_SPEECH_KEY`, `AZURE_SPEECH_REGION`	No	Yes	cloud, tts
`piper`	Piper	—	Yes	No	local, offline, tts
`coqui`	Coqui XTTS	—	Yes	Yes	local, tts, voice-cloning (unavailable — not yet integrated)
`bark`	Bark	—	Yes	No	local, tts (unavailable — not yet integrated)
`styletts2`	StyleTTS2	—	Yes	No	local, tts (unavailable — not yet integrated)

VAD

ID	Label	Env Vars	Local	Features
`agentos-adaptive-vad`	AgentOS Adaptive VAD	—	Yes	local, vad, adaptive

Wake-Word

ID	Label	Env Vars	Local	Features
`porcupine`	Porcupine	`PICOVOICE_ACCESS_KEY`	Yes	local, wake-word
`openwakeword`	OpenWakeWord	—	Yes	local, wake-word

Telephony

ID	Label	Env Vars	Extension
`twilio`	Twilio	`TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`	`voice-twilio`
`telnyx`	Telnyx	`TELNYX_API_KEY`, `TELNYX_CONNECTION_ID`	`voice-telnyx`
`plivo`	Plivo	`PLIVO_AUTH_ID`, `PLIVO_AUTH_TOKEN`	`voice-plivo`

Fallback Behavior

When fallback: true is set for an STT or TTS kind, resolveSTT() / resolveTTS() returns a proxy wrapping all configured candidates sorted by priority. On each call:

1. The proxy invokes the first provider.
2. If it throws or rejects, the proxy emits a provider_fallback event on the resolver with { from, to, error }.
3. The next provider in the chain is tried.
4. If all providers fail, the original error from the first provider is re-thrown.

This lets you configure OPENAI_API_KEY and DEEPGRAM_API_KEY together with fallback: true, so that an outage at one provider does not interrupt voice sessions as long as the other provider is still reachable.

// Fires for both STT and TTS fallbacks.
resolver.on('provider_fallback', ({ from, to, error }) => {
  console.warn(`Speech provider fallback: ${from} → ${to} (${error.message})`);
});
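
The fallback steps described in this section can be modelled with a short, self-contained sketch. It is not the FallbackSTTProxy implementation: `SttSketch`, `FallbackEvent`, and `transcribeWithFallback` are hypothetical names, and a plain callback stands in for the resolver's event emitter.

```typescript
// Hypothetical minimal provider shape used only for this sketch.
interface SttSketch {
  id: string;
  transcribe(audio: Uint8Array): Promise<string>;
}

interface FallbackEvent {
  from: string;
  to: string;
  error: Error;
}

// Tries each provider in order. Before moving on to the next provider it
// reports the hand-off; if every provider fails, the first error is re-thrown.
async function transcribeWithFallback(
  chain: SttSketch[],
  audio: Uint8Array,
  onFallback: (event: FallbackEvent) => void,
): Promise<string> {
  let firstError: Error | undefined;
  let failed: { id: string; error: Error } | undefined;

  for (const provider of chain) {
    if (failed) onFallback({ from: failed.id, to: provider.id, error: failed.error });
    try {
      return await provider.transcribe(audio);
    } catch (caught) {
      const error = caught instanceof Error ? caught : new Error(String(caught));
      if (firstError === undefined) firstError = error;
      failed = { id: provider.id, error };
    }
  }
  throw firstError ?? new Error('No providers in the fallback chain');
}

// Example: the first provider fails, the second one answers.
const failing: SttSketch = {
  id: 'openai-whisper',
  transcribe: async () => { throw new Error('503 from provider'); },
};
const working: SttSketch = {
  id: 'deepgram-batch',
  transcribe: async () => 'hello world',
};

transcribeWithFallback([failing, working], new Uint8Array(0), (e) =>
  console.warn(`fallback: ${e.from} -> ${e.to} (${e.error.message})`),
).then((text) => console.log(text));
```

Re-throwing the first error rather than the last keeps the failure of the highest-priority provider visible, which matches the behavior described in step 4.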

Resolution Requirements

Both resolveSTT() and resolveTTS() accept an optional ProviderRequirements object:

interface ProviderRequirements {
  /** Only match providers whose catalog entry declares streaming === true/false. */
  streaming?: boolean;
  /** Only match providers whose catalog entry declares local === true/false. */
  local?: boolean;
  /** Only match providers that declare all listed features. */
  features?: string[];
  /** Return only these provider ids, in this order. */
  preferredIds?: string[];
}

Example — require a streaming, cloud-based STT provider:

const stt = resolver.resolveSTT({ streaming: true, local: false });

If no configured provider matches the requirements, resolveSTT() / resolveTTS() throw an error whose message describes the mismatch.
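
To make the matching rules concrete, here is a minimal sketch of how requirements could be applied to a candidate list. `CandidateSketch`, `matches`, and `resolveSketch` are hypothetical names; the real resolver may order and filter candidates differently. The candidate flags are copied from the provider table, and equal priorities are assumed for simplicity.

```typescript
interface ProviderRequirements {
  streaming?: boolean;
  local?: boolean;
  features?: string[];
  preferredIds?: string[];
}

// Hypothetical flattened view of a registered, configured provider.
interface CandidateSketch {
  id: string;
  streaming: boolean;
  local: boolean;
  features: string[];
  priority: number;
}

function matches(c: CandidateSketch, req: ProviderRequirements): boolean {
  if (req.streaming !== undefined && c.streaming !== req.streaming) return false;
  if (req.local !== undefined && c.local !== req.local) return false;
  if (req.features && !req.features.every((f) => c.features.indexOf(f) !== -1)) return false;
  if (req.preferredIds && req.preferredIds.indexOf(c.id) === -1) return false;
  return true;
}

function resolveSketch(candidates: CandidateSketch[], req: ProviderRequirements = {}): CandidateSketch {
  const order = req.preferredIds;
  const matched = candidates
    .filter((c) => matches(c, req))
    // preferredIds order wins when given; otherwise lower priority number wins.
    .sort((a, b) => (order ? order.indexOf(a.id) - order.indexOf(b.id) : a.priority - b.priority));
  const best = matched[0];
  if (best === undefined) {
    throw new Error(`No configured provider matches ${JSON.stringify(req)}`);
  }
  return best;
}

const candidates: CandidateSketch[] = [
  { id: 'openai-whisper', streaming: false, local: false, features: ['cloud', 'timestamps'], priority: 100 },
  { id: 'deepgram-streaming', streaming: true, local: false, features: ['streaming', 'diarization'], priority: 100 },
  { id: 'vosk', streaming: true, local: true, features: ['local', 'offline', 'streaming'], priority: 100 },
];

console.log(resolveSketch(candidates, { streaming: true, local: false }).id); // deepgram-streaming
```

Calling `resolveSketch(candidates, { local: true, streaming: false })` throws in this sketch, because none of the three candidates is both local and non-streaming.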

Installing Extension Providers

Extension providers ship as npm packages under the @framers/agentos-ext-* namespace and expose their provider implementation via an ExtensionPack. Install the package and pass your ExtensionManager to refresh():

npm install @framers/agentos-ext-voice-synthesis

import { ExtensionManager } from '@agentos/agentos';
import { SpeechProviderResolver } from '@agentos/agentos/speech';

const em = new ExtensionManager();
await em.loadPack('@framers/agentos-ext-voice-synthesis');

const resolver = new SpeechProviderResolver();
await resolver.refresh(em);

// ElevenLabs and any other TTS providers from the pack are now available
const tts = resolver.resolveTTS();

Extension providers default to priority 200. Add them to tts.preferred to promote them above core providers.

Adding a Custom Provider

1. Implement the interface

import type { SpeechToTextProvider, TranscribeInput, TranscribeResult } from '@agentos/agentos/speech';

export class MyCustomSTT implements SpeechToTextProvider {
  readonly id = 'my-custom-stt';
  readonly supportsStreaming = false;

  getProviderName(): string { return 'my-company'; }

  async transcribe(input: TranscribeInput): Promise<TranscribeResult> {
    // Call your API here
    return { text: '...', cost: 0 };
  }
}

For TTS implement TextToSpeechProvider; for VAD implement SpeechVadProvider; for wake-word implement WakeWordProvider.

2. Register it directly

import { findSpeechProviderCatalogEntry } from '@agentos/agentos/speech';

resolver.register({
  id: 'my-custom-stt',
  kind: 'stt',
  provider: new MyCustomSTT(),
  catalogEntry: {
    id: 'my-custom-stt',
    kind: 'stt',
    label: 'My Custom STT',
    envVars: ['MY_STT_API_KEY'],
    local: false,
    description: 'Custom STT via internal API',
    features: ['cloud'],
  },
  isConfigured: Boolean(process.env.MY_STT_API_KEY),
  priority: 50,  // set low to prefer over core providers
  source: 'core',
});

3. Or expose via an ExtensionPack

Bundle your provider inside an extension pack and publish it so others can install it via npm install:

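// my-stt-extension/index.ts
// Assumption: ExtensionPack is exported from the same entry point as ExtensionManager.
import type { ExtensionPack } from '@agentos/agentos';
// Hypothetical path to the class from step 1.
import { MyCustomSTT } from './MyCustomSTT';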
export function createExtensionPack(): ExtensionPack {
  return {
    descriptors: [
      {
        id: 'my-custom-stt',
        kind: 'stt-provider',
        payload: new MyCustomSTT(),
      },
    ],
  };
}

See RFC_EXTENSION_STANDARDS.md for the full extension pack specification.
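
For orientation, the sketch below shows one way pack descriptors could be turned into resolver registrations with the extension default priority of 200. All types and helpers here are hypothetical; in particular, the 'tts-provider' kind and the 'extension' source value are assumptions modelled on the 'stt-provider' kind and 'core' source shown in this document.

```typescript
// Hypothetical, trimmed-down descriptor and registration shapes.
interface DescriptorSketch {
  id: string;
  kind: string;
  payload: unknown;
}

interface RegistrationSketch {
  id: string;
  kind: 'stt' | 'tts';
  provider: unknown;
  priority: number;
  source: 'core' | 'extension';
}

// Maps a descriptor kind to a speech kind; unknown kinds are ignored.
function speechKind(descriptorKind: string): 'stt' | 'tts' | undefined {
  switch (descriptorKind) {
    case 'stt-provider':
      return 'stt';
    case 'tts-provider': // assumed name, not confirmed by this document
      return 'tts';
    default:
      return undefined;
  }
}

function toRegistrations(descriptors: DescriptorSketch[]): RegistrationSketch[] {
  const registrations: RegistrationSketch[] = [];
  for (const d of descriptors) {
    const kind = speechKind(d.kind);
    if (kind === undefined) continue; // not a speech provider descriptor
    registrations.push({ id: d.id, kind, provider: d.payload, priority: 200, source: 'extension' });
  }
  return registrations;
}

const registrations = toRegistrations([
  { id: 'my-custom-stt', kind: 'stt-provider', payload: {} },
  { id: 'some-tool', kind: 'tool', payload: {} },
]);

console.log(registrations.map((r) => `${r.id}:${r.kind}:${r.priority}`)); // [ 'my-custom-stt:stt:200' ]
```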

Related Documentation

VOICE_PIPELINE.md: end-to-end voice session orchestration
RFC_EXTENSION_STANDARDS.md: extension pack authoring guide
ARCHITECTURE.md: high-level package architecture
