Skip to main content

Class: SpeechProviderAdapter

Defined in: packages/agentos/src/rag/multimodal/SpeechProviderAdapter.ts:77

Bridges the voice-pipeline's SpeechToTextProvider to the multimodal indexer's ISpeechToTextProvider interface.

Converts raw Buffer audio into the SpeechAudioInput shape expected by voice providers, forwards the language hint through SpeechTranscriptionOptions, and extracts the plain transcript text from the rich SpeechTranscriptionResult.

Example

const whisper = resolver.resolveSTT();
const adapted = new SpeechProviderAdapter(whisper);

// Now usable by the multimodal indexer:
const text = await adapted.transcribe(audioBuffer, 'en');

Implements

Constructors

Constructor

new SpeechProviderAdapter(provider, defaultMimeType?): SpeechProviderAdapter

Defined in: packages/agentos/src/rag/multimodal/SpeechProviderAdapter.ts:109

Create a new adapter wrapping a voice-pipeline STT provider.

Parameters

provider

SpeechToTextProvider

A configured SpeechToTextProvider instance (e.g. Whisper, Deepgram, AssemblyAI, Azure Speech).

defaultMimeType?

string = 'audio/wav'

MIME type to assume for raw audio buffers. Defaults to 'audio/wav' which is accepted by all major STT providers. Override to 'audio/mpeg' or 'audio/ogg' when indexing MP3/OGG files.

Returns

SpeechProviderAdapter

Throws

If provider is null or undefined.

Example

const adapter = new SpeechProviderAdapter(whisperProvider);
const mp3Adapter = new SpeechProviderAdapter(whisperProvider, 'audio/mpeg');

Methods

getProviderName()

getProviderName(): string

Defined in: packages/agentos/src/rag/multimodal/SpeechProviderAdapter.ts:164

Get the display name of the underlying STT provider.

Useful for logging and diagnostics — lets callers identify which voice-pipeline provider is actually handling transcription.

Returns

string

The provider's display name or ID string.

Example

console.log(`STT via: ${adapter.getProviderName()}`); // "openai-whisper"

transcribe()

transcribe(audio, language?): Promise<string>

Defined in: packages/agentos/src/rag/multimodal/SpeechProviderAdapter.ts:140

Transcribe audio data to text.

Wraps the raw buffer in a SpeechAudioInput and delegates to the underlying voice-pipeline provider. The rich transcription result is reduced to the plain text string that the multimodal indexer needs for embedding generation.

Parameters

audio

Buffer

Raw audio data as a Buffer (WAV, MP3, OGG, etc.).

language?

string

Optional BCP-47 language code hint for improved transcription accuracy (e.g. 'en', 'es', 'ja').

Returns

Promise<string>

The transcribed text content.

Throws

If the underlying STT provider fails.

Example

const transcript = await adapter.transcribe(wavBuffer);
const spanishTranscript = await adapter.transcribe(audioBuffer, 'es');

Implementation of

ISpeechToTextProvider.transcribe