Class: AzureSpeechSTTProvider

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:156

Speech-to-text provider that uses the Azure Cognitive Services Speech REST API.

Azure REST Endpoint Format

The endpoint URL follows this pattern:

https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language={lang}
  • {region} — The Azure region from config (e.g. eastus, westeurope).
  • {lang} — BCP-47 language code from options or 'en-US' default.
  • The /conversation/ path segment selects the conversation recognition mode (as opposed to /interactive/ or /dictation/).
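The pattern above can be sketched as a small URL builder. This is a minimal illustration, not the provider's actual code: the function name buildAzureSttUrl is hypothetical, and only the URL shape and the 'en-US' default come from the description above.

```typescript
// Hypothetical helper, not part of AzureSpeechSTTProvider's public API.
// Builds the recognition URL from a region and a BCP-47 language code.
function buildAzureSttUrl(region: string, language: string = 'en-US'): string {
  const host = `https://${region}.stt.speech.microsoft.com`;
  // The /conversation/ segment selects the conversation recognition mode.
  const path = '/speech/recognition/conversation/cognitiveservices/v1';
  return `${host}${path}?language=${encodeURIComponent(language)}`;
}

const exampleUrl = buildAzureSttUrl('westeurope', 'de-DE');
console.log(exampleUrl);
// https://westeurope.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=de-DE
```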

Authentication: Ocp-Apim-Subscription-Key

Azure Cognitive Services uses the Ocp-Apim-Subscription-Key HTTP header for authentication, which differs from the typical Authorization: Bearer pattern. The subscription key is sent as a plain-text header value — no "Bearer" or "Token" prefix.

Azure also accepts a short-lived access token issued by its token endpoint, but this provider uses the simpler key-based approach for reliability, since there is no token to fetch or refresh.
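As a rough sketch of the key-based scheme, the request headers might be assembled as follows. The helper name buildAzureSttHeaders is hypothetical; the header names and the fixed audio/wav content type are taken from this page.

```typescript
// Hypothetical helper: assembles the headers described in this section.
function buildAzureSttHeaders(subscriptionKey: string): Record<string, string> {
  return {
    // The raw subscription key, with no "Bearer" or "Token" prefix.
    'Ocp-Apim-Subscription-Key': subscriptionKey,
    // The provider always sends audio/wav (see Limitations).
    'Content-Type': 'audio/wav',
  };
}

const exampleHeaders = buildAzureSttHeaders('your-azure-subscription-key');
console.log(Object.keys(exampleHeaders));
```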

NoMatch Handling

When Azure's recognizer detects audio but cannot identify any speech, it returns RecognitionStatus: 'NoMatch' instead of raising an HTTP error. This provider maps NoMatch to an empty-text result (text: '') with isFinal: true, matching the Azure Speech SDK's behaviour. This prevents the fallback proxy from unnecessarily trying another provider when the audio genuinely contains no speech.
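The mapping described above could look roughly like the sketch below. The names AzureSttResponse, SketchResult, and mapAzureResponse are hypothetical, and the result shape is reduced to the two fields mentioned in this section. How the provider treats statuses other than Success and NoMatch is not documented here, so the sketch throws for them purely as a placeholder.

```typescript
// Subset of the JSON body returned by the Azure REST endpoint.
interface AzureSttResponse {
  RecognitionStatus: string; // e.g. 'Success' or 'NoMatch'
  DisplayText?: string;
}

// Illustrative shape only; the real SpeechTranscriptionResult may carry more fields.
interface SketchResult {
  text: string;
  isFinal: boolean;
}

// Hypothetical mapper: NoMatch becomes an empty, final transcript instead of an error.
function mapAzureResponse(body: AzureSttResponse): SketchResult {
  if (body.RecognitionStatus === 'NoMatch') {
    return { text: '', isFinal: true };
  }
  if (body.RecognitionStatus === 'Success') {
    return { text: body.DisplayText ?? '', isFinal: true };
  }
  // Assumption: other statuses are out of scope for this sketch.
  throw new Error(`Unhandled RecognitionStatus: ${body.RecognitionStatus}`);
}

console.log(mapAzureResponse({ RecognitionStatus: 'NoMatch' })); // { text: '', isFinal: true }
```

Returning a final, empty result lets callers distinguish "no speech in the audio" from a genuine failure, which is why the fallback proxy does not need to try another provider.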

Limitations

  • Audio must be in PCM WAV format. The Content-Type is hardcoded to audio/wav regardless of the audio.mimeType value.
  • Streaming is not supported — use the Azure Speech SDK for real-time STT.
  • Speaker diarization is not available via the REST API.
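Because the Content-Type is fixed, a caller may want to check that a buffer really is PCM WAV before sending it. The helper below is a hypothetical pre-flight check, not something the provider is documented to do. It assumes the canonical WAV layout, with the "fmt " chunk directly after the RIFF header, so valid files with extra leading chunks would be rejected.

```typescript
// Hypothetical pre-flight check for canonical PCM WAV data.
function looksLikePcmWav(data: Uint8Array): boolean {
  // Need at least the RIFF header plus the format tag at bytes 20-21.
  if (data.byteLength < 22) return false;
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  // Reads a four-character chunk identifier at the given byte offset.
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3),
    );
  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE' || tag(12) !== 'fmt ') return false;
  // Format tag 1 (little-endian uint16) means uncompressed PCM.
  return view.getUint16(20, true) === 1;
}
```

A caller could run this on the audio data before calling transcribe() and convert or reject the audio when it returns false.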

Example

const provider = new AzureSpeechSTTProvider({
  key: process.env.AZURE_SPEECH_KEY!,
  region: 'eastus',
});
const result = await provider.transcribe(
  { data: wavBuffer, mimeType: 'audio/wav' },
  { language: 'de-DE' },
);
console.log(result.text); // '' if no speech detected

Implements

SpeechToTextProvider

Constructors

Constructor

new AzureSpeechSTTProvider(config): AzureSpeechSTTProvider

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:182

Creates a new AzureSpeechSTTProvider.

Parameters

config

AzureSpeechSTTProviderConfig

Provider configuration including the subscription key and region.

Returns

AzureSpeechSTTProvider

Example

const provider = new AzureSpeechSTTProvider({
  key: 'your-azure-subscription-key',
  region: 'eastus',
});

Properties

displayName

readonly displayName: "Azure Speech (STT)" = 'Azure Speech (STT)'

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:161

Human-readable display name for UI and logging.

Implementation of

SpeechToTextProvider.displayName


id

readonly id: "azure-speech-stt" = 'azure-speech-stt'

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:158

Unique provider identifier used for registration and resolution.

Implementation of

SpeechToTextProvider.id


supportsStreaming

readonly supportsStreaming: false = false

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:164

This provider sends one HTTP request per transcription and waits for the complete response; it does not use WebSocket streaming.

Implementation of

SpeechToTextProvider.supportsStreaming

Methods

getProviderName()

getProviderName(): string

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:196

Returns the human-readable provider name.

Returns

string

The display name string 'Azure Speech (STT)'.

Example

provider.getProviderName(); // 'Azure Speech (STT)'

Implementation of

SpeechToTextProvider.getProviderName


transcribe()

transcribe(audio, options?): Promise<SpeechTranscriptionResult>

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:226

Transcribes an audio buffer using the Azure Speech recognition REST endpoint.

Sends the raw audio as PCM WAV and returns a normalized result. Azure's NoMatch status is treated as an empty transcript (not an error).

Parameters

audio

SpeechAudioInput

Raw audio data. Azure expects PCM WAV format; the Content-Type header is always set to 'audio/wav' regardless of audio.mimeType.

options?

SpeechTranscriptionOptions = {}

Optional transcription settings. Only language is supported by the Azure REST endpoint.

Returns

Promise<SpeechTranscriptionResult>

A promise resolving to the normalized transcription result.

Throws

When the Azure API returns a non-2xx HTTP status code. The error message includes the status and response body text.
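The failure path can be sketched as below. Both helper names are hypothetical, and the exact wording of the error message is an assumption; the source only states that the message includes the status and the response body text.

```typescript
// Hypothetical helper: throws for any non-2xx status.
// The message format here is illustrative, not the provider's actual wording.
function throwOnHttpError(status: number, bodyText: string): void {
  if (status < 200 || status > 299) {
    throw new Error(`Azure Speech STT failed (${status}): ${bodyText}`);
  }
}

// Caller-side handling: turn a thrown error into a loggable string.
function describeFailure(status: number, bodyText: string): string {
  try {
    throwOnHttpError(status, bodyText);
    return 'ok';
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

console.log(describeFailure(401, 'Access denied'));
// Azure Speech STT failed (401): Access denied
```

In real code the same try/catch would wrap the awaited provider.transcribe() call.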

Example

const result = await provider.transcribe(
  { data: wavBuffer, durationSeconds: 5 },
  { language: 'fr-FR' },
);
if (result.text === '') {
  console.log('No speech detected in the audio');
}

Implementation of

SpeechToTextProvider.transcribe