Class: AzureSpeechSTTProvider

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:156

Speech-to-text provider that uses the Azure Cognitive Services Speech REST API.

Azure REST Endpoint Format

The endpoint URL follows this pattern:

https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language={lang}

{region}: The Azure region from config (e.g. eastus, westeurope).
{lang}: BCP-47 language code taken from the transcription options, defaulting to 'en-US'.
The /conversation/ path segment selects the conversation recognition mode (as opposed to /interactive/ or /dictation/).
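As a rough illustration of the pattern above, the URL could be assembled like this. The helper name buildAzureSttUrl is hypothetical and is not part of the provider's public API; only the URL template and the 'en-US' default come from this page.

```typescript
// Hypothetical helper for illustration; the provider's internals may differ.
function buildAzureSttUrl(region: string, language: string = 'en-US'): string {
  const base =
    `https://${region}.stt.speech.microsoft.com` +
    '/speech/recognition/conversation/cognitiveservices/v1';
  // encodeURIComponent keeps the query value safe if an unusual tag is passed.
  return `${base}?language=${encodeURIComponent(language)}`;
}

const germanUrl = buildAzureSttUrl('eastus', 'de-DE');
const defaultUrl = buildAzureSttUrl('westeurope'); // falls back to en-US
```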

Authentication: `Ocp-Apim-Subscription-Key`

Azure Cognitive Services authenticates requests with the Ocp-Apim-Subscription-Key HTTP header rather than the more common Authorization: Bearer pattern. The subscription key is sent as the raw header value, with no "Bearer" or "Token" prefix.

Azure also accepts a short-lived access token obtained from its token endpoint, but this provider uses the simpler key-based approach for reliability, since it needs no token exchange or refresh.
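A minimal sketch of the key-based headers, assuming a plain object is passed to the HTTP client. The function name buildAzureSttHeaders is hypothetical; the header names and the fixed audio/wav content type are the ones described on this page.

```typescript
// Hypothetical helper: headers for a key-authenticated recognition request.
function buildAzureSttHeaders(subscriptionKey: string): Record<string, string> {
  return {
    // Raw key as the header value, with no "Bearer " or "Token " prefix.
    'Ocp-Apim-Subscription-Key': subscriptionKey,
    // The provider always declares audio/wav (see Limitations).
    'Content-Type': 'audio/wav',
  };
}

const headers = buildAzureSttHeaders('your-azure-subscription-key');
```

Note that no Authorization header is set at all in this mode.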

NoMatch Handling

When Azure's recognizer detects audio but cannot identify any speech, it returns RecognitionStatus: 'NoMatch' instead of raising an HTTP error. This provider maps NoMatch to an empty-text result (text: '') with isFinal: true, matching the Azure Speech SDK's behaviour. This prevents the fallback proxy from unnecessarily trying another provider when the audio genuinely contains no speech.
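The mapping described above might look roughly like the following. This is a sketch under assumptions: AzureSttResponse models only two fields of Azure's simple-format JSON response, MinimalTranscript is a hypothetical stand-in for SpeechTranscriptionResult (which may carry more fields), and statuses other than NoMatch are not given any special handling here.

```typescript
// Assumed subset of the Azure REST response body (simple output format).
interface AzureSttResponse {
  RecognitionStatus: string;
  DisplayText?: string;
}

// Hypothetical minimal result shape used only in this sketch.
interface MinimalTranscript {
  text: string;
  isFinal: boolean;
}

function toTranscript(response: AzureSttResponse): MinimalTranscript {
  if (response.RecognitionStatus === 'NoMatch') {
    // Audio arrived but held no recognizable speech: an empty result, not an error.
    return { text: '', isFinal: true };
  }
  return { text: response.DisplayText ?? '', isFinal: true };
}

const silent = toTranscript({ RecognitionStatus: 'NoMatch' });
const spoken = toTranscript({ RecognitionStatus: 'Success', DisplayText: 'Guten Tag.' });
```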

Limitations

Audio must be in PCM WAV format. The Content-Type is hardcoded to audio/wav regardless of the audio.mimeType value.
Streaming is not supported — use the Azure Speech SDK for real-time STT.
Speaker diarization is not available via the REST API.
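Because the declared mimeType is ignored, callers may want to sanity-check the bytes themselves before sending them. The helper below is a hypothetical pre-flight check, not something the provider is known to do. It only confirms the RIFF/WAVE container header and does not verify that the payload is PCM-encoded.

```typescript
// Hypothetical pre-flight check: true when the bytes start with a RIFF/WAVE header.
function looksLikeWav(data: Uint8Array): boolean {
  if (data.length < 12) return false;
  // Read four bytes at the given offset as an ASCII tag.
  const ascii = (offset: number): string =>
    Array.from(data.subarray(offset, offset + 4), (byte) => String.fromCharCode(byte)).join('');
  // Bytes 0-3 are "RIFF", bytes 4-7 are the chunk size, bytes 8-11 are "WAVE".
  return ascii(0) === 'RIFF' && ascii(8) === 'WAVE';
}
```

A caller could run this on the audio bytes and skip the request, or transcode first, when it returns false.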

See

AzureSpeechSTTProviderConfig for configuration options
AzureSpeechTTSProvider for the corresponding TTS provider

Example

const provider = new AzureSpeechSTTProvider({
  key: process.env.AZURE_SPEECH_KEY!,
  region: 'eastus',
});
// wavBuffer is assumed to hold PCM WAV audio loaded elsewhere.
const result = await provider.transcribe(
  { data: wavBuffer, mimeType: 'audio/wav' },
  { language: 'de-DE' },
);
console.log(result.text); // '' if no speech detected

Implements

SpeechToTextProvider

Constructors

Constructor

new AzureSpeechSTTProvider(config): AzureSpeechSTTProvider

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:182

Creates a new AzureSpeechSTTProvider.

Parameters

config

AzureSpeechSTTProviderConfig

Provider configuration including the subscription key and region.

Returns

AzureSpeechSTTProvider

Example

const provider = new AzureSpeechSTTProvider({
  key: 'your-azure-subscription-key',
  region: 'eastus',
});

Properties

displayName

readonly displayName: "Azure Speech (STT)" = 'Azure Speech (STT)'

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:161

Human-readable display name for UI and logging.

Implementation of

SpeechToTextProvider.displayName

id

readonly id: "azure-speech-stt" = 'azure-speech-stt'

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:158

Unique provider identifier used for registration and resolution.

Implementation of

SpeechToTextProvider.id

supportsStreaming

readonly supportsStreaming: false = false

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:164

This provider sends each transcription as a single HTTP request and response; it does not use WebSocket streaming.

Implementation of

SpeechToTextProvider.supportsStreaming

Methods

getProviderName()

getProviderName(): string

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:196

Returns the human-readable provider name.

Returns

string

The display name string 'Azure Speech (STT)'.

Example

provider.getProviderName(); // 'Azure Speech (STT)'

Implementation of

SpeechToTextProvider.getProviderName

transcribe()

transcribe(audio, options?): Promise<SpeechTranscriptionResult>

Defined in: packages/agentos/src/hearing/providers/AzureSpeechSTTProvider.ts:226

Transcribes an audio buffer using the Azure Speech recognition REST endpoint.

Sends the raw audio as PCM WAV and returns a normalized result. Azure's NoMatch status is treated as an empty transcript (not an error).

Parameters

audio

SpeechAudioInput

Raw audio data. Azure expects PCM WAV format; the Content-Type header is always set to 'audio/wav' regardless of audio.mimeType.

options?

SpeechTranscriptionOptions = {}

Optional transcription settings. Only language is supported by the Azure REST endpoint.

Returns

Promise<SpeechTranscriptionResult>

A promise resolving to the normalized transcription result.

Throws

When the Azure API returns a non-2xx HTTP status code. The error message includes the status and response body text.
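Since an empty transcript and a thrown error mean different things, calling code may want to tell them apart explicitly. The wrapper below is one possible sketch: TranscribeCall, Outcome and classifyTranscription are hypothetical names, and the only assumptions about the provider are that the result has a text field and that failures surface as thrown errors.

```typescript
// Minimal structural types for this sketch; not the library's real interfaces.
type TranscribeCall = () => Promise<{ text: string }>;

type Outcome =
  | { kind: 'transcript'; text: string }
  | { kind: 'silence' }
  | { kind: 'failed'; message: string };

async function classifyTranscription(call: TranscribeCall): Promise<Outcome> {
  try {
    const result = await call();
    // Empty text is how NoMatch surfaces: the request succeeded, but no speech was found.
    return result.text === ''
      ? { kind: 'silence' }
      : { kind: 'transcript', text: result.text };
  } catch (error) {
    // Non-2xx responses arrive here as thrown errors.
    const message = error instanceof Error ? error.message : String(error);
    return { kind: 'failed', message };
  }
}
```

It would be used by passing a closure over the real call, for example classifyTranscription(() => provider.transcribe(audio, { language: 'fr-FR' })).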

Example

const result = await provider.transcribe(
  { data: wavBuffer, durationSeconds: 5 },
  { language: 'fr-FR' },
);
if (result.text === '') {
  console.log('No speech detected in the audio');
}

Implementation of

SpeechToTextProvider.transcribe
