
Class: AzureSpeechTTSProvider

Defined in: packages/agentos/src/speech/providers/AzureSpeechTTSProvider.ts:211

Text-to-speech provider that uses the Azure Cognitive Services Speech REST API.

SSML Generation

Azure's TTS REST endpoint requires SSML (Speech Synthesis Markup Language) as the request body; it does not accept plain text. This provider generates minimal SSML via buildSsml(), which wraps the input text in <speak> and <voice> elements. Special XML characters in the text are escaped via escapeXml() so that the generated SSML is always well-formed XML.
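The SSML wrapping can be sketched as follows. This is a hypothetical re-implementation of buildSsml() and escapeXml(), not the provider's actual source. In particular, deriving the required xml:lang attribute from the first two segments of the voice short-name (en-US from en-US-GuyNeural) is an assumption.

```typescript
// Hypothetical re-implementation of escapeXml(); the real helper may differ.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;') // must run first so the entities below are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical re-implementation of buildSsml(): wraps text in <speak> and <voice>.
// Assumption: the locale is taken from the voice short-name ('en-US-GuyNeural' -> 'en-US').
function buildSsml(text: string, voice: string): string {
  const lang = voice.split('-').slice(0, 2).join('-');
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}

const ssml = buildSsml('Fish & chips <cheap>', 'en-US-GuyNeural');
console.log(ssml);
```

Escaping the text rather than rejecting it means user input containing &, < or quotes can be spoken verbatim instead of producing a 400 response for invalid SSML.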

X-Microsoft-OutputFormat Options

The X-Microsoft-OutputFormat header controls the audio encoding. This provider uses 'audio-24khz-96kbitrate-mono-mp3', which provides:

  • 24 kHz sample rate (high quality for speech)
  • 96 kbps bitrate (good balance of quality and file size)
  • Mono channel (sufficient for speech synthesis)
  • MP3 format (universally supported)

Other available formats include:

  • 'audio-16khz-128kbitrate-mono-mp3' — Lower sample rate, higher bitrate
  • 'audio-24khz-160kbitrate-mono-mp3' — Higher bitrate for better quality
  • 'riff-24khz-16bit-mono-pcm' — Uncompressed WAV
  • 'ogg-24khz-16bit-mono-opus' — Opus codec in OGG container
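Because the provider uses a fixed MP3 format, its results report mimeType 'audio/mpeg'. If the format were made configurable, the MIME type would have to follow it. A minimal sketch of that mapping, using a hypothetical mimeTypeForOutputFormat() helper that is not part of the documented API:

```typescript
// The format this provider sends in the X-Microsoft-OutputFormat header.
const OUTPUT_FORMAT = 'audio-24khz-96kbitrate-mono-mp3';

// Hypothetical helper: infer a MIME type from an Azure output-format name.
function mimeTypeForOutputFormat(format: string): string {
  if (format.endsWith('-mp3')) return 'audio/mpeg'; // MP3 variants
  if (format.startsWith('riff-')) return 'audio/wav'; // PCM samples in a RIFF/WAV container
  if (format.startsWith('ogg-')) return 'audio/ogg'; // Opus in an OGG container
  return 'application/octet-stream'; // unknown format: treat as opaque bytes
}

console.log(mimeTypeForOutputFormat(OUTPUT_FORMAT)); // audio/mpeg
```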


Voice Listing

The listAvailableVoices method fetches the full list of voices available in the configured Azure region via GET /cognitiveservices/voices/list. Results are mapped to the normalized SpeechVoice shape.

Example

const provider = new AzureSpeechTTSProvider({
  key: process.env.AZURE_SPEECH_KEY!,
  region: 'eastus',
  defaultVoice: 'en-US-GuyNeural',
});
const result = await provider.synthesize('Hello world');
// result.audioBuffer contains MP3 bytes
// result.mimeType === 'audio/mpeg'

Implements

TextToSpeechProvider

Constructors

Constructor

new AzureSpeechTTSProvider(config): AzureSpeechTTSProvider

Defined in: packages/agentos/src/speech/providers/AzureSpeechTTSProvider.ts:246

Creates a new AzureSpeechTTSProvider.

Parameters

config

AzureSpeechTTSProviderConfig

Provider configuration including the subscription key, region, and optional default voice.
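The region in the config determines which regional endpoint the provider talks to. The sketch below assumes the standard Azure Speech REST host pattern <region>.tts.speech.microsoft.com. AzureSpeechTTSProviderConfigSketch and azureTtsEndpoints() are hypothetical names, and the region check is an illustrative validation rather than the provider's own.

```typescript
// Assumed shape of AzureSpeechTTSProviderConfig, inferred from the examples on this page;
// the real interface may declare more fields.
interface AzureSpeechTTSProviderConfigSketch {
  key: string;
  region: string;
  defaultVoice?: string;
}

// Hypothetical helper: derive the regional REST endpoints from the config.
function azureTtsEndpoints(config: AzureSpeechTTSProviderConfigSketch) {
  if (!config.key) throw new Error('Azure Speech subscription key is required');
  // Region identifiers such as 'eastus' or 'westeurope' are lowercase alphanumerics.
  if (!/^[a-z0-9]+$/.test(config.region)) {
    throw new Error(`Invalid Azure region: "${config.region}"`);
  }
  const base = `https://${config.region}.tts.speech.microsoft.com/cognitiveservices`;
  return {
    synthesize: `${base}/v1`, // POST SSML here
    voicesList: `${base}/voices/list`, // GET the voice catalogue here
  };
}

console.log(azureTtsEndpoints({ key: 'k', region: 'westeurope' }).synthesize);
// https://westeurope.tts.speech.microsoft.com/cognitiveservices/v1
```

Validating eagerly turns a misspelled region into an immediate, descriptive error instead of a DNS failure on the first request.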

Returns

AzureSpeechTTSProvider

Example

const provider = new AzureSpeechTTSProvider({
  key: 'your-azure-subscription-key',
  region: 'westeurope',
  defaultVoice: 'de-DE-ConradNeural',
});

Properties

displayName

readonly displayName: "Azure Speech (TTS)" = 'Azure Speech (TTS)'

Defined in: packages/agentos/src/speech/providers/AzureSpeechTTSProvider.ts:216

Human-readable display name for UI and logging.

Implementation of

TextToSpeechProvider.displayName


id

readonly id: "azure-speech-tts" = 'azure-speech-tts'

Defined in: packages/agentos/src/speech/providers/AzureSpeechTTSProvider.ts:213

Unique provider identifier used for registration and resolution.

Implementation of

TextToSpeechProvider.id


supportsStreaming

readonly supportsStreaming: true = true

Defined in: packages/agentos/src/speech/providers/AzureSpeechTTSProvider.ts:223

Set to true so that the provider can be used within a streaming pipeline. The underlying HTTP exchange, however, is a single request/response call that returns the complete audio buffer; audio is not delivered incrementally.
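Because the whole buffer arrives at once, a streaming consumer has to split it after the fact. One way to do that is sketched below with a hypothetical chunkAudio() async generator; it is not part of the provider, and the 4096-byte default chunk size is arbitrary.

```typescript
// Hypothetical adapter: expose a complete audio buffer as an async iterable of chunks,
// so a non-streaming synthesis result can feed a streaming pipeline.
async function* chunkAudio(audio: Uint8Array, chunkSize = 4096): AsyncGenerator<Uint8Array> {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive');
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    // subarray() returns a view over the same memory, so no audio bytes are copied
    yield audio.subarray(offset, offset + chunkSize);
  }
}

async function demo(): Promise<void> {
  const audio = new Uint8Array(10000); // stand-in for result.audioBuffer
  let total = 0;
  for await (const chunk of chunkAudio(audio)) total += chunk.length;
  console.log(total); // 10000
}

demo();
```

The first chunk is only available after the full synthesis request completes, so this adapter fits the pipeline's interface without reducing time-to-first-audio.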

Implementation of

TextToSpeechProvider.supportsStreaming

Methods

getProviderName()

getProviderName(): string

Defined in: packages/agentos/src/speech/providers/AzureSpeechTTSProvider.ts:261

Returns the human-readable provider name.

Returns

string

The display name string 'Azure Speech (TTS)'.

Example

provider.getProviderName(); // 'Azure Speech (TTS)'

Implementation of

TextToSpeechProvider.getProviderName


listAvailableVoices()

listAvailableVoices(): Promise<SpeechVoice[]>

Defined in: packages/agentos/src/speech/providers/AzureSpeechTTSProvider.ts:353

Retrieves the list of voices available in the configured Azure region.

Fetches from GET /cognitiveservices/voices/list and maps each entry to the normalized SpeechVoice shape. The list includes all neural and standard voices available in the configured region.
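The mapping step can be sketched as follows. The Azure field names (ShortName, DisplayName, Gender, Locale, VoiceType) are fields of the voices/list response; the SpeechVoiceSketch fields other than lang, and the lowercasing of the gender, are assumptions about what the normalized SpeechVoice contains.

```typescript
// Subset of the fields Azure returns for each entry of GET /cognitiveservices/voices/list.
interface AzureVoiceEntry {
  ShortName: string; // e.g. 'en-US-GuyNeural'; the value used in SSML <voice name="...">
  DisplayName: string; // e.g. 'Guy'
  Gender: string; // e.g. 'Male' or 'Female'
  Locale: string; // e.g. 'en-US'
  VoiceType: string; // e.g. 'Neural'
}

// Assumed normalized shape; only `lang` is confirmed by this page's filtering example.
interface SpeechVoiceSketch {
  id: string;
  name: string;
  lang: string;
  gender?: string;
  description?: string;
}

// Hypothetical mapper from an Azure entry to the normalized shape.
function toSpeechVoice(entry: AzureVoiceEntry): SpeechVoiceSketch {
  return {
    id: entry.ShortName, // the short-name is what options.voice expects
    name: entry.DisplayName,
    lang: entry.Locale,
    gender: entry.Gender.toLowerCase(),
    description: `${entry.VoiceType} voice`,
  };
}

const sample: AzureVoiceEntry[] = [
  { ShortName: 'en-US-GuyNeural', DisplayName: 'Guy', Gender: 'Male', Locale: 'en-US', VoiceType: 'Neural' },
  { ShortName: 'de-DE-ConradNeural', DisplayName: 'Conrad', Gender: 'Male', Locale: 'de-DE', VoiceType: 'Neural' },
];
const voices = sample.map(toSpeechVoice);
console.log(voices.filter((v) => v.lang.startsWith('en-')).length); // 1
```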

Returns

Promise<SpeechVoice[]>

A promise resolving to an array of normalized voice entries.

Throws

When the Azure API returns a non-2xx status code (e.g. because of an invalid subscription key).

Example

const voices = await provider.listAvailableVoices();
const englishVoices = voices.filter(v => v.lang.startsWith('en-'));
console.log(`Found ${englishVoices.length} English voices`);

Implementation of

TextToSpeechProvider.listAvailableVoices


synthesize()

synthesize(text, options?): Promise<SpeechSynthesisResult>

Defined in: packages/agentos/src/speech/providers/AzureSpeechTTSProvider.ts:288

Synthesizes speech from plain text using the Azure TTS REST endpoint.

The text is wrapped in SSML, sent to Azure, and the response audio buffer (MP3 format) is returned along with metadata.
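The request/response cycle can be sketched end to end. This is not the provider's source: synthesizeSketch(), FetchLike and SynthesisResultSketch are hypothetical, fetch is injected so the flow can be exercised offline, and the 'en-US-JennyNeural' fallback used when no voice is configured is an assumption. The headers shown (Ocp-Apim-Subscription-Key, Content-Type: application/ssml+xml, X-Microsoft-OutputFormat, User-Agent) are ones the Azure Speech REST API documents for synthesis requests; the User-Agent value is made up.

```typescript
// Minimal fetch-like signature so the flow can be exercised without network access.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Assumed result shape; the real provider may return a Node Buffer and more metadata.
interface SynthesisResultSketch {
  audioBuffer: Uint8Array;
  mimeType: string;
  voice: string;
}

const escapeXml = (s: string): string =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;').replace(/'/g, '&apos;');

async function synthesizeSketch(
  fetchImpl: FetchLike,
  config: { key: string; region: string; defaultVoice?: string },
  text: string,
  options: { voice?: string } = {},
): Promise<SynthesisResultSketch> {
  // 1. A per-call voice wins over the configured default; the final fallback is an assumption.
  const voice = options.voice ?? config.defaultVoice ?? 'en-US-JennyNeural';
  const lang = voice.split('-').slice(0, 2).join('-'); // 'de-DE-ConradNeural' -> 'de-DE'

  // 2. Wrap the escaped text in minimal SSML.
  const ssml =
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice></speak>`;

  // 3. POST the SSML to the regional endpoint, requesting 24 kHz / 96 kbps mono MP3.
  const response = await fetchImpl(
    `https://${config.region}.tts.speech.microsoft.com/cognitiveservices/v1`,
    {
      method: 'POST',
      headers: {
        'Ocp-Apim-Subscription-Key': config.key,
        'Content-Type': 'application/ssml+xml',
        'X-Microsoft-OutputFormat': 'audio-24khz-96kbitrate-mono-mp3',
        'User-Agent': 'agentos-tts-sketch', // hypothetical application name
      },
      body: ssml,
    },
  );

  // 4. Any non-2xx status rejects; otherwise the body is the complete MP3 file.
  if (!response.ok) {
    throw new Error(`Azure TTS request failed with HTTP ${response.status}`);
  }
  const audioBuffer = new Uint8Array(await response.arrayBuffer());
  return { audioBuffer, mimeType: 'audio/mpeg', voice };
}
```

In Node.js 18 or later, the global fetch should be usable as the fetchImpl argument; tests can pass a stub instead.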

Parameters

text

string

The plain-text utterance to convert to audio. XML special characters are automatically escaped.

options?

SpeechSynthesisOptions = {}

Optional synthesis settings. Use options.voice to override the default voice with any valid Azure voice short-name.

Returns

Promise<SpeechSynthesisResult>

A promise resolving to the MP3 audio buffer and metadata.

Throws

When the Azure API returns a non-2xx status code. Common causes: invalid subscription key (401), region mismatch (404), invalid SSML (400), or quota exceeded (429).
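Callers often want to react differently to the failure modes listed above. A sketch of one approach, using a hypothetical AzureTtsHttpError class and isRetryable() helper; the status-to-cause table restates the causes documented here, and the provider's real error type and messages may differ.

```typescript
// Causes documented for each status code; anything else is reported generically.
const STATUS_HINTS: Record<number, string> = {
  400: 'invalid SSML',
  401: 'invalid subscription key',
  404: 'region mismatch',
  429: 'quota exceeded',
};

// Hypothetical error type carrying the HTTP status for programmatic handling.
class AzureTtsHttpError extends Error {
  readonly status: number;

  constructor(status: number) {
    super(`Azure TTS request failed with HTTP ${status} (${STATUS_HINTS[status] ?? 'unexpected response'})`);
    this.name = 'AzureTtsHttpError';
    this.status = status;
    // Keep instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Of the statuses listed above, 429 is the one a caller would typically retry after a delay.
function isRetryable(error: unknown): boolean {
  return error instanceof AzureTtsHttpError && error.status === 429;
}

console.log(new AzureTtsHttpError(401).message);
```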

Example

import * as fs from 'node:fs';
const result = await provider.synthesize('Guten Tag!', {
  voice: 'de-DE-ConradNeural',
});
fs.writeFileSync('output.mp3', result.audioBuffer);

Implementation of

TextToSpeechProvider.synthesize