Class: OpenAIWhisperSpeechToTextProvider
Defined in: packages/agentos/src/hearing/providers/OpenAIWhisperSpeechToTextProvider.ts:146
Speech-to-text provider that uses the OpenAI Whisper transcription API.
API Contract
- Endpoint: POST {baseUrl}/audio/transcriptions
- Authentication: Authorization: Bearer <apiKey>
- Content-Type: multipart/form-data (FormData with file blob)
- Response format: Controlled by the response_format field; defaults to verbose_json, which includes segments, language detection, and duration.
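The contract above can be sketched with the global FormData and fetch APIs. This is a minimal illustration of the request shape, not the provider's actual implementation; the baseUrl, apiKey, and audio bytes below are placeholders:

```typescript
// Placeholder values, not defaults from the provider.
const baseUrl = 'https://api.openai.com/v1';
const apiKey = 'sk-...'; // placeholder

// Stand-in audio bytes; a real caller would pass the recorded buffer.
const audioData = new Uint8Array([0x52, 0x49, 0x46, 0x46]);

// Build the multipart body: file blob plus model and format fields.
const form = new FormData();
form.append('file', new Blob([audioData], { type: 'audio/wav' }), 'recording.wav');
form.append('model', 'whisper-1');
form.append('response_format', 'verbose_json');

// The request itself (not executed in this sketch):
// const res = await fetch(`${baseUrl}/audio/transcriptions`, {
//   method: 'POST',
//   headers: { Authorization: `Bearer ${apiKey}` },
//   body: form,
// });
```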
Supported Response Formats
- verbose_json — Full JSON with segments, duration, and language (default)
- json — Minimal JSON with just the text
- text — Plain text response (no JSON)
- srt — SubRip subtitle format
- vtt — WebVTT subtitle format
When the text, srt, or vtt format is used, the response is returned as plain text and segments are not available.
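The distinction can be illustrated with a small helper that branches on the format. This is a hypothetical sketch of the handling rule described above, not the provider's actual parsing code:

```typescript
type ResponseFormat = 'verbose_json' | 'json' | 'text' | 'srt' | 'vtt';

interface ParsedTranscription {
  text: string;
  segments: unknown[]; // populated only for verbose_json responses
}

// Hypothetical helper: JSON formats are parsed; text-based formats pass through.
function parseTranscriptionResponse(raw: string, format: ResponseFormat): ParsedTranscription {
  if (format === 'verbose_json' || format === 'json') {
    const parsed = JSON.parse(raw);
    return { text: parsed.text, segments: parsed.segments ?? [] };
  }
  // text, srt, vtt: the body is plain text and no segment data is available.
  return { text: raw, segments: [] };
}
```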
See OpenAIWhisperSpeechToTextProviderConfig for configuration options.
See normalizeSegments() for the segment normalization logic.
Example
```ts
const provider = new OpenAIWhisperSpeechToTextProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  model: 'whisper-1',
});

const result = await provider.transcribe(
  { data: audioBuffer, mimeType: 'audio/wav', fileName: 'recording.wav' },
  { language: 'en', responseFormat: 'verbose_json' },
);
```
Implements
SpeechToTextProvider
Constructors
Constructor
new OpenAIWhisperSpeechToTextProvider(config): OpenAIWhisperSpeechToTextProvider
Defined in: packages/agentos/src/hearing/providers/OpenAIWhisperSpeechToTextProvider.ts:175
Parameters
config
OpenAIWhisperSpeechToTextProviderConfig
Returns
OpenAIWhisperSpeechToTextProvider
Properties
displayName
readonly displayName: "OpenAI Whisper" = 'OpenAI Whisper'
Defined in: packages/agentos/src/hearing/providers/OpenAIWhisperSpeechToTextProvider.ts:151
Human-readable display name for UI and logging.
Implementation of
SpeechToTextProvider.displayName
id
readonly id: "openai-whisper" = 'openai-whisper'
Defined in: packages/agentos/src/hearing/providers/OpenAIWhisperSpeechToTextProvider.ts:148
Unique provider identifier used for registration and resolution.
Implementation of
SpeechToTextProvider.id
supportsStreaming
readonly supportsStreaming: false = false
Defined in: packages/agentos/src/hearing/providers/OpenAIWhisperSpeechToTextProvider.ts:154
Whisper API is batch-only; streaming requires a WebSocket adapter.
Implementation of
SpeechToTextProvider.supportsStreaming
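Callers that work with multiple providers might branch on this flag before choosing a transcription path. This is a hypothetical usage pattern, not code from this package:

```typescript
// Minimal shape of the capability being checked.
interface ProviderCapabilities {
  supportsStreaming: boolean;
}

// Hypothetical dispatch: fall back to batch transcription when streaming is unsupported.
function chooseTranscriptionMode(provider: ProviderCapabilities): 'streaming' | 'batch' {
  return provider.supportsStreaming ? 'streaming' : 'batch';
}

chooseTranscriptionMode({ supportsStreaming: false }); // 'batch' for this provider
```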
Methods
getProviderName()
getProviderName(): string
Defined in: packages/agentos/src/hearing/providers/OpenAIWhisperSpeechToTextProvider.ts:190
Returns the human-readable provider name.
Returns
string
The display name string 'OpenAI Whisper'.
Example
```ts
provider.getProviderName(); // 'OpenAI Whisper'
```
Implementation of
SpeechToTextProvider.getProviderName
transcribe()
transcribe(audio, options?): Promise<SpeechTranscriptionResult>
Defined in: packages/agentos/src/hearing/providers/OpenAIWhisperSpeechToTextProvider.ts:216
Transcribes an audio buffer using the OpenAI Whisper API.
The audio is sent as a multipart form upload with the file, model, and optional parameters (language, prompt, temperature, response_format).
Parameters
audio
Raw audio data and metadata. The data buffer is wrapped
in a Blob and sent as a form file field. If fileName is not provided,
a default name is generated from the format field.
options?
SpeechTranscriptionOptions = {}
Optional transcription settings including language hint, context prompt, temperature for sampling, and response format.
Returns
Promise<SpeechTranscriptionResult>
A promise resolving to the normalized transcription result.
Throws
When the OpenAI API returns a non-2xx status code.
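The non-2xx check can be sketched with a small guard helper. This is an illustration of the described behavior; the provider's actual error type and message format may differ:

```typescript
// Hypothetical guard: reject any non-2xx response from the transcription endpoint.
async function ensureOk(res: Response): Promise<Response> {
  if (!res.ok) {
    const body = await res.text();
    throw new Error(`Whisper API request failed with status ${res.status}: ${body}`);
  }
  return res;
}
```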
Example
```ts
const result = await provider.transcribe(
  { data: mp3Buffer, mimeType: 'audio/mpeg', fileName: 'voice.mp3' },
  { language: 'fr', prompt: 'Discussion about AI' },
);
```