Class: AssemblyAISTTProvider
Defined in: packages/agentos/src/hearing/providers/AssemblyAISTTProvider.ts:203
Speech-to-text provider that uses the AssemblyAI async transcription API.
Three-Step Workflow
AssemblyAI uses an asynchronous transcription pipeline that requires three sequential HTTP requests:
- Upload: POST /v2/upload sends the raw audio bytes to AssemblyAI's CDN and returns an upload_url. This step is necessary because the transcript endpoint accepts URLs, not raw audio.
- Submit: POST /v2/transcript creates a transcription job referencing the upload URL. It returns a transcript id used for polling. Optional features like speaker_labels are enabled in this request's JSON body.
- Poll: GET /v2/transcript/:id is called every POLL_INTERVAL_MS (1 second) until the transcript status transitions to 'completed' or 'error'. The polling loop is bounded by DEFAULT_TIMEOUT_MS (120 seconds) to prevent indefinite waiting.
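The three steps above can be sketched roughly as follows. This is an illustrative outline, not the provider's actual source: `transcribeSketch`, `parseOrThrow`, `HttpFn`, `HttpResponse`, and `HttpInit` are hypothetical names, the base URL and `authorization` header are assumptions about the public AssemblyAI API, and the HTTP function is injected so the flow can be exercised against a stub instead of the network.

```typescript
// Minimal request/response shapes (hypothetical) so the sketch can run against a stub.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<any>;
  text(): Promise<string>;
}
interface HttpInit {
  method: 'GET' | 'POST';
  headers: Record<string, string>;
  body?: Uint8Array | string;
}
type HttpFn = (url: string, init: HttpInit) => Promise<HttpResponse>;

const BASE_URL = 'https://api.assemblyai.com'; // assumed public API host
const POLL_INTERVAL_MS = 1_000;
const DEFAULT_TIMEOUT_MS = 120_000;

// Throws with the HTTP status and body on any non-2xx response, else parses JSON.
async function parseOrThrow(step: string, res: HttpResponse): Promise<any> {
  if (!res.ok) {
    throw new Error(`AssemblyAI ${step} failed (${res.status}): ${await res.text()}`);
  }
  return res.json();
}

async function transcribeSketch(
  http: HttpFn,
  apiKey: string,
  audio: Uint8Array,
  pollIntervalMs: number = POLL_INTERVAL_MS,
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
): Promise<string> {
  const headers = { authorization: apiKey };

  // Step 1: upload the raw bytes and receive an upload_url.
  const upload = await parseOrThrow(
    'upload',
    await http(`${BASE_URL}/v2/upload`, { method: 'POST', headers, body: audio }),
  );

  // Step 2: create the transcription job that references the uploaded audio.
  const job = await parseOrThrow(
    'submit',
    await http(`${BASE_URL}/v2/transcript`, {
      method: 'POST',
      headers: { ...headers, 'content-type': 'application/json' },
      body: JSON.stringify({ audio_url: upload.upload_url }),
    }),
  );

  // Step 3: poll until the job completes, fails, or the deadline passes.
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const transcript = await parseOrThrow(
      'poll',
      await http(`${BASE_URL}/v2/transcript/${job.id}`, { method: 'GET', headers }),
    );
    if (transcript.status === 'completed') return transcript.text;
    if (transcript.status === 'error') {
      throw new Error(`AssemblyAI transcription failed: ${transcript.error}`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  throw new Error(`AssemblyAI transcript ${job.id} timed out after ${timeoutMs} ms`);
}
```

Injecting the HTTP function keeps the sketch independent of any particular runtime; in practice a thin wrapper around fetch could be passed in its place.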
AbortController Usage
An optional AbortSignal can be passed via
options.providerSpecificOptions.signal to cancel the transcription at any
point. The signal is forwarded to all three fetch calls and also checked at
the top of each polling iteration. When aborted, an error is thrown
immediately without waiting for the current fetch to complete.
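A minimal sketch of the "check at the top of each polling iteration" pattern described above. The helper names `throwIfAborted` and `pollUntilDone` and the error message are assumptions made for illustration, not the provider's real internals, and the timeout bound is left out to keep the focus on cancellation.

```typescript
// Throws a descriptive error if the caller has already cancelled.
function throwIfAborted(signal?: AbortSignal): void {
  if (signal?.aborted) {
    throw new Error('AssemblyAI transcription was aborted by the caller');
  }
}

// Repeatedly invokes checkOnce until it yields a value.
// The signal is checked before every attempt and is also handed to checkOnce,
// so that a fetch call made inside it can be cancelled mid-flight.
async function pollUntilDone<T>(
  checkOnce: (signal?: AbortSignal) => Promise<T | undefined>,
  intervalMs: number,
  signal?: AbortSignal,
): Promise<T> {
  for (;;) {
    throwIfAborted(signal); // top-of-iteration check
    const result = await checkOnce(signal);
    if (result !== undefined) return result;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Under these assumptions, a caller that invokes `controller.abort()` sees the rejection no later than the start of the next polling iteration.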
Error Handling
- Non-2xx responses at any step throw an Error with the HTTP status and body.
- status === 'error' on the transcript throws with AssemblyAI's error message.
- Timeout expiry throws with the transcript ID for manual inspection.
- Aborted signals throw with a descriptive cancellation message.
See AssemblyAISTTProviderConfig for configuration options.
See AssemblyAITranscript for the polling response shape.
Example
const provider = new AssemblyAISTTProvider({
  apiKey: process.env.ASSEMBLYAI_API_KEY!,
});
// Basic transcription
const result = await provider.transcribe({ data: audioBuffer });
// With diarization and cancellation support
const controller = new AbortController();
// Named differently from `result` above so both calls can share one scope
const diarizedResult = await provider.transcribe(
  { data: audioBuffer },
  {
    enableSpeakerDiarization: true,
    providerSpecificOptions: { signal: controller.signal },
  },
);
Implements
SpeechToTextProvider
Constructors
Constructor
new AssemblyAISTTProvider(config): AssemblyAISTTProvider
Defined in: packages/agentos/src/hearing/providers/AssemblyAISTTProvider.ts:234
Parameters
config
Returns
AssemblyAISTTProvider
Properties
displayName
readonly displayName: "AssemblyAI" = 'AssemblyAI'
Defined in: packages/agentos/src/hearing/providers/AssemblyAISTTProvider.ts:208
Human-readable display name for UI and logging.
Implementation of
SpeechToTextProvider.displayName
id
readonly id: "assemblyai" = 'assemblyai'
Defined in: packages/agentos/src/hearing/providers/AssemblyAISTTProvider.ts:205
Unique provider identifier used for registration and resolution.
Implementation of
SpeechToTextProvider.id
supportsStreaming
readonly supportsStreaming: false = false
Defined in: packages/agentos/src/hearing/providers/AssemblyAISTTProvider.ts:215
Streaming is not supported by this provider's async pipeline. AssemblyAI does offer a separate real-time streaming API via WebSocket, but that would be a different provider implementation.
Implementation of
SpeechToTextProvider.supportsStreaming
Methods
getProviderName()
getProviderName(): string
Defined in: packages/agentos/src/hearing/providers/AssemblyAISTTProvider.ts:249
Returns the human-readable provider name.
Returns
string
The display name string 'AssemblyAI'.
Example
provider.getProviderName(); // 'AssemblyAI'
Implementation of
SpeechToTextProvider.getProviderName
transcribe()
transcribe(audio, options?): Promise<SpeechTranscriptionResult>
Defined in: packages/agentos/src/hearing/providers/AssemblyAISTTProvider.ts:282
Transcribes an audio buffer via the AssemblyAI three-step async pipeline: upload, submit, and poll.
Parameters
audio
Raw audio data and associated metadata. The data buffer
is uploaded to AssemblyAI's CDN in step 1.
options?
SpeechTranscriptionOptions = {}
Optional transcription settings. Pass
providerSpecificOptions.signal (an AbortSignal) to cancel
at any point in the pipeline.
Returns
Promise<SpeechTranscriptionResult>
A promise resolving to the normalized transcription result.
Throws
When the upload API returns a non-2xx status.
Throws
When the transcript submit API returns a non-2xx status.
Throws
When the polling API returns a non-2xx status.
Throws
When the transcript status becomes 'error' (includes
AssemblyAI's error message, e.g. "Audio file could not be decoded").
Throws
When the 120-second timeout is exceeded (includes the transcript ID for manual inspection via the AssemblyAI dashboard).
Throws
When the caller's AbortSignal is triggered.
Example
const result = await provider.transcribe(
{ data: wavBuffer, mimeType: 'audio/wav' },
{ enableSpeakerDiarization: true, language: 'en' },
);
console.log(result.text);
console.log(result.segments?.map(s => `[${s.speaker}] ${s.text}`));
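Because segments is optional and a segment may lack a speaker label when diarization is off, a small formatting helper can make the output above more robust. This is only a sketch: `SegmentLike` and `formatSegments` are hypothetical names, and the segment shape is an assumed minimal subset of whatever SpeechTranscriptionResult actually exposes.

```typescript
// Assumed minimal segment shape; the real result type may carry more fields.
interface SegmentLike {
  speaker?: string | number;
  text: string;
}

// Renders one "[speaker] text" line per segment.
// Falls back to 'unknown' when no speaker label is present,
// and to an empty list when no segments were returned.
function formatSegments(segments: SegmentLike[] | undefined): string[] {
  return (segments ?? []).map((s) => `[${s.speaker ?? 'unknown'}] ${s.text}`);
}
```

With such a helper, the last line of the example could be written as `console.log(formatSegments(result.segments).join('\n'))`, assuming the segment fields match this shape.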