Skip to main content

Function: performOCR()

performOCR(opts): Promise<OCRResult>

Defined in: packages/agentos/src/api/performOCR.ts:270

Extract text from an image using AgentOS's progressive vision pipeline.

This is the recommended high-level API for OCR. It handles input resolution (file, URL, base64, Buffer), pipeline lifecycle, and result mapping so callers don't need to interact with VisionPipeline directly.

When to use performOCR() vs VisionPipeline

Use caseRecommendation
One-shot text extractionperformOCR()
Batch processing many imagesVisionPipeline (create once, reuse)
Need embeddings or layoutVisionPipeline (richer result)
Simple scripts / quick integrationperformOCR()

Parameters

opts

PerformOCROptions

OCR options including the image source and strategy.

Returns

Promise<OCRResult>

A promise resolving to an OCRResult with extracted text, confidence, tier info, and optional bounding-box regions.

Example

// Basic usage — file path, auto-detect everything
const { text, confidence } = await performOCR({
image: '/path/to/receipt.png',
});

// Privacy-sensitive — never call cloud APIs
const local = await performOCR({
image: screenshotBuffer,
strategy: 'local-only',
});

// Best quality — go straight to cloud
const cloud = await performOCR({
image: 'https://example.com/document.jpg',
strategy: 'cloud-only',
provider: 'openai',
model: 'gpt-4o',
});