Function: performOCR()
performOCR(
opts):Promise<OCRResult>
Defined in: packages/agentos/src/api/performOCR.ts:270
Extract text from an image using AgentOS's progressive vision pipeline.
This is the recommended high-level API for OCR. It handles input resolution (file, URL, base64, Buffer), pipeline lifecycle, and result mapping so callers don't need to interact with VisionPipeline directly.
When to use performOCR() vs VisionPipeline
| Use case | Recommendation |
|---|---|
| One-shot text extraction | performOCR() |
| Batch processing many images | VisionPipeline (create once, reuse) |
| Need embeddings or layout | VisionPipeline (richer result) |
| Simple scripts / quick integration | performOCR() |
Parameters
opts
OCR options including the image source and strategy.
Returns
Promise<OCRResult>
A promise resolving to an OCRResult with extracted text, confidence, tier info, and optional bounding-box regions.
Example
// Basic usage — file path, auto-detect everything
const { text, confidence } = await performOCR({
image: '/path/to/receipt.png',
});
// Privacy-sensitive — never call cloud APIs
const local = await performOCR({
image: screenshotBuffer,
strategy: 'local-only',
});
// Best quality — go straight to cloud
const cloud = await performOCR({
image: 'https://example.com/document.jpg',
strategy: 'cloud-only',
provider: 'openai',
model: 'gpt-4o',
});