Function: performOCR()

performOCR(opts): Promise<OCRResult>

Defined in: packages/agentos/src/api/performOCR.ts:270

Extract text from an image using AgentOS's progressive vision pipeline.

This is the recommended high-level API for OCR. It handles input resolution (file, URL, base64, Buffer), pipeline lifecycle, and result mapping so callers don't need to interact with VisionPipeline directly.
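As an illustration of what input resolution involves, the sketch below sorts an input into one of the four kinds listed above. It is a hypothetical helper, not the AgentOS implementation: the name `classifyImageInput`, the `ImageInputKind` type, and the assumption that base64 input arrives as a `data:` URI are all invented for this example.

```typescript
// Hypothetical sketch: NOT the AgentOS implementation.
type ImageInputKind = 'buffer' | 'url' | 'base64' | 'file';

function classifyImageInput(image: string | Uint8Array): ImageInputKind {
  // Node's Buffer is a Uint8Array subclass, so this check covers Buffer too.
  if (image instanceof Uint8Array) return 'buffer';
  // Remote images are identified by an http(s) scheme.
  if (/^https?:\/\//i.test(image)) return 'url';
  // Assumption: base64 input is passed as a data URI.
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(image)) return 'base64';
  // Anything else is treated as a file path.
  return 'file';
}
```

With `performOCR()` none of this is needed in calling code; the sketch only shows the kind of branching the function takes off the caller's hands.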

When to use `performOCR()` vs `VisionPipeline`

| Use case | Recommendation |
| --- | --- |
| One-shot text extraction | `performOCR()` |
| Batch processing many images | `VisionPipeline` (create once, reuse) |
| Need embeddings or layout | `VisionPipeline` (richer result) |
| Simple scripts / quick integration | `performOCR()` |

Parameters

opts

PerformOCROptions

OCR options, including the image source (`image`), the `strategy`, and optionally the cloud `provider` and `model`.

Returns

Promise<OCRResult>

A promise resolving to an OCRResult with extracted text, confidence, tier info, and optional bounding-box regions.

Example

```typescript
// Basic usage: file path, auto-detect everything
const { text, confidence } = await performOCR({
  image: '/path/to/receipt.png',
});

// Privacy-sensitive: never call cloud APIs
const local = await performOCR({
  image: screenshotBuffer,
  strategy: 'local-only',
});

// Best quality: go straight to cloud
const cloud = await performOCR({
  image: 'https://example.com/document.jpg',
  strategy: 'cloud-only',
  provider: 'openai',
  model: 'gpt-4o',
});
```
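The two explicit strategies above can also be combined by hand. The following is a minimal sketch, assuming `confidence` is a score between 0 and 1 (the source does not state the range). The helper `ocrLocalFirst`, the `OcrFn` type, and the 0.8 threshold are hypothetical; the OCR function is passed in as a parameter so the sketch stays self-contained and does not guess at an import path.

```typescript
// Minimal local types for this sketch; the real PerformOCROptions and
// OCRResult have more fields than are modeled here.
type Strategy = 'local-only' | 'cloud-only';

interface OcrCall {
  image: string | Uint8Array;
  strategy?: Strategy;
}

interface OcrOutcome {
  text: string;
  confidence: number;
}

type OcrFn = (opts: OcrCall) => Promise<OcrOutcome>;

// Hypothetical helper: run local OCR first and call the cloud only when
// the local confidence falls below the threshold.
// Assumption: confidence is a score between 0 and 1.
async function ocrLocalFirst(
  ocr: OcrFn,
  image: string | Uint8Array,
  minConfidence = 0.8,
): Promise<OcrOutcome> {
  const local = await ocr({ image, strategy: 'local-only' });
  if (local.confidence >= minConfidence) return local;
  return ocr({ image, strategy: 'cloud-only' });
}
```

Passing `performOCR` as the `ocr` argument should work if its option and result types are compatible with the minimal ones above. The default strategy may already escalate on its own; this sketch only shows how the two documented strategy values could be chained explicitly.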
