Interface: VisionPipelineConfig

Defined in: packages/agentos/src/io/vision/types.ts:303

Configuration for the VisionPipeline.

All fields are optional — the factory function createVisionPipeline auto-detects available providers and fills in sensible defaults.

Example

const config: VisionPipelineConfig = {
  strategy: 'progressive',
  ocr: 'paddle',
  handwriting: true,
  documentAI: true,
  embedding: true,
  cloudProvider: 'openai',
  cloudModel: 'gpt-4o',
  confidenceThreshold: 0.8,
  preprocessing: { grayscale: true, sharpen: true },
};

Properties

cloudModel?

optional cloudModel: string

Defined in: packages/agentos/src/io/vision/types.ts:351

Cloud model override. When unset, the provider's default vision model is used.

Example

'gpt-4o', 'claude-sonnet-4-20250514', 'gemini-2.0-flash'

cloudProvider?

optional cloudProvider: string

Defined in: packages/agentos/src/io/vision/types.ts:345

Cloud vision LLM provider name for Tier 3 fallback. Must match a provider known to generateText() (e.g. 'openai', 'anthropic', 'google'). When unset, cloud vision is disabled.

confidenceThreshold?

optional confidenceThreshold: number

Defined in: packages/agentos/src/io/vision/types.ts:359

Minimum confidence to accept an OCR result without escalating to cloud. Only applies to 'progressive' strategy — if OCR confidence is below this threshold, the pipeline escalates to the next tier.

Default

0.7

documentAI?

optional documentAI: boolean

Defined in: packages/agentos/src/io/vision/types.ts:331

Enable document understanding via Florence-2 (@huggingface/transformers). Produces structured DocumentLayout with semantic block detection.

Default

false

embedding?

optional embedding: boolean

Defined in: packages/agentos/src/io/vision/types.ts:338

Enable CLIP image embeddings (@huggingface/transformers). Runs in parallel with other tiers — does not affect text extraction.

Default

false

handwriting?

optional handwriting: boolean

Defined in: packages/agentos/src/io/vision/types.ts:324

Enable handwriting recognition via TrOCR (@huggingface/transformers). Only triggered when OCR confidence is low and content appears handwritten.

Default

false

ocr?

optional ocr: "none" | "paddle" | "tesseract"

Defined in: packages/agentos/src/io/vision/types.ts:317

OCR engine for Tier 1 text extraction.

'paddle' — PaddleOCR (via ppu-paddle-ocr). Best accuracy for most scripts.
'tesseract' — Tesseract.js. Wider language support, slightly lower accuracy.
'none' — Skip OCR entirely (useful for photo-only pipelines).

Default

'paddle' (if installed), else 'tesseract' (if installed), else 'none'

preprocessing?

optional preprocessing: VisionPreprocessingConfig

Defined in: packages/agentos/src/io/vision/types.ts:366

Image preprocessing options applied before any tier runs. Uses sharp for resizing, grayscale conversion, sharpening, and normalization.

strategy

strategy: VisionStrategy

Defined in: packages/agentos/src/io/vision/types.ts:308

How to combine tiers.

Default

'progressive'

Example​

Properties​

cloudModel?​

Example​

cloudProvider?​

confidenceThreshold?​

Default​

documentAI?​

Default​

embedding?​

Default​

handwriting?​

Default​

ocr?​

Default​

preprocessing?​

strategy​

Default​

Example

Properties

cloudModel?

Example

cloudProvider?

confidenceThreshold?

Default

documentAI?

Default

embedding?

Default

handwriting?

Default

ocr?

Default

preprocessing?

strategy

Default