Skip to main content

Interface: VisionPipelineConfig

Defined in: packages/agentos/src/core/vision/types.ts:303

Configuration for the VisionPipeline.

All fields are optional — the factory function createVisionPipeline auto-detects available providers and fills in sensible defaults.

Example

const config: VisionPipelineConfig = {
strategy: 'progressive',
ocr: 'paddle',
handwriting: true,
documentAI: true,
embedding: true,
cloudProvider: 'openai',
cloudModel: 'gpt-4o',
confidenceThreshold: 0.8,
preprocessing: { grayscale: true, sharpen: true },
};

Properties

cloudModel?

optional cloudModel: string

Defined in: packages/agentos/src/core/vision/types.ts:351

Cloud model override. When unset, the provider's default vision model is used.

Example

'gpt-4o', 'claude-sonnet-4-20250514', 'gemini-2.0-flash'

cloudProvider?

optional cloudProvider: string

Defined in: packages/agentos/src/core/vision/types.ts:345

Cloud vision LLM provider name for Tier 3 fallback. Must match a provider known to generateText() (e.g. 'openai', 'anthropic', 'google'). When unset, cloud vision is disabled.


confidenceThreshold?

optional confidenceThreshold: number

Defined in: packages/agentos/src/core/vision/types.ts:359

Minimum confidence to accept an OCR result without escalating to cloud. Only applies to 'progressive' strategy — if OCR confidence is below this threshold, the pipeline escalates to the next tier.

Default

0.7

documentAI?

optional documentAI: boolean

Defined in: packages/agentos/src/core/vision/types.ts:331

Enable document understanding via Florence-2 (@huggingface/transformers). Produces structured DocumentLayout with semantic block detection.

Default

false

embedding?

optional embedding: boolean

Defined in: packages/agentos/src/core/vision/types.ts:338

Enable CLIP image embeddings (@huggingface/transformers). Runs in parallel with other tiers — does not affect text extraction.

Default

false

handwriting?

optional handwriting: boolean

Defined in: packages/agentos/src/core/vision/types.ts:324

Enable handwriting recognition via TrOCR (@huggingface/transformers). Only triggered when OCR confidence is low and content appears handwritten.

Default

false

ocr?

optional ocr: "none" | "paddle" | "tesseract"

Defined in: packages/agentos/src/core/vision/types.ts:317

OCR engine for Tier 1 text extraction.

  • 'paddle' — PaddleOCR (via ppu-paddle-ocr). Best accuracy for most scripts.
  • 'tesseract' — Tesseract.js. Wider language support, slightly lower accuracy.
  • 'none' — Skip OCR entirely (useful for photo-only pipelines).

Default

'paddle' (if installed), else 'tesseract' (if installed), else 'none'

preprocessing?

optional preprocessing: VisionPreprocessingConfig

Defined in: packages/agentos/src/core/vision/types.ts:366

Image preprocessing options applied before any tier runs. Uses sharp for resizing, grayscale conversion, sharpening, and normalization.


strategy

strategy: VisionStrategy

Defined in: packages/agentos/src/core/vision/types.ts:308

How to combine tiers.

Default

'progressive'