Interface: VisionPipelineConfig
Defined in: packages/agentos/src/core/vision/types.ts:303
Configuration for the VisionPipeline.
All fields are optional — the factory function createVisionPipeline auto-detects available providers and fills in sensible defaults.
Example
const config: VisionPipelineConfig = {
strategy: 'progressive',
ocr: 'paddle',
handwriting: true,
documentAI: true,
embedding: true,
cloudProvider: 'openai',
cloudModel: 'gpt-4o',
confidenceThreshold: 0.8,
preprocessing: { grayscale: true, sharpen: true },
};
Properties
cloudModel?
optionalcloudModel:string
Defined in: packages/agentos/src/core/vision/types.ts:351
Cloud model override. When unset, the provider's default vision model is used.
Example
'gpt-4o', 'claude-sonnet-4-20250514', 'gemini-2.0-flash'
cloudProvider?
optionalcloudProvider:string
Defined in: packages/agentos/src/core/vision/types.ts:345
Cloud vision LLM provider name for Tier 3 fallback.
Must match a provider known to generateText() (e.g. 'openai', 'anthropic', 'google').
When unset, cloud vision is disabled.
confidenceThreshold?
optionalconfidenceThreshold:number
Defined in: packages/agentos/src/core/vision/types.ts:359
Minimum confidence to accept an OCR result without escalating to cloud.
Only applies to 'progressive' strategy — if OCR confidence is below
this threshold, the pipeline escalates to the next tier.
Default
0.7
documentAI?
optionaldocumentAI:boolean
Defined in: packages/agentos/src/core/vision/types.ts:331
Enable document understanding via Florence-2 (@huggingface/transformers).
Produces structured DocumentLayout with semantic block detection.
Default
false
embedding?
optionalembedding:boolean
Defined in: packages/agentos/src/core/vision/types.ts:338
Enable CLIP image embeddings (@huggingface/transformers).
Runs in parallel with other tiers — does not affect text extraction.
Default
false
handwriting?
optionalhandwriting:boolean
Defined in: packages/agentos/src/core/vision/types.ts:324
Enable handwriting recognition via TrOCR (@huggingface/transformers).
Only triggered when OCR confidence is low and content appears handwritten.
Default
false
ocr?
optionalocr:"none"|"paddle"|"tesseract"
Defined in: packages/agentos/src/core/vision/types.ts:317
OCR engine for Tier 1 text extraction.
'paddle'— PaddleOCR (viappu-paddle-ocr). Best accuracy for most scripts.'tesseract'— Tesseract.js. Wider language support, slightly lower accuracy.'none'— Skip OCR entirely (useful for photo-only pipelines).
Default
'paddle' (if installed), else 'tesseract' (if installed), else 'none'
preprocessing?
optionalpreprocessing:VisionPreprocessingConfig
Defined in: packages/agentos/src/core/vision/types.ts:366
Image preprocessing options applied before any tier runs.
Uses sharp for resizing, grayscale conversion, sharpening,
and normalization.
strategy
strategy:
VisionStrategy
Defined in: packages/agentos/src/core/vision/types.ts:308
How to combine tiers.
Default
'progressive'