Interface: IVisionProvider

Defined in: packages/agentos/src/rag/multimodal/types.ts:277

Minimal interface for a vision LLM that can describe images.

This is kept intentionally narrow to avoid coupling the multimodal indexer to a specific LLM provider. Any service that can take an image and return a text description satisfies this contract.

Example

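import OpenAI from 'openai';

// Reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

const visionProvider: IVisionProvider = {
  describeImage: async (image) => {
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: [
        { type: 'text', text: 'Describe this image in detail.' },
        // `image` may be an http(s) URL or a base64 data URL.
        { type: 'image_url', image_url: { url: image } },
      ]}],
    });
    // content can be null (e.g. on a refusal); fall back to an empty string.
    return response.choices[0].message.content ?? '';
  },
};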
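
Because the contract is a single method, a provider does not have to call a hosted model at all. The following is a minimal sketch of a stub provider that could stand in during tests. The interface is redeclared locally only so the snippet compiles on its own, and `createStubVisionProvider` and its `calls` array are hypothetical names for illustration, not part of the package.

```typescript
// Local copy of the documented contract so this sketch is self-contained;
// in a real project, import IVisionProvider from the package instead.
interface IVisionProvider {
  describeImage(image: string): Promise<string>;
}

// Hypothetical stub: always resolves to a canned description and records
// every image string it was asked to describe.
function createStubVisionProvider(
  description: string,
): IVisionProvider & { calls: string[] } {
  const calls: string[] = [];
  return {
    calls,
    describeImage: async (image: string) => {
      calls.push(image);
      return description;
    },
  };
}
```

A stub like this lets code that depends on `IVisionProvider` be exercised without network access or API keys.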

Methods

describeImage()

describeImage(image): Promise<string>

Defined in: packages/agentos/src/rag/multimodal/types.ts:284

Generate a text description of the provided image.

Parameters

image

string

Image as a URL string or base64 data URL.

Returns

Promise<string>

A detailed text description of the image content.
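
Since `image` may be a base64 data URL, callers holding raw image bytes need to encode them first. The sketch below shows one way to do that in Node.js. `toDataUrl` is a hypothetical helper, not something the package is known to export, and the MIME type is assumed to be supplied by the caller.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper: wrap raw image bytes in a base64 data URL of the form
// data:<mimeType>;base64,<payload>, suitable for passing as `image`.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  const payload = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${payload}`;
}
```

The resulting string can be passed to `describeImage` in the same way as a plain URL, for example `await provider.describeImage(toDataUrl(bytes, 'image/png'))`, where `provider` is any `IVisionProvider` implementation.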