Class: MultimodalAggregator

Defined in: packages/agentos/src/memory/ingestion/MultimodalAggregator.ts:76

Adds auto-generated captions to ExtractedImage objects that lack one, using a caller-supplied vision LLM function.

Images are processed in parallel via Promise.allSettled so a single failed captioning attempt does not block the rest. Images whose captioning fails retain their original (un-captioned) state rather than propagating the error.

Example — with a vision LLM

const aggregator = new MultimodalAggregator({
describeImage: async (buf, mime) => myVisionLLM.describe(buf, mime),
});

const captioned = await aggregator.processImages(doc.images ?? []);

Example — passthrough (no LLM configured)

const aggregator = new MultimodalAggregator();
const unchanged = await aggregator.processImages(doc.images ?? []);

Constructors

Constructor

new MultimodalAggregator(config?): MultimodalAggregator

Defined in: packages/agentos/src/memory/ingestion/MultimodalAggregator.ts:80

Parameters

config?

MultimodalConfig

Optional configuration. Omit it to run the aggregator in passthrough mode.
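The exact shape of MultimodalConfig is defined in the source; based on the describeImage usage shown in the examples above, a minimal sketch might look like this (the parameter names and the Buffer/string types are assumptions, not the real API):

```typescript
// Hypothetical sketch of MultimodalConfig, inferred from the
// describeImage usage in the examples above. Field and parameter
// names are assumptions for illustration.
interface MultimodalConfig {
  // Returns a caption for the given image bytes.
  describeImage?: (buf: Buffer, mime: string) => Promise<string>;
}

// Example construction under that assumed shape:
const config: MultimodalConfig = {
  describeImage: async (buf, mime) => `caption for a ${mime} image`,
};
```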

Returns

MultimodalAggregator

Methods

processImages()

processImages(images): Promise<ExtractedImage[]>

Defined in: packages/agentos/src/memory/ingestion/MultimodalAggregator.ts:101

Enrich images with captions via the configured vision LLM.

Only images that have no existing caption field are processed. Images that already carry a caption are left unchanged to avoid redundant LLM calls.

When no describeImage function is configured, all images are returned unchanged.
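The behavior described above (passthrough when no LLM is configured, skip already-captioned images, caption the rest in parallel, fall back to the original image on failure) can be sketched as follows. This is an illustration, not the actual implementation; the ExtractedImage field names (data, mimeType, caption) are assumptions:

```typescript
// Sketch of the processImages flow described above.
// ExtractedImage's field names (data, mimeType, caption) are assumed.
interface ExtractedImage {
  data: Buffer;
  mimeType: string;
  caption?: string;
}

type DescribeImage = (buf: Buffer, mime: string) => Promise<string>;

async function processImages(
  images: ExtractedImage[],
  describeImage?: DescribeImage,
): Promise<ExtractedImage[]> {
  // Passthrough: no vision LLM configured.
  if (!describeImage) return images;

  // Caption all images in parallel; allSettled keeps one failed
  // attempt from rejecting the whole batch.
  const results = await Promise.allSettled(
    images.map(async (img) => {
      // Images that already carry a caption are returned as-is,
      // avoiding a redundant LLM call.
      if (img.caption) return img;
      const caption = await describeImage(img.data, img.mimeType);
      return { ...img, caption };
    }),
  );

  // Images whose captioning failed retain their original state.
  return results.map((r, i) =>
    r.status === "fulfilled" ? r.value : images[i],
  );
}
```

Note how Promise.allSettled (rather than Promise.all) gives per-image failure isolation: a rejected caption attempt surfaces as a settled `rejected` result that is mapped back to the untouched input image.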

Parameters

images

ExtractedImage[]

Array of ExtractedImage objects to process.

Returns

Promise<ExtractedImage[]>

A promise resolving to an array of ExtractedImage objects of the same length as the input, with captions filled in where possible.