Class: MultimodalAggregator
Defined in: packages/agentos/src/memory/ingestion/MultimodalAggregator.ts:76
Adds auto-generated captions to ExtractedImage objects that lack one, using a caller-supplied vision LLM function.
Images are processed in parallel via Promise.allSettled, so a single failed captioning attempt does not block the rest. Images whose captioning fails retain their original (un-captioned) state rather than propagating the error.
Example — with a vision LLM

```typescript
const aggregator = new MultimodalAggregator({
  describeImage: async (buf, mime) => myVisionLLM.describe(buf, mime),
});
const captioned = await aggregator.processImages(doc.images ?? []);
```
Example — passthrough (no LLM configured)

```typescript
const aggregator = new MultimodalAggregator();
const unchanged = await aggregator.processImages(doc.images ?? []);
```
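The failure-isolation behavior described above can be sketched in isolation. The types and the `captionAll` helper below are hypothetical stand-ins, not the real AgentOS API; they only illustrate how Promise.allSettled lets one rejected captioning call fall back to the original image while the others succeed.

```typescript
// Hypothetical stand-in for ExtractedImage (sketch only).
interface Image {
  id: string;
  caption?: string;
}

type DescribeFn = (img: Image) => Promise<string>;

// Caption every image in parallel; a rejected attempt falls back to the
// original, un-captioned image instead of failing the whole batch.
async function captionAll(images: Image[], describe: DescribeFn): Promise<Image[]> {
  const results = await Promise.allSettled(
    images.map(async (img) => ({ ...img, caption: await describe(img) })),
  );
  return results.map((r, i) => (r.status === "fulfilled" ? r.value : images[i]));
}
```

Because Promise.allSettled never rejects, the returned array always has the same length and order as the input, which is the property the aggregator relies on.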
Constructors
Constructor
```typescript
new MultimodalAggregator(config?): MultimodalAggregator
```
Defined in: packages/agentos/src/memory/ingestion/MultimodalAggregator.ts:80
Parameters
config?
MultimodalConfig
Optional configuration. Omit to use in passthrough mode.
Returns
MultimodalAggregator
Methods
processImages()
```typescript
processImages(images): Promise<ExtractedImage[]>
```
Defined in: packages/agentos/src/memory/ingestion/MultimodalAggregator.ts:101
Enrich images with captions via the configured vision LLM.
Only images without an existing caption field are processed; images that already carry a caption are left unchanged to avoid redundant LLM calls. When no describeImage function is configured, all images are returned unchanged.
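The skip-if-captioned filter and the passthrough mode can be sketched as follows. The `Img` type and `enrich` function are hypothetical simplifications for illustration, not the actual implementation:

```typescript
// Hypothetical stand-in for ExtractedImage (sketch only).
interface Img {
  id: string;
  caption?: string;
}

// Only images lacking a caption reach the describe function; when no
// describe function is supplied, the input is returned unchanged.
async function enrich(
  images: Img[],
  describe?: (img: Img) => Promise<string>,
): Promise<Img[]> {
  if (!describe) return images; // passthrough mode
  return Promise.all(
    images.map(async (img) =>
      img.caption !== undefined ? img : { ...img, caption: await describe(img) },
    ),
  );
}
```

Checking `caption !== undefined` before awaiting is what keeps already-captioned images from triggering redundant LLM calls.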
Parameters
images
Array of ExtractedImage objects to process.
Returns
Promise<ExtractedImage[]>
A promise resolving to an array of ExtractedImage objects with the same length and order as the input, with captions filled in where possible.