Class: PdfLoader

Defined in: packages/agentos/src/cognition/memory/io/ingestion/PdfLoader.ts:93

Document loader for PDF files.

Extraction tiers

unpdf: the default primary extraction engine. It performs pure-JS extraction of the PDF text layer and requires no native binaries.
OCR fallback (optional): supplied at construction time and engaged automatically when unpdf yields sparse text (fewer than 50 characters per page on average), which usually indicates a scanned document.
Docling loader (optional): when provided, takes precedence over both unpdf and OCR, yielding the highest-fidelity extraction at the cost of requiring a Python runtime.
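The tier order described above can be sketched as a small selection function. This is a hypothetical illustration, not the PdfLoader source: the names chooseTier and SPARSE_CHARS_PER_PAGE are invented here, and counting only non-whitespace characters toward the 50-characters-per-page threshold is an assumption.

```typescript
// Hypothetical sketch of the documented tier order; not the actual PdfLoader internals.
type Tier = "docling" | "ocr" | "unpdf";

// Threshold taken from the documentation: fewer than 50 chars per page on average.
const SPARSE_CHARS_PER_PAGE = 50;

interface TierInputs {
  hasDocling: boolean; // was a Docling loader passed to the constructor?
  hasOcr: boolean; // was an OCR loader passed to the constructor?
  unpdfText: string; // text extracted by unpdf from the text layer
  pageCount: number; // number of pages reported for the document
}

function chooseTier(inputs: TierInputs): Tier {
  // Docling, when supplied, takes precedence over both other tiers.
  if (inputs.hasDocling) return "docling";

  // Assumption: whitespace does not count toward the density check.
  const chars = inputs.unpdfText.replace(/\s+/g, "").length;
  const avgPerPage = chars / Math.max(1, inputs.pageCount);

  // Sparse text suggests a scanned document, so engage OCR if it is available.
  if (inputs.hasOcr && avgPerPage < SPARSE_CHARS_PER_PAGE) return "ocr";

  return "unpdf";
}
```

Without an OCR loader, a sparse result simply stays with unpdf, which matches OCR being optional.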

Example

const ocrLoader = createOcrPdfLoader();      // null if tesseract.js is not installed
const doclingLoader = createDoclingLoader(); // null if docling is not installed
const pdfLoader = new PdfLoader(ocrLoader, doclingLoader);
const doc = await pdfLoader.load('/reports/q3.pdf');

Implements

IDocumentLoader

Constructors

Constructor

new PdfLoader(ocrLoader?, doclingLoader?): PdfLoader

Defined in: packages/agentos/src/cognition/memory/io/ingestion/PdfLoader.ts:116

Creates a new PdfLoader.

Parameters

ocrLoader?

Optional OCR fallback (for example from createOcrPdfLoader()).

IDocumentLoader | null

doclingLoader?

Optional Docling loader (for example from createDoclingLoader()).

IDocumentLoader | null

Returns

PdfLoader

Properties

supportedExtensions

readonly supportedExtensions: string[]

Defined in: packages/agentos/src/cognition/memory/io/ingestion/PdfLoader.ts:95

File extensions this loader handles, each with a leading dot.

Used by LoaderRegistry to route file paths to the correct loader.
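Extension-based routing of this kind can be sketched as a lookup over each loader's supportedExtensions. This is a minimal illustration, not LoaderRegistry's implementation: routeByExtension and the ExtensionLoader shape are hypothetical, and case-insensitive matching is an assumption.

```typescript
import { extname } from "node:path";

// Hypothetical minimal shape: only the field the routing needs.
interface ExtensionLoader {
  readonly supportedExtensions: string[];
}

// Returns the first loader whose extension list contains the path's extension.
function routeByExtension<T extends ExtensionLoader>(loaders: T[], filePath: string): T | undefined {
  // extname keeps the leading dot, matching the supportedExtensions convention.
  const ext = extname(filePath).toLowerCase(); // assumption: matching ignores case
  return loaders.find((loader) => loader.supportedExtensions.includes(ext));
}

const loaders = [
  { name: "markdown", supportedExtensions: [".md", ".mdx"] },
  { name: "pdf", supportedExtensions: [".pdf"] },
];
```

For example, routeByExtension(loaders, '/reports/q3.PDF') selects the "pdf" entry, while a path with no matching extension yields undefined.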

Example

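['.md', '.mdx'] (inherited from the IDocumentLoader documentation and describing a Markdown loader; PdfLoader handles PDF files)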

Implementation of

IDocumentLoader.supportedExtensions

Methods

canLoad()

canLoad(source): boolean

Defined in: packages/agentos/src/cognition/memory/io/ingestion/PdfLoader.ts:129

Returns true when this loader is capable of handling source.

For string sources the check is purely extension-based. For Buffer sources the loader may inspect magic bytes when relevant.
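For a PDF loader, the two checks described above might look like the sketch below. It is hypothetical (canLoadPdf is not the PdfLoader method) and assumes the magic-byte test looks for the "%PDF-" file header at offset 0; some real-world files place the header slightly later, so a production check may be more lenient.

```typescript
import { extname } from "node:path";

// PDF files begin with the ASCII header "%PDF-" followed by a version, e.g. "%PDF-1.7".
const PDF_MAGIC = Buffer.from("%PDF-", "latin1");

// Hypothetical sketch of canLoad for PDFs.
function canLoadPdf(source: string | Buffer): boolean {
  if (typeof source === "string") {
    // String sources: extension check only; the file is not opened.
    return extname(source).toLowerCase() === ".pdf";
  }
  // Buffer sources: compare the leading bytes against the PDF header.
  return (
    source.length >= PDF_MAGIC.length &&
    source.subarray(0, PDF_MAGIC.length).equals(PDF_MAGIC)
  );
}
```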

Parameters

source

Absolute file path or raw bytes.

string | Buffer

Returns

boolean

Implementation of

IDocumentLoader.canLoad

load()

load(source, options?): Promise<LoadedDocument>

Defined in: packages/agentos/src/cognition/memory/io/ingestion/PdfLoader.ts:148

Parses source and returns a normalised LoadedDocument.

When source is a string the loader treats it as an absolute (or resolvable) file path and reads the file from disk. When source is a Buffer the loader parses the bytes directly and derives as much metadata as possible from the buffer content alone.
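Normalising the two source forms into bytes can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: readPdfBytes is a hypothetical helper (the real load() is asynchronous and also extracts text and metadata), relative paths are assumed to resolve against the current working directory, and rejecting input without a "%PDF-" header stands in for the "format is not parsable" error.

```typescript
import { readFileSync } from "node:fs";
import { resolve } from "node:path";

// Hypothetical helper: turns a path or raw bytes into PDF bytes, or throws.
function readPdfBytes(source: string | Buffer): Buffer {
  // String: treat as a file path and read it from disk (throws if unreadable).
  // Buffer: use the bytes as-is.
  const bytes = typeof source === "string" ? readFileSync(resolve(source)) : source;

  // Reject input that does not start with the PDF file header.
  if (bytes.subarray(0, 5).toString("latin1") !== "%PDF-") {
    throw new Error("Source is not a parsable PDF (missing %PDF- header)");
  }
  return bytes;
}
```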

Parameters

source

Absolute file path OR raw document bytes.

string | Buffer

options?

LoadOptions

Optional hints such as a format override.

Returns

Promise<LoadedDocument>

A promise resolving to the fully-populated LoadedDocument.

Throws

When the file cannot be read or the format is not parsable.

Implementation of

IDocumentLoader.load
