Class: PdfLoader
Defined in: packages/agentos/src/memory/ingestion/PdfLoader.ts:93
Document loader for PDF files.
Extraction tiers
- unpdf: always used as the primary extraction engine. Performs pure-JS extraction of the PDF text layer; no native binaries are required.
- OcrPdfLoader (optional): supplied at construction time and engaged automatically when unpdf yields sparse text (fewer than 50 characters per page on average), which indicates a scanned document.
- DoclingLoader (optional): when provided, takes precedence over both unpdf and OCR, yielding the highest-fidelity extraction at the cost of requiring a Python runtime.
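The tier rules above can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the PdfLoader source: `Tier`, `isSparse`, `chooseTier`, and `SPARSE_CHARS_PER_PAGE` are hypothetical names, and the sketch does not model what happens if a Docling or OCR run fails, since the description does not say.

```typescript
// Hypothetical names throughout; this only illustrates the documented rules.
type Tier = 'docling' | 'ocr' | 'unpdf';

// Threshold from the tier description: fewer than 50 characters per page
// on average counts as sparse text.
const SPARSE_CHARS_PER_PAGE = 50;

function isSparse(textLength: number, pageCount: number): boolean {
  // Treat a zero-page document as one page to avoid dividing by zero.
  const pages = Math.max(pageCount, 1);
  return textLength / pages < SPARSE_CHARS_PER_PAGE;
}

function chooseTier(
  textLength: number, // characters extracted by unpdf
  pageCount: number,
  hasOcr: boolean, // an OCR loader was supplied
  hasDocling: boolean, // a Docling loader was supplied
): Tier {
  // Docling, when present, takes precedence over both other tiers.
  if (hasDocling) return 'docling';
  // OCR is engaged only when it exists and the text layer is sparse.
  if (hasOcr && isSparse(textLength, pageCount)) return 'ocr';
  return 'unpdf';
}
```

For instance, a 10-page scan that yields 100 characters averages 10 characters per page, so `chooseTier(100, 10, true, false)` selects the OCR tier, while the same input without an OCR loader stays on unpdf.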
Example
const ocrLoader = createOcrPdfLoader(); // null if tesseract.js absent
const doclingLoader = createDoclingLoader(); // null if docling absent
const pdfLoader = new PdfLoader(ocrLoader, doclingLoader);
const doc = await pdfLoader.load('/reports/q3.pdf');
Implements
IDocumentLoader
Constructors
Constructor
new PdfLoader(ocrLoader?, doclingLoader?): PdfLoader
Defined in: packages/agentos/src/memory/ingestion/PdfLoader.ts:116
Creates a new PdfLoader.
Parameters
ocrLoader?
Optional OCR fallback (e.g. from createOcrPdfLoader).
IDocumentLoader | null
doclingLoader?
Optional Docling loader (e.g. from createDoclingLoader).
IDocumentLoader | null
Returns
PdfLoader
Properties
supportedExtensions
readonly supportedExtensions: string[]
Defined in: packages/agentos/src/memory/ingestion/PdfLoader.ts:95
File extensions this loader handles, each with a leading dot.
Used by LoaderRegistry to route file paths to the correct loader.
Example (generic to IDocumentLoader; the values shown belong to a Markdown loader, not to PdfLoader)
['.md', '.mdx']
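To illustrate how extension-based routing might work, here is a minimal sketch. It is not the LoaderRegistry implementation: `ExtensionLoader`, `extensionOf`, and `findLoader` are hypothetical names, and the `'.pdf'` entry is an assumption about what PdfLoader declares, since its actual value is not shown on this page.

```typescript
// Hypothetical stand-in for the part of a loader the routing needs.
interface ExtensionLoader {
  readonly supportedExtensions: string[];
}

// Returns the lower-cased extension with its leading dot, or '' if none.
function extensionOf(filePath: string): string {
  const base = filePath.slice(filePath.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  // dot === 0 means a dotfile such as ".gitignore", which has no extension.
  return dot > 0 ? base.slice(dot).toLowerCase() : '';
}

// Picks the first loader whose extension list contains the file's extension.
function findLoader<T extends ExtensionLoader>(
  loaders: T[],
  filePath: string,
): T | undefined {
  const ext = extensionOf(filePath);
  return loaders.find((loader) => loader.supportedExtensions.includes(ext));
}

// Assumed extension lists, for illustration only.
const pdfLike = { name: 'pdf', supportedExtensions: ['.pdf'] };
const markdownLike = { name: 'markdown', supportedExtensions: ['.md', '.mdx'] };

const match = findLoader([pdfLike, markdownLike], '/reports/q3.PDF');
```

Lower-casing the extension before the lookup is a design choice of this sketch; whether the real registry matches case-insensitively is not stated here.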
Implementation of
IDocumentLoader.supportedExtensions
Methods
canLoad()
canLoad(source): boolean
Defined in: packages/agentos/src/memory/ingestion/PdfLoader.ts:129
Returns true when this loader is capable of handling source.
For string sources the check is purely extension-based. For Buffer
sources the loader may inspect magic bytes when relevant.
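The two checks described above could look like the sketch below. It is an illustration only: `looksLikePdf` and `PDF_MAGIC` are hypothetical names, and this page does not confirm whether PdfLoader actually inspects magic bytes. The sketch accepts `Uint8Array` so it runs without Node typings; a Node `Buffer` is a `Uint8Array` subclass and can be passed as-is.

```typescript
// '%PDF-' in ASCII, the header that opens a PDF file.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

function looksLikePdf(source: string | Uint8Array): boolean {
  if (typeof source === 'string') {
    // Path input: purely extension-based, case-insensitive.
    return source.toLowerCase().endsWith('.pdf');
  }
  // Byte input: compare the first five bytes against the PDF header.
  if (source.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((byte, i) => source[i] === byte);
}
```

Note that the string branch never touches the file system, so a path ending in `.pdf` passes even if the file is missing or is not a real PDF; such problems would only surface when the file is loaded.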
Parameters
source
Absolute file path or raw bytes.
string | Buffer
Returns
boolean
Implementation of
IDocumentLoader.canLoad
load()
load(source, options?): Promise<LoadedDocument>
Defined in: packages/agentos/src/memory/ingestion/PdfLoader.ts:143
Parses source and returns a normalised LoadedDocument.
When source is a string the loader treats it as an absolute (or
resolvable) file path and reads the file from disk. When source is a
Buffer the loader parses the bytes directly and derives as much
metadata as possible from the buffer content alone.
Parameters
source
Absolute file path OR raw document bytes.
string | Buffer
options?
Optional hints such as a format override.
Returns
Promise<LoadedDocument>
A promise resolving to the fully-populated LoadedDocument.
Throws
When the file cannot be read or the format is not parsable.
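Because load() returns a promise, the failure described under Throws reaches callers as a rejected promise. One way to handle it is a small wrapper that turns rejections into values, sketched below. `LoaderLike`, `LoadResult`, `tryLoad`, and `stubLoader` are hypothetical names introduced for illustration; the stub stands in for a real loader so the sketch runs without a PDF on disk, and its string result stands in for LoadedDocument, whose fields are not listed on this page.

```typescript
// Hypothetical minimal shape of a loader; the real IDocumentLoader also
// accepts options and exposes more members.
interface LoaderLike<TDoc> {
  load(source: string | Uint8Array): Promise<TDoc>;
}

type LoadResult<TDoc> =
  | { ok: true; doc: TDoc }
  | { ok: false; error: Error };

// Wraps load() so read or parse failures surface as values, not rejections.
async function tryLoad<TDoc>(
  loader: LoaderLike<TDoc>,
  source: string | Uint8Array,
): Promise<LoadResult<TDoc>> {
  try {
    return { ok: true, doc: await loader.load(source) };
  } catch (err) {
    // Normalise non-Error throwables so callers always get an Error.
    const error = err instanceof Error ? err : new Error(String(err));
    return { ok: false, error };
  }
}

// Stub used only to make the sketch runnable; it rejects non-PDF paths.
const stubLoader: LoaderLike<string> = {
  async load(source) {
    if (typeof source === 'string' && !source.endsWith('.pdf')) {
      throw new Error(`not parsable: ${source}`);
    }
    return 'extracted text';
  },
};
```

With a real PdfLoader instance in place of the stub, the same wrapper would accept either a file path or raw bytes, matching the two source forms described above.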