Class: HtmlLoader
Defined in: packages/agentos/src/memory/ingestion/HtmlLoader.ts:175
Basic document loader for HTML (.html, .htm) files.
Text extraction strategy
- <script> and <style> blocks are removed entirely.
- Block-level elements (<p>, <div>, <h1>–<h6>, etc.) are replaced with newline characters to preserve paragraph structure.
- All remaining HTML tags are stripped.
- A common subset of HTML entities is decoded.
- Excessive whitespace is collapsed.
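The extraction steps listed above can be pictured as a chain of regular-expression passes. The following is a minimal sketch under stated assumptions, not the actual HtmlLoader source: the function name extractText, the ENTITIES table, the exact list of block-level tags, and the whitespace rules are all hypothetical choices made for illustration.

```typescript
// Hypothetical entity table: a small illustrative subset, not the loader's real list.
// &nbsp; is mapped to a plain space here as a simplification.
const ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&nbsp;": " ",
};

// Hypothetical helper that mirrors the five documented steps in order.
function extractText(html: string): string {
  return html
    // 1. Remove <script> and <style> blocks together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // 2. Replace block-level tags (assumed list) with newlines.
    .replace(/<\/?(p|div|h[1-6]|li|ul|ol|br|tr|section|article)\b[^>]*>/gi, "\n")
    // 3. Strip every remaining tag.
    .replace(/<[^>]+>/g, "")
    // 4. Decode the entity subset in a single pass, so "&amp;lt;" becomes "&lt;".
    .replace(/&(?:amp|lt|gt|quot|#39|nbsp);/g, (m) => ENTITIES[m] ?? m)
    // 5. Collapse whitespace: runs of spaces, padding around newlines, long newline runs.
    .replace(/[ \t]+/g, " ")
    .replace(/ *\n */g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```

Removing scripts and styles first matters: if tags were stripped first, the JavaScript and CSS source would survive as plain text in the output.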
Metadata
- title: extracted from the <title> element when present.
- wordCount: approximate count of words in the extracted text.
- source: absolute file path (when loaded from disk).
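As a rough illustration of how these three fields could be derived, the sketch below computes them from the raw HTML and the already extracted text. The names HtmlMetadataSketch and deriveMetadata are hypothetical, and splitting on whitespace is only one plausible reading of "approximate count of words".

```typescript
// Hypothetical shape covering only the three documented metadata fields.
interface HtmlMetadataSketch {
  title?: string;
  wordCount: number;
  source?: string;
}

// Hypothetical helper: `html` is the raw markup, `text` the extracted plain text,
// `source` the absolute path when the document came from disk.
function deriveMetadata(html: string, text: string, source?: string): HtmlMetadataSketch {
  // Take the first <title> element, if any, and trim surrounding whitespace.
  const match = /<title\b[^>]*>([\s\S]*?)<\/title\s*>/i.exec(html);
  const title = match?.[1]?.trim();

  // Approximate word count: whitespace-separated tokens in the extracted text.
  const trimmed = text.trim();
  const wordCount = trimmed === "" ? 0 : trimmed.split(/\s+/).length;

  const metadata: HtmlMetadataSketch = { wordCount };
  if (title) metadata.title = title;
  if (source) metadata.source = source;
  return metadata;
}
```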
Implements
IDocumentLoader
Example
const loader = new HtmlLoader();
const doc = await loader.load('/public/index.html');
console.log(doc.metadata.title); // e.g. 'Welcome to AgentOS'
Constructors
Constructor
new HtmlLoader(): HtmlLoader
Returns
HtmlLoader
Properties
supportedExtensions
readonly supportedExtensions: string[]
Defined in: packages/agentos/src/memory/ingestion/HtmlLoader.ts:177
File extensions this loader handles, each with a leading dot.
Used by LoaderRegistry to route file paths to the correct loader.
Example
['.html', '.htm']
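To make the routing role concrete, here is a minimal sketch of an extension-keyed lookup. RegistrySketch and LoaderSketch are hypothetical stand-ins written for this example; the real LoaderRegistry API is not shown in this page and may differ.

```typescript
// Hypothetical minimal loader shape: only the property relevant to routing.
interface LoaderSketch {
  readonly name: string;
  readonly supportedExtensions: string[];
}

// Hypothetical registry: maps each lower-cased extension to the loader that declared it.
class RegistrySketch {
  private readonly byExtension = new Map<string, LoaderSketch>();

  register(loader: LoaderSketch): void {
    for (const ext of loader.supportedExtensions) {
      this.byExtension.set(ext.toLowerCase(), loader);
    }
  }

  resolve(filePath: string): LoaderSketch | undefined {
    // Isolate the file name so dots in directory names are ignored.
    const cut = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
    const base = filePath.slice(cut + 1);
    const dot = base.lastIndexOf(".");
    if (dot <= 0) return undefined; // no extension, or a dotfile such as ".env"
    return this.byExtension.get(base.slice(dot).toLowerCase());
  }
}

const registry = new RegistrySketch();
registry.register({ name: "html", supportedExtensions: [".html", ".htm"] });
registry.register({ name: "markdown", supportedExtensions: [".md", ".mdx"] });
```

Keeping the leading dot in each entry lets the registry compare the tail of the file name directly, with no extra normalisation beyond lower-casing.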
Implementation of
IDocumentLoader.supportedExtensions
Methods
canLoad()
canLoad(source): boolean
Defined in: packages/agentos/src/memory/ingestion/HtmlLoader.ts:184
Returns true when this loader is capable of handling source.
For string sources the check is purely extension-based. For Buffer
sources the loader may inspect magic bytes when relevant.
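The two branches described above might look like the sketch below. It is an assumption-laden illustration, not the HtmlLoader implementation: canLoadSketch is a hypothetical name, the parameter is typed as Uint8Array (Node's Buffer is a Uint8Array subclass) to keep the example free of Node-specific types, and the byte check is a guess, since HTML has no fixed magic number.

```typescript
// Assumed extension list, taken from the class description (.html, .htm).
const HTML_EXTENSIONS = [".html", ".htm"];

// Hypothetical stand-in for canLoad().
function canLoadSketch(source: string | Uint8Array): boolean {
  if (typeof source === "string") {
    // String sources: purely extension-based, case-insensitive.
    const lower = source.toLowerCase();
    return HTML_EXTENSIONS.some((ext) => lower.endsWith(ext));
  }

  // Byte sources (assumption): peek at the first bytes for an HTML-looking prefix.
  let head = "";
  const limit = Math.min(source.length, 64);
  for (let i = 0; i < limit; i++) {
    head += String.fromCharCode(source[i]);
  }
  head = head.trim().toLowerCase();
  return head.startsWith("<!doctype html") || head.startsWith("<html");
}
```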
Parameters
source
Absolute file path or raw bytes.
string | Buffer
Returns
boolean
Implementation of
IDocumentLoader.canLoad
load()
load(source, _options?): Promise<LoadedDocument>
Defined in: packages/agentos/src/memory/ingestion/HtmlLoader.ts:196
Parses source and returns a normalised LoadedDocument.
When source is a string the loader treats it as an absolute (or
resolvable) file path and reads the file from disk. When source is a
Buffer the loader parses the bytes directly and derives as much
metadata as possible from the buffer content alone.
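The path-versus-bytes dispatch can be sketched as follows. This is a simplified illustration under assumptions, not the actual method: loadSketch and LoadedDocumentSketch are hypothetical names, UTF-8 is assumed for decoding, the bytes parameter is typed as Uint8Array (which a Node Buffer satisfies), and text extraction is left out so that only the branching is visible.

```typescript
import { readFile } from "node:fs/promises";
import { resolve } from "node:path";

// Hypothetical, trimmed-down document shape for this example only.
interface LoadedDocumentSketch {
  content: string;
  metadata: { source?: string };
}

// Hypothetical stand-in for load(): shows the two input branches, nothing more.
async function loadSketch(source: string | Uint8Array): Promise<LoadedDocumentSketch> {
  if (typeof source === "string") {
    // Path branch: read from disk. An unreadable file rejects the returned promise.
    const html = await readFile(source, "utf8");
    return { content: html, metadata: { source: resolve(source) } };
  }
  // Bytes branch: decode directly. No path is known, so `source` metadata is omitted.
  const html = new TextDecoder("utf-8").decode(source);
  return { content: html, metadata: {} };
}
```

Callers that pass a path should be prepared for the promise to reject, for example by wrapping the call in try/catch inside an async function.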
Parameters
source
Absolute file path OR raw document bytes.
string | Buffer
_options?
Optional hints such as a format override.
Returns
Promise<LoadedDocument>
A promise resolving to the fully populated LoadedDocument.
Throws
When the file cannot be read or the format is not parsable.