Class: HtmlLoader

Defined in: packages/agentos/src/memory/ingestion/HtmlLoader.ts:175

Basic document loader for HTML (.html, .htm) files.

Text extraction strategy

  1. <script> and <style> blocks are removed entirely.
  2. Block-level elements (<p>, <div>, <h1> through <h6>, etc.) are replaced with newline characters to preserve paragraph structure.
  3. All remaining HTML tags are stripped.
  4. A common subset of HTML entities is decoded.
  5. Excessive whitespace is collapsed.
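The five steps above can be sketched with a chain of regular-expression replacements. This is only an illustration, not the actual HtmlLoader source: the helper name `extractText`, the list of block-level tags, and the entity table are all assumptions.

```typescript
// Hypothetical helper illustrating the five-step strategy; not the real implementation.
const ENTITIES: Record<string, string> = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&nbsp;': ' ',
};

function extractText(html: string): string {
  return (
    html
      // 1. Remove <script> and <style> blocks together with their contents.
      .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
      // 2. Replace block-level tags (an assumed subset) with newlines.
      .replace(/<\/?(p|div|h[1-6]|li|ul|ol|tr|br|section|article|blockquote)\b[^>]*>/gi, '\n')
      // 3. Strip every remaining tag.
      .replace(/<[^>]+>/g, '')
      // 4. Decode a small subset of entities in a single pass (avoids double decoding).
      .replace(/&(amp|lt|gt|quot|#39|nbsp);/g, (entity) => ENTITIES[entity] ?? entity)
      // 5. Collapse runs of spaces and blank lines, then trim the ends.
      .replace(/[ \t]+/g, ' ')
      .replace(/ *\n */g, '\n')
      .replace(/\n{3,}/g, '\n\n')
      .trim()
  );
}
```

Regex-based stripping is adequate for a basic loader of this kind, but it is not a full HTML parser; malformed markup or tags inside comments may need a real parser.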

Metadata

  • title — extracted from the <title> element when present.
  • wordCount — approximate count of words in the extracted text.
  • source — absolute file path (when loaded from disk).
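As a rough sketch of how these three fields could be derived, assuming a hypothetical `buildMetadata` helper and a whitespace-based word count (the real loader may compute them differently):

```typescript
import * as path from 'node:path';

// Hypothetical shape mirroring the three documented metadata fields.
interface HtmlMetadataSketch {
  title?: string;
  wordCount: number;
  source?: string;
}

function buildMetadata(html: string, text: string, filePath?: string): HtmlMetadataSketch {
  // title: contents of the first <title> element, if there is one.
  const match = /<title\b[^>]*>([\s\S]*?)<\/title\s*>/i.exec(html);
  const title = match?.[1]?.replace(/\s+/g, ' ').trim();

  // wordCount: approximate, counting whitespace-separated tokens.
  const wordCount = text.split(/\s+/).filter(Boolean).length;

  return {
    ...(title ? { title } : {}),
    wordCount,
    // source: only set when the document came from a file path.
    ...(filePath ? { source: path.resolve(filePath) } : {}),
  };
}
```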

Example

const loader = new HtmlLoader();
const doc = await loader.load('/public/index.html');
console.log(doc.metadata.title); // e.g. 'Welcome to AgentOS'

Implements

IDocumentLoader

Constructors

Constructor

new HtmlLoader(): HtmlLoader

Returns

HtmlLoader

Properties

supportedExtensions

readonly supportedExtensions: string[]

Defined in: packages/agentos/src/memory/ingestion/HtmlLoader.ts:177

File extensions this loader handles, each with a leading dot.

Used by LoaderRegistry to route file paths to the correct loader.

Example

['.html', '.htm']

Implementation of

IDocumentLoader.supportedExtensions
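To illustrate how a registry might use supportedExtensions for routing, here is a minimal sketch. `SketchRegistry` and its methods are hypothetical and do not reflect the actual LoaderRegistry API; the case-insensitive matching is also an assumption.

```typescript
import * as path from 'node:path';

// Minimal structural type: anything that advertises its extensions.
interface ExtensionAware {
  readonly supportedExtensions: string[];
}

// Hypothetical registry; the real LoaderRegistry may work differently.
class SketchRegistry<T extends ExtensionAware> {
  private readonly byExtension = new Map<string, T>();

  register(loader: T): void {
    for (const ext of loader.supportedExtensions) {
      this.byExtension.set(ext.toLowerCase(), loader);
    }
  }

  // Looks up a loader by the file's extension, including the leading dot.
  resolve(filePath: string): T | undefined {
    return this.byExtension.get(path.extname(filePath).toLowerCase());
  }
}
```

The leading dot matters here because `path.extname` returns extensions in the same form, for example `.html`.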

Methods

canLoad()

canLoad(source): boolean

Defined in: packages/agentos/src/memory/ingestion/HtmlLoader.ts:184

Returns true when this loader is capable of handling source.

For string sources the check is purely extension-based. For Buffer sources the loader may inspect magic bytes when relevant.

Parameters

source

string | Buffer

Absolute file path or raw bytes.

Returns

boolean

Implementation of

IDocumentLoader.canLoad
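The two branches described above could look roughly like the following. `canLoadSketch` is a hypothetical stand-in; in particular, HTML has no fixed magic number, so the Buffer branch here sniffs for a leading doctype or `<html` tag purely as an assumption about what "inspecting bytes" might mean for this format.

```typescript
import { Buffer } from 'node:buffer';
import * as path from 'node:path';

// Hypothetical sketch of an extension check plus an assumed content sniff.
function canLoadSketch(source: string | Buffer, extensions: string[]): boolean {
  if (typeof source === 'string') {
    // String sources: purely extension-based.
    return extensions.includes(path.extname(source).toLowerCase());
  }
  // Buffer sources: look at the first bytes only (assumed heuristic).
  const head = source.subarray(0, 512).toString('utf8').trimStart().toLowerCase();
  return head.startsWith('<!doctype html') || head.startsWith('<html');
}
```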


load()

load(source, _options?): Promise<LoadedDocument>

Defined in: packages/agentos/src/memory/ingestion/HtmlLoader.ts:196

Parses source and returns a normalised LoadedDocument.

When source is a string the loader treats it as an absolute (or resolvable) file path and reads the file from disk. When source is a Buffer the loader parses the bytes directly and derives as much metadata as possible from the buffer content alone.

Parameters

source

string | Buffer

Absolute file path or raw document bytes.

_options?

LoadOptions

Optional hints such as a format override.

Returns

Promise<LoadedDocument>

A promise resolving to the fully populated LoadedDocument.

Throws

When the file cannot be read or the format is not parsable.

Implementation of

IDocumentLoader.load
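The path-versus-bytes behaviour of load() can be sketched as below. `readSource` is a hypothetical helper, and decoding as UTF-8 is an assumption; the actual loader may detect or accept other encodings.

```typescript
import { Buffer } from 'node:buffer';
import { readFile } from 'node:fs/promises';
import * as path from 'node:path';

// Hypothetical first stage of a load(): obtain the raw HTML and, if known, its path.
async function readSource(source: string | Buffer): Promise<{ html: string; source?: string }> {
  if (typeof source === 'string') {
    // String: treat as a file path, resolve it, and read from disk.
    // readFile rejects if the file cannot be read, which surfaces as a thrown error.
    const absolute = path.resolve(source);
    return { html: await readFile(absolute, 'utf8'), source: absolute };
  }
  // Buffer: parse the bytes directly; no path is available for the metadata.
  return { html: source.toString('utf8') };
}
```

With a Buffer input there is no file path to record, which is why the source metadata field is documented as present only when loading from disk.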