Class: SemanticChunker
Defined in: packages/agentos/src/rag/chunking/SemanticChunker.ts:123
Semantic text chunker that splits text on natural boundaries (headings, paragraphs, sentences) rather than at fixed character counts.
The resulting chunks are more semantically coherent than fixed-size splits, which improves retrieval quality by keeping related ideas together.
Examples
const chunker = new SemanticChunker({ targetSize: 800, overlap: 50 });
const chunks = chunker.chunk(markdownDocument);
for (const c of chunks) {
  console.log(`Chunk ${c.index} (${c.boundaryType}): ${c.text.length} chars`);
}

const chunker = new SemanticChunker({
  targetSize: 1000,
  maxSize: 3000, // Allow larger chunks for code blocks
  preserveCodeBlocks: true,
});
const chunks = chunker.chunk(technicalDoc);
Constructors
Constructor
new SemanticChunker(config?): SemanticChunker
Defined in: packages/agentos/src/rag/chunking/SemanticChunker.ts:147
Creates a new SemanticChunker.
Parameters
config?
Chunking configuration.
Returns
SemanticChunker
Example
const chunker = new SemanticChunker({
  targetSize: 800,
  maxSize: 1500,
  overlap: 80,
});
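The option names that appear on this page (targetSize, maxSize, minSize, overlap, preserveCodeBlocks, respectHeadings) can be gathered into one shape for reference. The sketch below is an assumption, not the library's declared type: the interface name, the optionality of each field, and the consistency rules in the hypothetical checkConfig helper are all illustrative, and the real defaults are not stated here.

```typescript
// Hypothetical config shape inferred from option names used on this page.
// Field types and optionality are assumptions, not the declared API.
interface SemanticChunkerConfigSketch {
  targetSize?: number; // preferred chunk size, in characters
  maxSize?: number; // upper bound before a piece is split further
  minSize?: number; // fragments below this are merged backward
  overlap?: number; // characters carried over from the previous chunk
  preserveCodeBlocks?: boolean; // keep code blocks intact
  respectHeadings?: boolean; // start a new section at each heading
}

// Hypothetical helper: reports size settings that contradict each other.
// The rules below are plausible sanity checks, not documented behavior.
function checkConfig(config: SemanticChunkerConfigSketch): string[] {
  const problems: string[] = [];
  const { targetSize, maxSize, minSize, overlap } = config;
  if (targetSize !== undefined && maxSize !== undefined && targetSize > maxSize) {
    problems.push('targetSize exceeds maxSize');
  }
  if (minSize !== undefined && targetSize !== undefined && minSize > targetSize) {
    problems.push('minSize exceeds targetSize');
  }
  if (overlap !== undefined && targetSize !== undefined && overlap >= targetSize) {
    problems.push('overlap is not smaller than targetSize');
  }
  return problems;
}
```

Running a check like this before constructing the chunker is one way to catch contradictory sizes early; whether the constructor performs similar validation is not stated on this page.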
Methods
chunk()
chunk(text, metadata?): SemanticChunk[]
Defined in: packages/agentos/src/rag/chunking/SemanticChunker.ts:185
Splits text into semantically coherent chunks.
Pipeline:
- Pre-process: extract code blocks (if preserveCodeBlocks)
- Split by headings (if respectHeadings); each heading starts a new section
- Within sections, split by paragraphs (double newline)
- If a paragraph exceeds maxSize, split by sentences
- If a sentence exceeds maxSize, split at word boundaries (fixed fallback)
- Merge small fragments (< minSize) with the previous chunk
- Add overlap from the end of the previous chunk to each chunk
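The pipeline above can be approximated with a short, self-contained sketch. This is not the SemanticChunker source: the names sketchChunk, splitAtWords, and SketchOptions are hypothetical, code-block extraction is omitted, sentence detection is a naive punctuation match, and merging is restricted to fragments within the same section (an assumption made so that headings still begin chunks).

```typescript
// Illustrative sketch only; not the actual SemanticChunker implementation.
interface SketchOptions {
  maxSize: number; // upper bound per piece, in characters
  minSize: number; // fragments shorter than this are merged backward
  overlap: number; // characters copied from the end of the previous chunk
}

// Fixed fallback: pack whole words into pieces of at most maxSize characters.
// A single word longer than maxSize is left as one oversized piece.
function splitAtWords(text: string, maxSize: number): string[] {
  const pieces: string[] = [];
  let current = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (current && current.length + 1 + word.length > maxSize) {
      pieces.push(current);
      current = word;
    } else {
      current = current ? `${current} ${word}` : word;
    }
  }
  if (current) pieces.push(current);
  return pieces;
}

function sketchChunk(text: string, opts: SketchOptions): string[] {
  // Split by headings: the lookahead keeps each heading with its section.
  const sections = text.split(/^(?=#{1,6} )/m).filter((s) => s.trim());
  const chunks: string[] = [];
  for (const section of sections) {
    const pieces: string[] = [];
    // Within a section, split by paragraphs (blank lines).
    const paragraphs = section.split(/\n{2,}/).map((p) => p.trim()).filter(Boolean);
    for (const para of paragraphs) {
      if (para.length <= opts.maxSize) {
        pieces.push(para);
        continue;
      }
      // Oversized paragraph: split by sentences (naive punctuation match).
      const sentences = (para.match(/[^.!?]+[.!?]*/g) ?? [para])
        .map((s) => s.trim())
        .filter(Boolean);
      for (const sentence of sentences) {
        if (sentence.length <= opts.maxSize) pieces.push(sentence);
        else pieces.push(...splitAtWords(sentence, opts.maxSize));
      }
    }
    // Merge small fragments into the previous piece of the same section.
    const merged: string[] = [];
    for (const piece of pieces) {
      if (merged.length > 0 && piece.length < opts.minSize) {
        merged[merged.length - 1] += `\n\n${piece}`;
      } else {
        merged.push(piece);
      }
    }
    chunks.push(...merged);
  }
  // Prefix each chunk after the first with the tail of its predecessor.
  return chunks.map((chunk, i) =>
    i === 0 || opts.overlap <= 0 ? chunk : chunks[i - 1].slice(-opts.overlap) + chunk,
  );
}
```

Applied to a two-section Markdown string with a minSize larger than each paragraph, the sketch keeps every heading together with the paragraph that follows it, which matches the shape of the documented example for chunk().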
Parameters
text
string
The full text to chunk.
metadata?
Record<string, unknown>
Optional metadata attached to all chunks.
Returns
SemanticChunk[]
Array of chunks in order.
Throws
If text is empty.
Example
const chunks = chunker.chunk(
  '# Introduction\n\nFirst paragraph.\n\n## Details\n\nSecond paragraph.',
  { source: 'docs/readme.md' },
);
// chunks[0].boundaryType === 'heading'
// chunks[0].text includes "# Introduction\n\nFirst paragraph."