Class: BM25Index
Defined in: packages/agentos/src/rag/search/BM25Index.ts:156
BM25 sparse keyword index for hybrid retrieval.
Dense embeddings excel at semantic similarity but miss exact keyword matches (e.g., error codes, function names, product IDs). BM25 catches these by scoring documents based on term frequency, inverse document frequency, and document length normalization.
Examples
const index = new BM25Index({ k1: 1.5, b: 0.75 });
index.addDocuments([
{ id: 'doc-1', text: 'TypeScript compiler error TS2304' },
{ id: 'doc-2', text: 'JavaScript runtime TypeError explanation' },
{ id: 'doc-3', text: 'Fix error TS2304 by adding type declarations' },
]);
const results = index.search('error TS2304', 5);
// results[0].id === 'doc-3' (exact match on "error" + "TS2304")
// results[1].id === 'doc-1' (exact match on "error" + "TS2304")
const hybrid = new HybridSearcher(vectorStore, embeddingManager, bm25Index, {
denseWeight: 0.7,
sparseWeight: 0.3,
});
const results = await hybrid.search('What does error TS2304 mean?');
Constructors
Constructor
new BM25Index(
config?):BM25Index
Defined in: packages/agentos/src/rag/search/BM25Index.ts:200
Creates a new BM25 index.
Parameters
config?
Optional BM25 tuning parameters.
Returns
BM25Index
Example
// Use defaults (k1=1.2, b=0.75)
const index = new BM25Index();
// Custom parameters for short documents
const shortDocIndex = new BM25Index({ k1: 1.5, b: 0.5 });
Methods
addDocument()
addDocument(
id,text,metadata?):void
Defined in: packages/agentos/src/rag/search/BM25Index.ts:289
Adds a single document to the BM25 index.
The text is tokenized, stop words are removed, and term frequencies are recorded in the inverted index. IDF values are lazily recomputed on the next search.
Parameters
id
string
Unique document identifier.
text
string
Document text content to index.
metadata?
Record<string, unknown>
Optional metadata to store.
Returns
void
Throws
If id is empty or text is empty.
Example
index.addDocument('readme', 'AgentOS is a framework for building AI agents');
index.addDocument('changelog', 'v2.0: Added BM25 hybrid search', { version: '2.0' });
addDocuments()
addDocuments(
docs):void
Defined in: packages/agentos/src/rag/search/BM25Index.ts:343
Adds multiple documents to the index in a single batch.
More efficient than calling addDocument repeatedly because IDF recomputation is deferred until the next search.
Parameters
docs
object[]
Array of documents to index.
Returns
void
Example
index.addDocuments([
{ id: 'doc-1', text: 'First document content' },
{ id: 'doc-2', text: 'Second document content', metadata: { source: 'api' } },
]);
getStats()
getStats():
BM25Stats
Defined in: packages/agentos/src/rag/search/BM25Index.ts:456
Returns current index statistics.
Returns
Object containing document count, term count, and average document length.
Example
const stats = index.getStats();
console.log(`${stats.documentCount} docs, ${stats.termCount} unique terms`);
removeDocument()
removeDocument(
id):boolean
Defined in: packages/agentos/src/rag/search/BM25Index.ts:427
Removes a document from the index by its ID.
Cleans up all term frequency entries in the inverted index and marks IDF for recomputation.
Parameters
id
string
Document ID to remove.
Returns
boolean
true if the document existed and was removed, false otherwise.
Example
const removed = index.removeDocument('doc-obsolete');
console.log(removed ? 'Removed' : 'Not found');
search()
search(
query,topK?):BM25Result[]
Defined in: packages/agentos/src/rag/search/BM25Index.ts:371
Searches the BM25 index for documents matching the query.
Scoring formula per document D and query Q:
score(D, Q) = sum_{t in Q} IDF(t) * (tf(t,D) * (k1 + 1)) / (tf(t,D) + k1 * (1 - b + b * |D| / avgdl))
Parameters
query
string
Search query text.
topK?
number = 10
Maximum number of results to return.
Returns
Array of results sorted by BM25 score descending.
Example
const results = index.search('typescript error TS2304', 5);
for (const r of results) {
console.log(`${r.id}: score=${r.score.toFixed(4)}`);
}