RAG Memory Configuration
AgentOS provides two levels of memory API:
AgentMemory— High-level facade with simpleremember(),recall(),observe(),search()methods. No knowledge of PAD mood models or HEXACO traits required.- Low-level RAG primitives —
EmbeddingManager,VectorStoreManager,RetrievalAugmentor,GraphRAGEnginefor custom pipelines.
High-Level API: AgentMemory
import { AgentMemory } from '@framers/agentos';
// Wrap an existing CognitiveMemoryManager (e.g., in wunderland)
const memory = AgentMemory.wrap(existingManager);
// Or create standalone
const memory = new AgentMemory();
await memory.initialize(cognitiveMemoryConfig);
// Store information
await memory.remember(“User prefers dark mode”);
await memory.remember(“Deploy by Friday”, { type: 'prospective', tags: ['deadline'] });
// Recall relevant memories (uses HyDE when enabled)
const results = await memory.recall(“what does the user prefer?”);
for (const m of results.memories) {
console.log(m.content, m.retrievalScore);
}
// Observe conversation turns (observational memory)
await memory.observe('user', “Can you help me debug this?”);
await memory.observe('assistant', “Sure! The issue is in your useEffect...”);
// Get assembled context for prompt injection
const context = await memory.getContext(“TMJ treatment”, { tokenBudget: 2000 });
// Set reminders (prospective memory)
await memory.remind({
content: “Remind about deploy deadline”,
triggerType: 'time',
triggerAt: Date.now() + 3600000,
});
// Consolidation (merge, strengthen, decay)
await memory.consolidate();
// Health diagnostics
const health = await memory.health();
// Access underlying CognitiveMemoryManager for advanced usage
const raw = memory.raw;
Observational Memory
Observational memory is a background system that compresses long-running conversation history into dense, searchable memory traces. Instead of keeping the entire conversation in context, the system progressively distills it through three tiers:
Recent Messages (raw conversation turns)
→ Observations (concise notes extracted by MemoryObserver)
→ Reflections (long-term memory traces produced by MemoryReflector)
How It Works
- ObservationBuffer accumulates every message fed via
observe(). It tracks approximate token count (~4 chars/token). - MemoryObserver activates when the buffer reaches 30,000 tokens. It sends buffered messages to a cheap LLM, which extracts typed observation notes (
factual,emotional,commitment,preference,creative,correction). The LLM prompt is biased by the agent's HEXACO personality traits — high Emotionality focuses on tone shifts, high Conscientiousness on deadlines, high Openness on creative tangents. - MemoryReflector accumulates observation notes. When they exceed 40,000 tokens, it consolidates them into long-term
MemoryTraceobjects with 5-40x compression. Conflict resolution is personality-driven: high Honesty prefers newer information and supersedes old traces; high Agreeableness keeps both versions.
Integration with RAG
Reflection traces are encoded into the vector store via CognitiveMemoryManager.encode(), making them searchable through the standard RAG pipeline. When HyDE is enabled, a query like "what did we decide about deployment?" generates a hypothetical answer, and that embedding finds relevant reflection traces alongside regular memories.
Superseded traces are soft-deleted so they no longer surface in retrieval results.
API
The high-level entry point is AgentMemory.observe():
// Feed every conversation turn to the observer
await memory.observe('user', userMessage);
await memory.observe('assistant', assistantResponse);
Internally, CognitiveMemoryManager.observe() orchestrates the full pipeline: buffer → observer → reflector → encode traces → soft-delete superseded.
Configuration
Enable observational memory in CognitiveMemoryConfig:
await memory.initialize({
// ... core config ...
observer: {
activationThresholdTokens: 30_000, // trigger observation extraction
llmInvoker, // (system, user) => Promise<string>
},
reflector: {
activationThresholdTokens: 40_000, // trigger reflection consolidation
llmInvoker,
},
});
Both thresholds can be tuned. Lower thresholds produce more frequent, finer-grained observations at higher LLM cost. Higher thresholds batch more context but risk losing detail.
In persona JSON, the observer/reflector activate automatically when memoryConfig.enabled = true and an llmInvoker is available. No additional persona-level config is required.
Low-Level RAG Primitives
The concrete RAG APIs live under @framers/agentos/rag:
EmbeddingManager— Text → vector embeddings (OpenAI, Ollama, custom providers)VectorStoreManager— HNSW/InMemory vector storage with similarity searchRetrievalAugmentor— Orchestrates embedding + search + context assemblyHydeRetriever— Hypothetical Document Embedding for better recall (generates pseudo-answers before searching)GraphRAGEngine— TypeScript-native graph-based RAG with knowledge graph traversal
For most use cases, prefer AgentMemory over direct RAG primitive usage.
Enabling RAG In AgentOS
There are two supported ways to provide an augmentor to GMIs.
Option A: Provide a ready retrievalAugmentor instance
You construct and initialize the augmentor yourself, then pass it into AgentOS.initialize():
import { AgentOS } from '@framers/agentos';
import { EmbeddingManager, VectorStoreManager, RetrievalAugmentor } from '@framers/agentos/rag';
import { AIModelProviderManager } from '@framers/agentos/core/llm/providers/AIModelProviderManager';
// 1) Provider manager (must support embeddings for your chosen embedding model)
const providers = new AIModelProviderManager();
await providers.initialize({
providers: [
{
providerId: 'openai',
enabled: true,
isDefault: true,
config: { apiKey: process.env.OPENAI_API_KEY },
},
],
});
// 2) Embeddings
const embeddingManager = new EmbeddingManager();
await embeddingManager.initialize(
{
embeddingModels: [
{ modelId: 'text-embedding-3-small', providerId: 'openai', dimension: 1536, isDefault: true },
],
},
providers,
);
// 3) Vector stores + data sources
const vectorStoreManager = new VectorStoreManager();
await vectorStoreManager.initialize(
{
managerId: 'rag-vsm',
providers: [
{
id: 'sql-store',
type: 'sql',
storage: { filePath: './data/agentos_vectors.db', priority: ['better-sqlite3', 'sqljs'] },
},
],
defaultProviderId: 'sql-store',
defaultEmbeddingDimension: 1536,
},
[
{
dataSourceId: 'voice_conversation_summaries',
displayName: 'Conversation Summaries',
vectorStoreProviderId: 'sql-store',
actualNameInProvider: 'voice_conversation_summaries',
embeddingDimension: 1536,
isDefaultIngestionSource: true,
isDefaultQuerySource: true,
},
],
);
// 4) Retrieval augmentor
const rag = new RetrievalAugmentor();
await rag.initialize(
{
defaultDataSourceId: 'voice_conversation_summaries',
categoryBehaviors: [],
},
embeddingManager,
vectorStoreManager,
);
// 5) Pass into AgentOS
const agentos = new AgentOS();
await agentos.initialize({
// ...your normal AgentOSConfig...
retrievalAugmentor: rag,
manageRetrievalAugmentorLifecycle: true,
});
Option B: Let AgentOS create the RAG subsystem (ragConfig)
If you don’t want to manage instantiation, use AgentOSConfig.ragConfig. AgentOS will create:
EmbeddingManager → VectorStoreManager → RetrievalAugmentor, and pass the augmentor into GMIs.
import { AgentOS } from '@framers/agentos';
const agentos = new AgentOS();
await agentos.initialize({
// ...your normal AgentOSConfig...
ragConfig: {
embeddingManagerConfig: {
embeddingModels: [
{ modelId: 'text-embedding-3-small', providerId: 'openai', dimension: 1536, isDefault: true },
],
},
vectorStoreManagerConfig: {
managerId: 'rag-vsm',
providers: [
{ id: 'sql-store', type: 'sql', storage: { filePath: './data/agentos_vectors.db' } },
],
defaultProviderId: 'sql-store',
defaultEmbeddingDimension: 1536,
},
dataSourceConfigs: [
{
dataSourceId: 'voice_conversation_summaries',
displayName: 'Conversation Summaries',
vectorStoreProviderId: 'sql-store',
actualNameInProvider: 'voice_conversation_summaries',
embeddingDimension: 1536,
isDefaultIngestionSource: true,
isDefaultQuerySource: true,
},
],
retrievalAugmentorConfig: {
defaultDataSourceId: 'voice_conversation_summaries',
categoryBehaviors: [],
},
},
});
Notes:
- If
retrievalAugmentoris provided, it takes precedence overragConfig. ragConfig.manageLifecycledefaults totrue.ragConfig.bindToStorageAdapterdefaults totrueand will inject AgentOS’storageAdapterinto SQL vector store providers that did not specifyadapterorstorage.
Long-Term Memory Recall (Aggressive Default)
Prompt-injected durable memory retrieval (longTermMemoryRetriever) is controlled by orchestratorConfig.longTermMemoryRecall.
Default profile is intentionally aggressive for higher recall and task success:
profile: "aggressive"cadenceTurns: 2forceOnCompaction: truemaxContextChars: 4200topKByScope: { user: 8, persona: 8, organization: 8 }
Example:
await agentos.initialize({
// ...
orchestratorConfig: {
longTermMemoryRecall: {
profile: 'aggressive', // default
// Optional explicit overrides:
cadenceTurns: 2,
forceOnCompaction: true,
maxContextChars: 4200,
topKByScope: { user: 8, persona: 8, organization: 8 },
},
},
});
If you need lower token usage, switch to:
profile: "balanced"profile: "conservative"
Single-Tenant vs Multi-Tenant Routing
organizationId routing behavior is controlled by orchestratorConfig.tenantRouting:
multi_tenant(default): uses request-scopedorganizationIdwhen provided.single_tenant: collapses all turns to one org context (optional strict mode).
await agentos.initialize({
// ...
orchestratorConfig: {
tenantRouting: {
mode: 'single_tenant',
defaultOrganizationId: 'acme-org',
strictOrganizationIsolation: true,
},
},
});
With strict single-tenant isolation enabled, mismatched organizationId values are rejected.
Persona memoryConfig.ragConfig (Triggers and Data Sources)
RAG retrieval/ingestion in the GMI is driven by persona configuration. At minimum:
memoryConfig.ragConfig.enabled = trueretrievalTriggers.onUserQuery = trueto retrieve on user turnsingestionTriggers.onTurnSummary = trueto ingest post-turn summariesdefaultIngestionDataSourceIdset to a data source you configured in the RAG subsystem
Minimal example (persona JSON):
{
"memoryConfig": {
"enabled": true,
"ragConfig": {
"enabled": true,
"retrievalTriggers": { "onUserQuery": true },
"ingestionTriggers": { "onTurnSummary": true },
"defaultRetrievalTopK": 5,
"defaultIngestionDataSourceId": "voice_conversation_summaries",
"dataSources": [
{
"id": "voice_conversation_summaries",
"dataSourceNameOrId": "voice_conversation_summaries",
"isEnabled": true,
"displayName": "Conversation Summaries"
}
]
}
}
}
Ingestion summarization is opt-in
Turn-summary ingestion can be cheap by storing raw text. Summarization is enabled only when:
{
"memoryConfig": {
"ragConfig": {
"ingestionProcessing": {
"summarization": { "enabled": true }
}
}
}
}
Manual Ingest and Retrieve
You can use the augmentor directly (useful for knowledge-base ingestion pipelines):
await rag.ingestDocuments(
[
{ id: 'doc-1', content: 'AgentOS is a TypeScript runtime for AI agents.' },
{ id: 'doc-2', content: 'GMIs maintain persistent identity across sessions.' },
],
{ targetDataSourceId: 'voice_conversation_summaries' },
);
const result = await rag.retrieveContext('How do GMIs work?', { topK: 5 });
console.log(result.augmentedContext);
Vector Store Providers In This Repo
AgentOS currently ships these vector-store implementations:
InMemoryVectorStore(ephemeral, dev/testing)SqlVectorStore(persistent via@framers/sql-storage-adapter; embeddings stored as JSON blobs; optional SQLite FTS for hybrid)HnswlibVectorStore(ANN search viahnswlib-node, optional peer dependency; optional file persistence viapersistDirectory)QdrantVectorStore(remote/self-hosted Qdrant via HTTP; optional BM25 sparse vectors + hybrid fusion)
If you want “true” large-scale vector DB behavior (tens of millions of vectors, filtered search at scale, etc.), add a provider implementation and wire it into VectorStoreManager.
Qdrant Provider (Remote or Self-Hosted)
QdrantVectorStore lets you point AgentOS at a Qdrant instance (local Docker or managed cloud) without changing any higher-level RAG code.
Example VectorStoreManager provider config:
import type { VectorStoreManagerConfig } from '@framers/agentos/config/VectorStoreConfiguration';
const vsmConfig: VectorStoreManagerConfig = {
managerId: 'rag-vsm',
providers: [
{
id: 'qdrant-main',
type: 'qdrant',
url: process.env.QDRANT_URL!,
apiKey: process.env.QDRANT_API_KEY,
enableBm25: true,
},
],
defaultProviderId: 'qdrant-main',
defaultEmbeddingDimension: 1536,
};
GraphRAG (Optional)
GraphRAGEngine exists as a TypeScript-native implementation (graphology + Louvain community detection). It is not automatically used by GMIs by default; treat it as an advanced subsystem you opt into when your problem benefits from entity/relationship structure.
- If you use non-OpenAI embedding models (e.g., Ollama), set
GraphRAGConfig.embeddingDimension, or provide anembeddingManagerso the engine can probe the embedding dimension at runtime. GraphRAGEnginecan run without embeddings and/or without an LLM:- No embeddings: falls back to text matching (lower quality; no vector search).
- No LLM: falls back to pattern-based extraction (no model calls).
GraphRAGEngine.ingestDocuments()supports update semantics when you re-ingest the samedocumentIdwith new content (it subtracts prior per-document contributions before applying the new extraction).- To keep a GraphRAG index consistent with deletes or category/collection moves, call
GraphRAGEngine.removeDocuments([documentId, ...]).
Minimal lifecycle example:
import { GraphRAGEngine } from '@framers/agentos/rag/graphrag';
const engine = new GraphRAGEngine({
// Optional:
// - vectorStore
// - embeddingManager
// - llmProvider
// - persistenceAdapter
});
await engine.initialize({ engineId: 'graphrag-demo' });
// Ingest (or update) using a stable documentId.
await engine.ingestDocuments([{ id: 'doc-1', content: 'Alice founded Wonderland Inc.' }]);
await engine.ingestDocuments([{ id: 'doc-1', content: 'Bob founded Wonderland Inc.' }]); // update
// Delete or move out of GraphRAG policy scope.
await engine.removeDocuments(['doc-1']);
Troubleshooting updates:
- If you see warnings about missing previous contribution records, you upgraded from an older persistence format.
- Fix: rebuild the GraphRAG index (clear its persisted state and re-ingest documents).
Immutability Notes (Sealed Agents)
If you run with an append-only / sealed storage policy, avoid hard deletes of memory or history. Prefer append-only tombstones/redactions so retrieval can ignore forgotten items while the audit trail remains verifiable.
Combined Vector + GraphRAG Search
The HTTP API supports running vector retrieval and GraphRAG in a single request via includeGraphRag: true. This combines:
- Vector + BM25 hybrid — standard chunk retrieval with Reciprocal Rank Fusion
- GraphRAG local search — entity/relationship/community traversal
The response includes both chunks (ranked vector results) and graphContext (entities, relationships, community context) in one payload.
curl -s -X POST http://localhost:3001/api/agentos/rag/query \
-H ‘content-type: application/json’ \
-d ‘{“query”:”agent security model”,”includeGraphRag”:true,”topK”:5}’ | jq
CLI: wunderland rag query “agent security model” --graph
When to use combined search:
- Questions requiring both textual similarity AND relational/structural context
- Queries about how entities relate to each other (e.g., “how does X affect Y”)
- When you want chunk-level evidence plus knowledge graph context in a single call
Debug Pipeline Tracing
Set debug: true in the query request (or use --debug CLI flag) to get a step-by-step trace of the retrieval pipeline. Each step reports timing and relevant metrics:
| Step | Data |
|---|---|
query_received | query text, preset, topK, vectorProvider, collectionIds |
variants_resolved | base query, variant count, variant texts |
vector_search | provider (sql/hnswlib/qdrant), candidate count, latency, embedding model |
keyword_search | enabled, match count, latency |
fusion | strategy (RRF), vector/keyword/merged counts |
graphrag | entities found, relationships, communities, search time |
pipeline_complete | total latency, results returned |
Enable globally via AGENTOS_RAG_DEBUG=true environment variable, or per-request with the debug flag.
CLI: wunderland rag query “security tiers” --debug
HNSW Vector Store Configuration
When using AGENTOS_RAG_VECTOR_PROVIDER=hnswlib, the following environment variables configure the HNSW index:
| Variable | Default | Description |
|---|---|---|
AGENTOS_RAG_HNSWLIB_M | 16 | Max number of connections per node (higher = better recall, more memory) |
AGENTOS_RAG_HNSWLIB_EF_CONSTRUCTION | 200 | Construction-time search depth (higher = better index quality, slower build) |
AGENTOS_RAG_HNSWLIB_EF_SEARCH | 100 | Query-time search depth (higher = better recall, slower query) |
AGENTOS_RAG_HNSWLIB_PERSIST_DIR | ./db_data/agentos_rag_hnswlib | Directory for persisted HNSW index files |
The health endpoint (/api/agentos/rag/health) reports the active vectorProvider and HNSW params when applicable.
Practical Guidance
- Default recommendation: start with vector (dense) retrieval, then add keyword (BM25/FTS) for recall, then add a reranker only where it’s worth the latency/cost.
- GraphRAG tends to pay off when questions depend on multi-hop relationships and “global summaries” (org structures, timelines, dependency graphs), not for everyday chat retrieval.
- Use
--debugto understand pipeline behavior and identify bottlenecks before tuning parameters.
Retrieval Strategies (Implemented)
RetrievalAugmentor.retrieveContext() supports RagRetrievalOptions.strategy:
similarity: Dense similarity search (bi-encoder) viaIVectorStore.query().hybrid: Dense + lexical fusion viaIVectorStore.hybridSearch()when the store implements it.SqlVectorStore.hybridSearch()performs BM25-style lexical scoring and fuses dense + lexical rankings (default: RRF).
mmr: Maximal Marginal Relevance diversification. The augmentor requests embeddings for candidates and then selects a diverse top-K set usingstrategyParams.mmrLambda(0..1).
Notes:
- If a store does not implement
hybridSearch(), AgentOS falls back to densequery(). - For
mmr, embeddings are used internally even ifincludeEmbeddings=false; embeddings are stripped from the output unless explicitly requested.
Reranking (Cross-Encoder)
If RetrievalAugmentorServiceConfig.rerankerServiceConfig is provided, AgentOS will:
- Initialize
RerankerService - Auto-register built-in reranker providers declared in config:
cohere(requiresapiKey)local(offline cross-encoder, requires installing Transformers.js:@huggingface/transformerspreferred, or@xenova/transformers)
Reranking is still opt-in per request via RagRetrievalOptions.rerankerConfig.enabled=true.
Multimodal RAG (Image + Audio)
AgentOS’ core RAG APIs are text-first. The recommended multimodal pattern is:
- Persist asset metadata (and optionally bytes).
- Derive a text representation (caption/transcript/OCR/etc).
- Index that text as a normal RAG document (so the same vector/BM25/rerank pipeline applies).
- Optionally add modality embeddings (image-to-image / audio-to-audio) as an acceleration path.
See MULTIMODAL_RAG.md for the reference implementation and HTTP API.