Memory Router (Recall-Stage Smart Orchestration)

Stage 2 of the Classifier-Driven Memory Pipeline: the recall-stage smart orchestrator. It picks the best memory-recall architecture per query via classifier-driven dispatch across the canonical-hybrid, observational-memory-v10, and observational-memory-v11 backends. Sibling primitives: Query Classifier (Stage 1, the memory-or-not gate), Reader Router (Stage 3, reader-tier dispatch), and Ingest Router (input stage).

The validated deployed-config headline from the 2026-04-28 v1 publication pairs MemoryRouter (Stage 2) with ReaderRouter (Stage 3) and the text-embedding-3-small embedder: 85.6% [82.4%, 88.6%] at $0.0090/correct with 4s average latency on LongMemEval-S Phase B (N=500). It beats Mastra OM gpt-4o (84.2% published) on accuracy, and beats EmergenceMem Simple Fast (80.6% measured apples-to-apples in our harness) by +5.0 pp accuracy at 6.5× lower cost-per-correct.

The MemoryRouter primitive itself ships three calibrated presets (maximize-accuracy, balanced, minimize-cost). The earlier shipping headline of 76.6% [72.8, 80.2] at $0.058/correct ran against CharHashEmbedder (the bench's "no embedder configured" fallback). The +9 pp lift to the current 85.6% number came from (1) wiring text-embedding-3-small as the embedder, the documented production path, and (2) dropping the minimize-cost preset's MS+SSP → OM-v11 routing in favor of canonical-hybrid for all categories paired with ReaderRouter (see "Tier 3 minimize-cost staleness for sem-embed deployments" below).

What it actually does

Every memory-recall query goes through three steps:

  1. A gpt-5-mini-style classifier reads the query and emits a MemoryQueryCategory (one of six: single-session-user, single-session-assistant, single-session-preference, knowledge-update, multi-session, temporal-reasoning).
  2. The pure selectBackend function maps that category to a backend choice using the configured routing table (one of three shipping presets, or your own).
  3. An optional dispatcher executes the backend against your Memory instance.

The classifier call costs ~$0.0002 per query. The routing decision more than recoups that by picking canonical-hybrid (cheap, accurate on most categories) instead of paying the OM premium on every query, while still routing multi-session synthesis questions to the OM backends, where the architectural lift earns the cost.
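The decision core of the three steps can be sketched as a pure table lookup. This is an illustrative stand-in, not the module's actual `selectBackend` signature; the MS/SSP → OM-v11 mapping mirrors the minimize-cost preset described later in this page.

```typescript
// Simplified sketch of the routing core: step 2 is a pure lookup, so the
// routing table *is* the whole policy. Stand-in types, not the real module.
type Category =
  | 'single-session-user'
  | 'single-session-assistant'
  | 'single-session-preference'
  | 'knowledge-update'
  | 'multi-session'
  | 'temporal-reasoning';

type Backend =
  | 'canonical-hybrid'
  | 'observational-memory-v10'
  | 'observational-memory-v11';

// A minimize-cost-style table: OM premium only on multi-session and
// single-session-preference; everything else stays on canonical-hybrid.
const minimizeCostLike: Record<Category, Backend> = {
  'single-session-user': 'canonical-hybrid',
  'single-session-assistant': 'canonical-hybrid',
  'single-session-preference': 'observational-memory-v11',
  'knowledge-update': 'canonical-hybrid',
  'multi-session': 'observational-memory-v11',
  'temporal-reasoning': 'canonical-hybrid',
};

// Step 2 as a pure function: category in, backend out.
function pickBackend(category: Category, table: Record<Category, Backend>): Backend {
  return table[category];
}
```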

Why route at all

Per-category Phase B N=500 measurements show different memory architectures dominate different categories:

| Category | canonical-hybrid | OM-v10 | OM-v11 |
|---|---|---|---|
| single-session-user | 97.1% / $0.019 | 97.1% / $0.021 | 98.6% / $0.021 |
| single-session-assistant | 89.3% / $0.018 | 83.9% / $0.020 | 83.9% / $0.019 |
| single-session-preference | 60.0% / $0.021 | 60.0% / $0.021 | 63.3% / $0.021 |
| knowledge-update | 86.8% / $0.019 | 85.9% / $0.031 | 87.2% / $0.031 |
| multi-session | 54.9% / $0.020 | 60.2% / $0.031 | 61.7% / $0.034 |
| temporal-reasoning | 70.2% / $0.020 | 71.0% / $0.021 | 69.2% / $0.021 |

Numbers above are accuracy / per-call USD. A flat "always canonical-hybrid" pipeline gives up accuracy on multi-session (−6.8 pp vs OM-v11). A flat "always OM-v11" pipeline gives up accuracy on single-session-assistant (−5.4 pp) and pays a 1.7-1.8× cost premium on every other category. Per-query routing extracts the best of both.
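To make the dominance argument concrete, here is a small sketch (accuracy numbers copied from the table above) showing that the per-category best is at least as accurate as either flat choice in every row; classifier errors aside, routing can only close the gap:

```typescript
// Per-category accuracy (%) from the Phase B table:
// [canonical-hybrid, OM-v10, OM-v11].
const accuracy: Record<string, [number, number, number]> = {
  'single-session-user': [97.1, 97.1, 98.6],
  'single-session-assistant': [89.3, 83.9, 83.9],
  'single-session-preference': [60.0, 60.0, 63.3],
  'knowledge-update': [86.8, 85.9, 87.2],
  'multi-session': [54.9, 60.2, 61.7],
  'temporal-reasoning': [70.2, 71.0, 69.2],
};

// Per-category routing takes the row-wise max, so no flat "always X"
// pipeline can beat it in any category.
function bestPerCategory(): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [category, accs] of Object.entries(accuracy)) {
    out[category] = Math.max(...accs);
  }
  return out;
}
```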

Six query categories

```typescript
type MemoryQueryCategory =
  | 'single-session-user'
  | 'single-session-assistant'
  | 'single-session-preference'
  | 'knowledge-update'
  | 'multi-session'
  | 'temporal-reasoning';
```

The taxonomy is calibrated from LongMemEval-S. Each category captures a distinct memory-recall pattern; the classifier is trained to discriminate between them via a discriminator prompt (with optional few-shot variant for harder cases like SSU-vs-SSA confusion).
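To make the categories concrete, here are illustrative example queries for each, one per category. These examples are invented for this page, not drawn from LongMemEval-S itself:

```typescript
type MemoryQueryCategory =
  | 'single-session-user'
  | 'single-session-assistant'
  | 'single-session-preference'
  | 'knowledge-update'
  | 'multi-session'
  | 'temporal-reasoning';

// Made-up example queries, one per category (illustrative only).
const exampleQueries: Record<MemoryQueryCategory, string> = {
  // Recall a fact the user stated in one session.
  'single-session-user': 'What hotel did I say I booked for the Tokyo trip?',
  // Recall something the assistant said or produced.
  'single-session-assistant': 'What was the pasta recipe you suggested yesterday?',
  // Recall a stated preference and apply it.
  'single-session-preference': 'Recommend a restaurant that fits my dietary preferences.',
  // The answer changed over time; the latest value should win.
  'knowledge-update': 'What is my current job title?',
  // Synthesis across several sessions.
  'multi-session': 'How many different cities have I mentioned traveling to?',
  // Requires reasoning over dates and ordering.
  'temporal-reasoning': 'How long after adopting the cat did I move apartments?',
};
```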

Three backend identifiers

```typescript
type MemoryBackendId =
  | 'canonical-hybrid' // BM25 + dense + Cohere rerank-v3.5
  | 'observational-memory-v10' // synthesized observation log + dynamic OM router
  | 'observational-memory-v11'; // v10 + conditional verbatim citation rule
```

Backend execution itself lives in the dispatcher (consumer-supplied). MemoryRouter only DECIDES; it doesn't execute. This split lets you wire the dispatcher to your existing HybridRetriever / OM pipeline / custom retriever without touching this module.

Three shipping presets

| Preset | Strategy | Phase B Result | When to use |
|---|---|---|---|
| minimize-cost (default) | Cheapest Pareto-dominant per category; pays OM premium only on MS + SSP | 76.6% [72.8, 80.2] at $0.0580/correct, 16s avg | Cost-sensitive workloads; the shipping default |
| balanced | Trades 1.6× cost for 10× latency wins on KU/TR | 74.5% / $0.205/correct (sim) | Interactive UX where latency matters |
| maximize-accuracy | Highest-accuracy backend per category | 75.6% [71.8, 79.2] at $0.2434/correct, 66s avg | Accuracy-sensitive with moderate cost tolerance |

Quickstart

```typescript
import {
  LLMMemoryClassifier,
  MemoryRouter,
  FunctionMemoryDispatcher,
} from '@framers/agentos/memory-router';
import type { ScoredTrace } from '@framers/agentos/memory';

const router = new MemoryRouter({
  classifier: new LLMMemoryClassifier({ llm: openaiAdapter }),
  preset: 'minimize-cost',
  budget: { perQueryUsd: 0.05, mode: 'cheapest-fallback' },
  dispatcher: new FunctionMemoryDispatcher<ScoredTrace, { topK: number }>({
    'canonical-hybrid': async (q, { topK }) =>
      memory.recall(q, { limit: topK }),
    'observational-memory-v10': async (q, { topK }) =>
      omV10.recall(q, { limit: topK }),
    'observational-memory-v11': async (q, { topK }) =>
      omV11.recall(q, { limit: topK }),
  }),
});

const { decision, traces, backend } = await router.decideAndDispatch(
  query,
  { topK: 10 },
);
console.log(decision.classifier.category); // 'multi-session'
console.log(backend); // 'observational-memory-v11'
console.log(decision.routing.estimatedCostUsd); // 0.0336
console.log(decision.routing.chosenBackendReason); // 'routing-table pick fits budget'
```

Decision-only flow

If you'd rather execute the backend yourself, use decide():

```typescript
const { classifier, routing } = await router.decide(query);

if (routing.chosenBackend === 'canonical-hybrid') {
  const traces = await memory.recall(query, { limit: 10 });
  // your custom logic
}
```

Budget-aware dispatch

```typescript
const router = new MemoryRouter({
  classifier,
  preset: 'maximize-accuracy',
  budget: {
    perQueryUsd: 0.025,
    mode: 'cheapest-fallback',
  },
});
```

Three modes:

  • hard: throw MemoryRouterBudgetExceededError when the routing-table pick exceeds the ceiling. Production code catches and escalates.
  • soft: keep the picked backend when it has better $/correct than the cheapest backend that fits, even if it exceeds the budget. Prefers accuracy-economical overruns.
  • cheapest-fallback (default): silently downgrade to the cheapest backend that fits. If no backend fits, pick the globally cheapest and flag budgetExceeded: true in the decision.
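The three modes can be illustrated with a self-contained sketch. This is a re-implementation of the mode logic over made-up cost points, not the module's internals; the field names (estUsd, estDollarsPerCorrect) are hypothetical:

```typescript
type Backend = 'canonical-hybrid' | 'observational-memory-v10' | 'observational-memory-v11';

// Hypothetical per-backend cost estimates: USD per call and $/correct.
interface CostPoint {
  backend: Backend;
  estUsd: number;
  estDollarsPerCorrect: number;
}

function applyBudget(
  pick: CostPoint, // the routing-table choice
  points: CostPoint[], // all available backends for this category
  perQueryUsd: number,
  mode: 'hard' | 'soft' | 'cheapest-fallback',
): { backend: Backend; budgetExceeded: boolean } {
  // Routing-table pick fits the ceiling: keep it in every mode.
  if (pick.estUsd <= perQueryUsd) return { backend: pick.backend, budgetExceeded: false };
  // hard: refuse to route (stand-in for MemoryRouterBudgetExceededError).
  if (mode === 'hard') throw new Error('budget exceeded');
  const fitting = points
    .filter((p) => p.estUsd <= perQueryUsd)
    .sort((a, b) => a.estUsd - b.estUsd);
  const cheapestFitting = fitting[0];
  // soft: keep the pick when its $/correct beats the cheapest fitting backend.
  if (
    mode === 'soft' &&
    cheapestFitting &&
    pick.estDollarsPerCorrect < cheapestFitting.estDollarsPerCorrect
  ) {
    return { backend: pick.backend, budgetExceeded: true };
  }
  // cheapest-fallback (and soft when the pick loses): downgrade silently.
  if (cheapestFitting) return { backend: cheapestFitting.backend, budgetExceeded: false };
  // Nothing fits: take the globally cheapest and flag it.
  const cheapest = [...points].sort((a, b) => a.estUsd - b.estUsd)[0];
  return { backend: cheapest.backend, budgetExceeded: true };
}
```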

Custom routing table or per-category override

```typescript
const router = new MemoryRouter({
  classifier,
  preset: 'balanced',
  routingTable: {
    preset: 'balanced',
    defaultMapping: {
      'single-session-assistant': 'canonical-hybrid',
      'single-session-user': 'canonical-hybrid',
      'single-session-preference': 'canonical-hybrid',
      'knowledge-update': 'canonical-hybrid',
      'multi-session': 'canonical-hybrid', // override: skip OM premium
      'temporal-reasoning': 'canonical-hybrid',
    },
  },
});

// Or patch a single category:
const router2 = new MemoryRouter({
  classifier,
  preset: 'maximize-accuracy',
  mapping: {
    'single-session-preference': 'canonical-hybrid',
  },
});
```

Few-shot classifier prompt

For deployments where SSU-vs-SSA, SSP-vs-SSA, MS-vs-KU confusion costs accuracy, use the few-shot variant:

```typescript
const router = new MemoryRouter({
  classifier,
  preset: 'minimize-cost',
  useFewShotPrompt: true,
});

// or per-call
await router.decide(query, { useFewShotPrompt: true });
```

API surface

  • MemoryQueryCategory, MemoryBackendId, MemoryRouterPreset, RoutingTable
  • MEMORY_QUERY_CATEGORIES — the six-category tuple
  • MINIMIZE_COST_TABLE, BALANCED_TABLE, MAXIMIZE_ACCURACY_TABLE, PRESET_TABLES
  • MemoryBackendCostPoint, DEFAULT_MEMORY_BACKEND_COSTS, TIER_1_CANONICAL_COSTS, TIER_2A_V10_COSTS, TIER_2B_V11_COSTS
  • selectBackend (pure function)
  • MemoryRoutingDecision, MemoryRouterConfig, MemoryBudgetMode
  • IMemoryClassifier, IMemoryClassifierLLM, LLMMemoryClassifier
  • CLASSIFIER_SYSTEM_PROMPT, CLASSIFIER_SYSTEM_PROMPT_FEWSHOT, SAFE_FALLBACK_CATEGORY
  • IMemoryDispatcher, FunctionMemoryDispatcher
  • MemoryRouter, MemoryRouterOptions, MemoryRouterDecideOptions, MemoryRouterDecision, MemoryRouterDispatchedDecision
  • Errors: MemoryRouterUnknownCategoryError, MemoryRouterBudgetExceededError, MemoryRouterDispatcherMissingError, UnsupportedMemoryBackendError

Methodology + numbers

The shipping cost-points in DEFAULT_MEMORY_BACKEND_COSTS come from LongMemEval-S Phase B N=500 run JSONs in packages/agentos-bench/results/runs/. Each entry's per-category accuracy/cost/latency is from a real benchmark sweep at gpt-4o reader, gpt-4o-2024-08-06 judge, rubricVersion 2026-04-18.1, seed=42, with bootstrap 95% CIs and a published 1% [0%, 3%] judge false-positive rate.

For workloads whose cost/accuracy profile diverges from LongMemEval-S, see Adaptive Memory Router — derives the routing table from your own calibration data instead of relying on Phase B presets.

Tier 3 minimize-cost staleness for sem-embed deployments

The minimize-cost preset's routing table sends multi-session and single-session-preference cases to the observational-memory-v11 backend. That table was calibrated on Phase B data measured against CharHashEmbedder (recall@10 around 0.62 on canonical-hybrid). With text-embedding-3-small the canonical-hybrid recall@10 lifts to 0.981, and the per-category accuracy story changes:

With the gpt-4o reader, dropping the OM-v11 routing produces a +1.0 pp aggregate lift (SSP gains 13.4 pp on canonical, MS loses 4 pp, and the case-weighted aggregate favors canonical). With the gpt-5-mini reader (via ReaderRouter), OM-v11 routing for MS/SSP is statistically tied with canonical, but OM-v11 imposes a 60-120 s observer pipeline per OM-routed case (p95 latency 111 s with the routing on vs 7 s with it off, a 15× tail-latency reduction from dropping it).

For new sem-embed deployments, the recommended config is canonical-hybrid for all categories + ReaderRouter per-category reader-tier dispatch + text-embedding-3-small embedder. This is the validated 85.6% headline. The minimize-cost preset's table will be re-derived from sem-embed Phase B data in v2.
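The recommended config above can be sketched with the per-category mapping override shown earlier. This is a sketch, not a shipped preset: the ReaderRouter and text-embedding-3-small wiring live outside this module (at the reader and retriever layers) and are named only in comments, and openaiAdapter is a placeholder for your LLM adapter.

```typescript
import { MemoryRouter, LLMMemoryClassifier } from '@framers/agentos/memory-router';

// All-canonical routing for sem-embed deployments: skip the OM premium
// everywhere. Pair this with ReaderRouter (Stage 3) for reader-tier dispatch
// and text-embedding-3-small as the retriever embedder; that wiring is
// outside this module.
const router = new MemoryRouter({
  classifier: new LLMMemoryClassifier({ llm: openaiAdapter }), // openaiAdapter: your adapter
  preset: 'minimize-cost',
  mapping: {
    'single-session-user': 'canonical-hybrid',
    'single-session-assistant': 'canonical-hybrid',
    'single-session-preference': 'canonical-hybrid', // overrides the OM-v11 pick
    'knowledge-update': 'canonical-hybrid',
    'multi-session': 'canonical-hybrid', // overrides the OM-v11 pick
    'temporal-reasoning': 'canonical-hybrid',
  },
});
```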

Existing CharHash-era deployments using minimize-cost continue to work (no breaking change in the API), but the 76.6% headline they validate against is the older bench-default-fallback number. Migrating to the sem-embed embedder, routing all categories to canonical-hybrid (bypassing the minimize-cost preset's OM routing), and adding ReaderRouter yields a +9 pp accuracy lift at lower cost and lower latency.