Adaptive Memory Router (Self-Calibrating)
Self-calibrating extension of Memory Router. Derives the routing table from a workload-specific calibration dataset instead of relying on the LongMemEval-S Phase B presets.
When to use this
Use MemoryRouter directly with a shipping preset (minimize-cost, balanced, maximize-accuracy) when your workload is similar to LongMemEval-S — conversational memory with the six standard categories.
Use AdaptiveMemoryRouter when:
- Your category distribution diverges from LongMemEval-S (e.g., heavy on temporal, light on multi-session).
- Your reader / judge / cost profile differs (different LLM, different judge rubric, different per-call cost).
- You want the router to optimize for YOUR per-category cost-accuracy points instead of a static table baked from someone else's measurement.
- You have a calibration dataset (or can collect one) — even ~50-200 samples per (category, backend) cell is enough.
How it works
The adaptive router takes a list of calibration samples and a preset rule, then derives a routing table:
interface CalibrationSample {
category: MemoryQueryCategory;
backend: MemoryBackendId;
costUsd: number;
correct: number; // 1 = correct, 0 = incorrect (or score in [0,1])
}
Three steps:
- Aggregate: roll samples up by
(category, backend)into mean cost + mean accuracy + sample count. - Per-category select: apply a preset rule per category to pick a backend.
- Build table: assemble the per-category picks into a frozen
RoutingTable. Categories with insufficient calibration fall back to the static preset's default.
Three preset rules:
| Rule | What it picks |
|---|---|
minimize-cost | cheapest backend within accuracyTolerance (default 2pp) of the best meanAccuracy on this category. If no backend is within tolerance, picks the best-accuracy backend (the gap exceeds tolerance, so accuracy gain justifies cost). |
balanced | best $/correct ratio (meanCost / meanAccuracy). Backends with zero meanAccuracy are skipped to avoid div-by-zero. |
maximize-accuracy | highest meanAccuracy backend; ties broken by lower meanCost. |
Quickstart
import {
AdaptiveMemoryRouter,
LLMMemoryClassifier,
FunctionMemoryDispatcher,
} from '@framers/agentos/memory-router';
// Collect calibration data from a Phase A sweep across your workload
// (sample queries dispatched to each candidate backend, scoring outcomes
// with your judge of choice).
const samples = [
{ category: 'multi-session', backend: 'canonical-hybrid', costUsd: 0.020, correct: 1 },
{ category: 'multi-session', backend: 'canonical-hybrid', costUsd: 0.020, correct: 0 },
// ... ~50-200 per (category, backend) cell
];
const router = new AdaptiveMemoryRouter({
classifier: new LLMMemoryClassifier({ llm: openaiAdapter }),
calibrationSamples: samples,
preset: 'minimize-cost',
minSamplesPerCell: 10, // require at least 10 samples to use calibration; below that fall back to static preset
accuracyTolerance: 0.02, // for minimize-cost: 2pp accuracy gap tolerated when picking the cheaper alternative
dispatcher: new FunctionMemoryDispatcher({
'canonical-hybrid': async (q, p) => myHybridRetrieve(q, p),
'observational-memory-v10': async (q, p) => myOmV10Recall(q, p),
'observational-memory-v11': async (q, p) => myOmV11Recall(q, p),
}),
});
// Inspect the derived table:
const derivedTable = router.getRoutingTable();
console.log(derivedTable.defaultMapping);
// {
// 'multi-session': 'observational-memory-v11', // calibration-driven
// 'single-session-user': 'canonical-hybrid', // calibration-driven
// 'single-session-assistant': 'canonical-hybrid', // preset fallback (no data)
// ...
// }
// Use it like any MemoryRouter:
const decision = await router.decide(query);
Calibration data collection
For each candidate backend, run your workload through it on a Phase A subset:
- Sample N queries from your workload (typically N ≈ 100-300 per category — stratified if some categories are rare).
- Dispatch each query to each candidate backend.
- Score each outcome with your judge.
- Emit one
CalibrationSampleper (query × backend × outcome) tuple.
Total spend: roughly N × 3 × per-backend-cost-per-query. For LongMemEval-S Phase A this was ~$30 per backend.
Pure functions exposed
For consumers who want to do calibration analysis outside the router:
import {
aggregateCalibration,
selectByPreset,
buildAdaptiveRoutingTable,
} from '@framers/agentos/memory-router';
// 1. Roll samples up.
const agg = aggregateCalibration(samples);
console.log(agg['multi-session']?.['canonical-hybrid']);
// { n: 156, meanCost: 0.0203, meanAccuracy: 0.547 }
// 2. Pick per category.
const backend = selectByPreset({
category: 'multi-session',
agg,
preset: 'balanced',
minSamplesPerCell: 10,
});
// 3. Build the full table directly.
const table = buildAdaptiveRoutingTable({
samples,
preset: 'maximize-accuracy',
minSamplesPerCell: 20,
accuracyTolerance: 0.025,
});
API surface
CalibrationSample,CalibrationCell,AggregatedCalibrationAdaptivePresetRule = 'minimize-cost' | 'balanced' | 'maximize-accuracy'aggregateCalibration— pure aggregatorselectByPreset— pure per-category selectorbuildAdaptiveRoutingTable— pure full-table constructorAdaptiveMemoryRouter— class extendingMemoryRouterwith calibration-derived tableAdaptiveMemoryRouterOptions,SelectByPresetArgs,BuildAdaptiveRoutingTableArgs
Related
- Memory Router — base primitive
- Cognitive Pipeline — composition that uses MemoryRouter as the recall stage
- Evaluation Framework — for collecting calibration data via your existing eval harness