AgentOS
AgentOS is an open-source TypeScript runtime for AI agents that remember, adapt, and write their own tools. Apache-2.0.
```shell
npm install @framers/agentos
```
The runtime carries the parts of an agent that should outlive a single chat completion:
- Persistent cognitive memory with eight neuroscience-grounded mechanisms (Ebbinghaus decay, retrieval-induced forgetting, reconsolidation, source-confidence decay, schema encoding, temporal gist, metacognitive feeling-of-knowing, involuntary recall) — each grounded in primary cognitive-science literature, each opt-in.
- Optional HEXACO personality vectors that modulate encoding strength, working-memory capacity, and prompt formatting per trait.
- Six multi-agent orchestration strategies: sequential, parallel, debate, graph, hierarchical, adaptive.
- Two-phase streaming guardrails.
- A voice pipeline with VAD, barge-in, and streaming STT/TTS.
- One dispatch interface across 21 LLM providers.
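As an illustration of the first mechanism, Ebbinghaus-style decay is classically modeled as exponential retention over time, with recall refreshing the trace. This is a minimal generic sketch of that idea — the types and function names here are invented for illustration and are not the AgentOS API:

```typescript
// Hypothetical sketch of Ebbinghaus-style decay and reconsolidation.
// Not the AgentOS implementation; names are invented for illustration.
interface MemoryTrace {
  content: string;
  stability: number;    // higher stability = slower forgetting
  lastRecallMs: number; // epoch millis of the last retrieval
}

// Classic Ebbinghaus retention curve: R = e^(-t / S)
function retention(trace: MemoryTrace, nowMs: number): number {
  const elapsedDays = (nowMs - trace.lastRecallMs) / 86_400_000;
  return Math.exp(-elapsedDays / trace.stability);
}

// Reconsolidation on recall: refresh the timestamp and bump stability,
// so each successful retrieval slows future forgetting.
function recall(trace: MemoryTrace, nowMs: number): MemoryTrace {
  return { ...trace, lastRecallMs: nowMs, stability: trace.stability * 1.5 };
}

const trace: MemoryTrace = { content: "user prefers dark mode", stability: 2, lastRecallMs: 0 };
const dayMs = 86_400_000;
retention(trace, 0);         // 1.0 — just encoded
retention(trace, 2 * dayMs); // ≈ 0.37 — two days later, never recalled
```

The key design point the mechanism list implies: forgetting is a score, not a deletion, so a decayed trace can still surface (e.g. via involuntary recall) while losing weight in routine retrieval.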
Agents with emergent: true can write tools mid-decision. When the runtime encounters a sub-task no existing capability covers, it generates a TypeScript function with a Zod schema, routes it through a separate LLM-as-judge that scores code safety, test correctness, and determinism, and on approval executes it in a hardened node:vm sandbox. The forged tool joins the session catalog; promotion to a SKILL.md makes it persist across processes. Multi-agent teams that hit a capability gap call spawn_specialist and the judge reviews the synthesized agent spec before the specialist joins the live roster.
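The forge → judge → sandbox flow described above can be sketched end to end. Everything here is a hand-rolled stand-in (in the real runtime the candidate comes from an LLM and the verdict from a separate LLM-as-judge); only the use of a node:vm context for isolated execution mirrors what the text states:

```typescript
import { runInNewContext } from "node:vm";

// Hypothetical shapes for a forged-tool candidate and its judge verdict.
interface Candidate { name: string; source: string }
interface Verdict { safety: number; correctness: number; determinism: number }

// Stand-in for the LLM-as-judge gate: approve only if every score clears
// a threshold (0.8 here is an invented value, not the runtime's).
function approved(v: Verdict): boolean {
  return v.safety >= 0.8 && v.correctness >= 0.8 && v.determinism >= 0.8;
}

// On approval, run the forged function inside an isolated vm context:
// the sandbox object is the only global surface the code can see, so
// there is no access to require, process, or host globals.
function execForged(c: Candidate, v: Verdict, args: unknown): unknown {
  if (!approved(v)) throw new Error(`judge rejected ${c.name}`);
  const sandbox = { args, result: undefined as unknown };
  runInNewContext(`result = (${c.source})(args)`, sandbox, { timeout: 1000 });
  return sandbox.result;
}

const tool: Candidate = { name: "sum", source: "(xs) => xs.reduce((a, b) => a + b, 0)" };
execForged(tool, { safety: 0.95, correctness: 0.9, determinism: 1 }, [1, 2, 3]); // → 6
```

A real hardened sandbox needs more than this (frozen intrinsics, resource limits, no shared references), but the shape — gate first, execute in a context the tool cannot escape — is the one the section describes.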
100+ first-party extensions (channel adapters, tool packs, guardrail packs) and 88 curated SKILL.md skills auto-discover at startup through their respective registries — no manual registration. The same surface that auto-loaded skills join is the surface runtime-forged tools graduate into.
The benchmarks below measure this runtime against alternative memory libraries using the same gpt-4o answer model.
85.6% on LongMemEval-S at $0.0090 per correct answer — 0.4 points behind Emergence.ai's closed-source SaaS at 86% and 1.4 points above Mastra Observational Memory (84.23%), with a matched gpt-4o reader. AgentOS ships under Apache-2.0: free to install, fork, and self-host.
70.2% on LongMemEval-M at $0.0078 per correct on the 1.5M-token / 500-session haystack — the only open-source library on the public record above 65% on M with publicly reproducible methodology. Competitive with the strongest published M results in the LongMemEval paper (Wu et al., ICLR 2025: round Top-5 65.7%, session Top-5 71.4%, round Top-10 72.0%).
Benchmarks reference · Reproducible run JSONs · SOTA writeup
How recall works
AgentOS gates memory through three independent LLM-as-judge classifiers that share one classification pass. Trivial queries — greetings, small talk, general knowledge answerable from context — skip retrieval entirely. Queries that need memory get the retrieval architecture best-suited to the category. The right reader model handles each question type.
| Stage | Primitive | Decision per query |
|---|---|---|
| 1 | QueryClassifier | T0/none · T1/simple · T2/moderate · T3/complex |
| 2 | MemoryRouter | canonical-hybrid · observational-memory-v10 · v11 |
| 3 | ReaderRouter | gpt-4o vs gpt-5-mini per category |
The pipeline costs one classifier call per query — Stages 2 and 3 reuse Stage 1's classification. That single call buys 12× lower reader cost on most categories, +10 points on single-session-preference, and a clean abstain path for queries that don't need memory at all. Reproducible run JSONs in agentos-bench.
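The three-stage gate above can be sketched as a single classification followed by two lookups. The tier names come from the table; the specific tier-to-retrieval and tier-to-reader mappings below are invented stand-ins for illustration, not the runtime's actual routing policy:

```typescript
// Hypothetical sketch of the single-pass gate. Stage 1 (the classifier)
// runs once; Stages 2 and 3 are lookups on its result, so the whole
// pipeline costs one classifier call per query.
type Tier = "T0" | "T1" | "T2" | "T3";

interface RoutingDecision {
  tier: Tier;
  retrieval: "none" | "canonical-hybrid" | "observational-memory-v10" | "v11";
  reader: "gpt-4o" | "gpt-5-mini" | null;
}

function route(tier: Tier): RoutingDecision {
  // T0/trivial: the abstain path — skip retrieval and the reader entirely.
  if (tier === "T0") return { tier, retrieval: "none", reader: null };
  // Stage 2: pick the retrieval architecture for the category
  // (assumed mapping — the real router's policy is not public here).
  const retrieval = tier === "T1" ? "canonical-hybrid"
    : tier === "T2" ? "observational-memory-v10"
    : "v11";
  // Stage 3: cheap reader where it suffices, strong reader for complex queries.
  const reader = tier === "T3" ? "gpt-4o" : "gpt-5-mini";
  return { tier, retrieval, reader };
}

route("T0"); // greeting or small talk: no retrieval, no reader call
route("T3"); // complex multi-session query: full retrieval + strongest reader
```

This is where the cost win comes from: most queries land on the cheap reader or skip memory altogether, and only the classification itself is paid on every turn.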
Where to start
- Cognitive Memory — why memory should forget. Eight neuroscience-grounded mechanisms, primary-source citations, the consolidation loop. The story is the page.
- GMI architecture — what an agent actually is between turns. Seven layers around an LLM core.
- System Architecture — how the 26 modules compose into a runtime.
- Deep Research — the 3-phase pipeline behind sourced answers.
- Emergent Capabilities — runtime tool forging, judge approval, sandboxed execution.
- Examples Cookbook — 18 runnable examples covering agents, agencies, voice, orchestration.
- TypeDoc API — every class, interface, function in the runtime.
Paracosm — the swarm-simulation companion
Paracosm is an agent-swarm simulation engine I built on AgentOS. Define a world as JSON, run it with HEXACO-typed leaders directing a swarm of specialists and ~100 personality-typed cells, and watch their decisions diverge into measurably different outcomes from an identical seed. Reproducible, forkable, replayable. The swarm is first-class in the API: RunArtifact.finalSwarm, the paracosm/swarm helpers, and GET /api/v1/runs/:runId/swarm for HTTP consumers.
The reference scenario ships as Mars Genesis — a 100-colonist Mars settlement running from 2035 to 2083 across six turns. Two leaders, same seed, different HEXACO profiles, different futures. Try it live.
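For HTTP consumers, fetching a run's final swarm might look like the sketch below. The endpoint path comes from the section above; the SwarmSnapshot shape and base URL are invented placeholders, since the real response schema lives in the API reference:

```typescript
// Hypothetical consumer sketch for GET /api/v1/runs/:runId/swarm.
// The response shape here is an assumption, not the documented schema.
interface SwarmSnapshot {
  leaders: Array<{ id: string; hexaco: Record<string, number> }>;
  cells: Array<{ id: string; role: string }>;
}

function swarmUrl(baseUrl: string, runId: string): string {
  return `${baseUrl}/api/v1/runs/${encodeURIComponent(runId)}/swarm`;
}

async function fetchSwarm(baseUrl: string, runId: string): Promise<SwarmSnapshot> {
  const res = await fetch(swarmUrl(baseUrl, runId));
  if (!res.ok) throw new Error(`swarm fetch failed: ${res.status}`);
  return (await res.json()) as SwarmSnapshot;
}

swarmUrl("https://example.com", "run-42"); // "https://example.com/api/v1/runs/run-42/swarm"
```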
Live demo · GitHub · npm · API reference
Wilds AI Discord for questions, feedback, community. Contact AgentOS for partnerships, security disclosures, enterprise inquiries.