System Architecture
AgentOS organizes the runtime around long-running agent state rather than around a single turn loop. Cross-session conversations, parallel agent instances with independent personality and memory, conditional tool execution, human-in-the-loop approval, and a memory layer that distinguishes verified user input from model-generated content are first-class subsystems with their own modules.
The 26 top-level modules documented below are predominantly state-management subsystems. The turn loop itself is one component among them, not the central abstraction.
This page is the system map. For the what of each subsystem — components, lifecycle ownership, source-tree location — read on. For deep-dives into individual concerns, follow the table of contents.
Each layer above corresponds to a section below. The mapping is one-to-one: layer 1 → API Surface Contract, layer 2 → GMI, layer 3 → Memory System, layer 4 → Tools, Skills, Extensions, layer 5 → Safety & Guardrails, layer 6 → Orchestration, layer 7 → Perception & Channels. The component pills inside each layer in the diagram are the same class and function names you'll see in the subsystem write-ups.
Source Directory Layout
The src/ tree is organized into 26 domain-specific top-level modules. Only foundational infrastructure remains under core/.
Perception model: Vision, hearing, and speech are separated into three independent modules following the biological perception analogy -- vision/ (OCR, scene detection, image analysis), hearing/ (STT providers, VAD, silence detection), and speech/ (TTS providers, resolver, session). Shared media generation (images, video, music, SFX) remains under media/.
Key architectural patterns:
- GMI (Generalized Mind Instance) delegates to focused collaborators: ConversationHistoryManager, CognitiveMemoryBridge, SentimentTracker, and MetapromptExecutor. Persona layering lives in cognitive_substrate/persona_overlays/. Personas can be loaded from JSON (the legacy IPersonaDefinition format) or from SOUL.md workspace directories via SoulLoader (cognitive_substrate/personas/SoulLoader.ts); both produce the same runtime IPersonaDefinition. See SOUL_FILES.md for the per-agent identity convention.
- AgentOS is the public lifecycle facade. Setup and runtime concerns are in api/runtime/ (WorkflowFacade, CapabilityDiscoveryInitializer, RagMemoryInitializer). High-level helpers (generateText, streamText, agent, agency) live under api/.
- AgentOSOrchestrator coordinates requests, delegating to TurnExecutionPipeline (pre-LLM preparation), GMIChunkTransformer (stream mapping), and ExternalToolResultHandler (tool-result continuation).
All paths below are under packages/agentos/src/.
| Module | Subdirs | Purpose |
|---|---|---|
| agents/ | definitions/ · agency/ | Agent type definitions and multi-agent coordination (AgencyRegistry) |
| api/ | runtime/ · types/ | Public API surface: AgentOS, generateText, streamText, agent, agency, orchestrator collaborators, provider defaults |
| channels/ | adapters/ · telephony/ · social-posting/ | Platform adapters (Discord, Slack), voice-call providers (Twilio, Vonage), social-post management |
| cognitive_substrate/ | personas/ · persona_overlays/ | The GMI itself plus ConversationHistoryManager, CognitiveMemoryBridge, SentimentTracker, MetapromptExecutor, and persona loaders (JSON + SOUL.md via SoulLoader) |
| core/ | config/ · conversation/ · embeddings/ · llm/ · logging/ · rate-limiting/ · storage/ · streaming/ · tools/ · utils/ · vector-store/ | Foundational infrastructure: shared interfaces, the IStorageAdapter, the StreamingManager, the ITool / ToolOrchestrator, embedding and vector-store abstractions |
| discovery/ | — | Capability-discovery engine (tiered semantic search) |
| emergent/ | — | Runtime tool forging and self-improvement (forge_tool, EmergentCapabilityEngine, EmergentJudge) |
| evaluation/ | observability/ | Eval framework + OpenTelemetry tracing and metrics |
| extensions/ | — | Extension system: ExtensionPack, descriptor kinds, activation lifecycle |
| hearing/ | — | Listening surface: STT providers, VAD, silence detection |
| marketplace/ | store/ · workspace/ | Agent-marketplace listings + per-agent workspace helpers |
| media/ | audio/ · images/ · video/ | Creative generation: image (DALL-E, Stability), video, music, SFX |
| memory/ | core/ · io/facade/ · io/tools/ · mechanisms/ · pipeline/ · retrieval/ | Cognitive memory system: encoding/decay, the Memory API, memory tools, neuroscience-grounded mechanisms, consolidation, retrieval brain |
| nlp/ | ai_utilities/ · language/ · tokenizers/ · stemmers/ · normalizers · lemmatizers · filters | NLP processing: LLM-backed summarization, language detection, tokenizers |
| orchestration/ | planner/ · hitl/ · workflows/ · turn-planner/ · ir/ · compiler/ · runtime/ · checkpoint/ · events/ | DAG workflow engine, PlanningEngine (ReAct loops), human-in-the-loop, IR/compiler, event bus |
| provenance/ | — | Content provenance + blockchain anchoring |
| query-router/ | — | Query classification + routing |
| rag/ | vector-search/ · vector_stores/ · chunking/ · reranking/ · unified/ · graphrag/ | Retrieval-augmented generation: HNSW sidecar, vector-store implementations, chunking strategies, reranking, graph-augmented retrieval |
| safety/ | guardrails/ · runtime/ | Guardrails (IGuardrailService, ParallelGuardrailDispatcher) and runtime safety (CircuitBreaker, CostGuard, StuckDetector) |
| sandbox/ | executor/ · subprocess/ | Sandboxed code execution (node:vm) and CLISubprocessBridge / CLIRegistry |
| skills/ | — | SKILL.md loader (content lives in agentos-skills) |
| speech/ | — | Speaking surface: TTS providers, resolver, session |
| structured/ | output/ · prompting/ | Structured output (StructuredOutputManager, JSON schema) + prompt routing |
| types/ | — | Shared types (auth) |
| vision/ | — | Seeing surface: OCR, scene detection, image analysis |
| voice-pipeline/ | — | Real-time voice-conversation orchestrator |
Architecture Layers
The diagram at the top of this page is the canonical layered view. From top to bottom:
- API surface -- generateText / streamText / agent / agency / generateImage, plus the AgentOS lifecycle facade.
- Orchestration -- DAG runtime, workflow(), mission(), AgentGraph, HITL, checkpointing, planning engine.
- GMI -- per-mind state: ConversationHistory, CognitiveMemoryBridge, SentimentTracker, MetapromptExecutor, persona overlays.
- Safety & Guardrails alongside Tools & Extensions -- 5-tier security (PII, toxicity, grounding, circuit breakers, cost guard) and the 110-extension / 88-skill catalog with capability discovery and runtime tool forging.
- Memory & RAG -- 4-tier cognitive memory, 8 mechanisms (Ebbinghaus decay, retrieval-induced forgetting, …), 7 vector backends, HyDE, GraphRAG, hybrid retrieval, CitationVerifier.
- LLM providers -- 11 direct providers + OpenRouter fan-out with automatic fallback chains.
- Perception & channels -- vision (OCR), hearing (STT, VAD), speech (TTS, voice pipeline), 12 messaging adapters, telephony.
The diagram above the prose shows how a typical request enters at layer 1 and traverses downward.
API Surface Contract
generateText(), streamText(), agent(), agency(), and the AgentOS runtime share some configuration names, but the shared config surface does not imply identical enforcement.
- agent() is the lightweight stateful facade for prompt assembly, sessions, tools, hooks, personality shaping, and usage-ledger forwarding.
- generateText() / streamText() are low-level helper loops for provider selection, direct tool execution, and text-fallback tool calling.
- The full AgentOS runtime and agency() own the deeper runtime systems: emergent tooling, guardrails, discovery, RAG bootstrapping, permissions/security tiers, HITL, voice/channels, and provenance-aware orchestration.
GMI (Generalized Mind Instance)
GMI is what an agent actually is between turns: persona, working memory, mood, reasoning trace, conversation history. Each instance is a single mind bound to one persona. The dedicated GMI page walks the seven-ring concentric model in detail — this section covers how the GMI plugs into the wider runtime.
GMI Lifecycle
Initialization
GMI.initialize(persona, config) validates required dependencies, wires collaborators, and loads state:
const gmi = new GMI('my-gmi-id');
await gmi.initialize(researchAssistantPersona, {
  workingMemory,
  promptEngine,
  toolOrchestrator,
  llmProviderManager,
  utilityAI,
  cognitiveMemory,    // Optional: enables CognitiveMemoryBridge
  retrievalAugmentor, // Optional: enables RAG
});
Required dependencies: workingMemory, promptEngine, toolOrchestrator, llmProviderManager, utilityAI. Optional: cognitiveMemory, retrievalAugmentor.
Collaborators
The GMI delegates to four extracted collaborators to keep the core class focused:
| Collaborator | Responsibility |
|---|---|
| ConversationHistoryManager | Maintains chat history, supports hydration from external stores |
| CognitiveMemoryBridge | Bridges GMI turns to the CognitiveMemoryManager (encode/retrieve/observe) |
| SentimentTracker | Tracks user sentiment via IUtilityAI, emits GMIEvent types (frustration, confusion, etc.) |
| MetapromptExecutor | Handles metaprompt triggers, self-reflection, and state updates |
Turn Processing
processTurnStream() is an async generator that yields GMIOutputChunk objects:
for await (const chunk of gmi.processTurnStream(turnInput)) {
  switch (chunk.type) {
    case GMIOutputChunkType.TEXT_DELTA:     // Streaming text
    case GMIOutputChunkType.TOOL_CALL:      // Tool call request
    case GMIOutputChunkType.TOOL_RESULT:    // Tool execution result
    case GMIOutputChunkType.FINAL_RESPONSE: // Aggregated final output
    case GMIOutputChunkType.ERROR:          // Error during processing
  }
}
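As a minimal sketch of what a consumer of such a stream might do with each chunk type, the example below drains an async iterable and collects the streamed text, tool-call names, final response, and errors. The chunk shape (SketchChunk), the string tags, and the fakeTurn generator are assumptions made for illustration; the real GMIOutputChunk and GMIOutputChunkType in cognitive_substrate/IGMI may carry different fields.

```typescript
// Hypothetical, simplified chunk shape for illustration only.
type SketchChunk =
  | { type: "TEXT_DELTA"; text: string }
  | { type: "TOOL_CALL"; toolName: string }
  | { type: "TOOL_RESULT"; toolName: string; output: unknown }
  | { type: "FINAL_RESPONSE"; text: string }
  | { type: "ERROR"; message: string };

interface TurnSummary {
  streamedText: string;
  toolCalls: string[];
  finalText: string | null;
  errors: string[];
}

// Drain any async iterable of chunks and collect what a caller usually needs.
async function collectTurn(stream: AsyncIterable<SketchChunk>): Promise<TurnSummary> {
  const summary: TurnSummary = { streamedText: "", toolCalls: [], finalText: null, errors: [] };
  for await (const chunk of stream) {
    switch (chunk.type) {
      case "TEXT_DELTA":
        summary.streamedText += chunk.text; // append each streamed fragment
        break;
      case "TOOL_CALL":
        summary.toolCalls.push(chunk.toolName);
        break;
      case "TOOL_RESULT":
        break; // nothing to collect in this sketch
      case "FINAL_RESPONSE":
        summary.finalText = chunk.text;
        break;
      case "ERROR":
        summary.errors.push(chunk.message);
        break;
    }
  }
  return summary;
}

// Stand-in generator so the sketch runs without a GMI instance.
async function* fakeTurn(): AsyncGenerator<SketchChunk> {
  yield { type: "TEXT_DELTA", text: "Hel" };
  yield { type: "TEXT_DELTA", text: "lo" };
  yield { type: "FINAL_RESPONSE", text: "Hello" };
}
```

Because collectTurn accepts any AsyncIterable, the same function would work against a real stream or a test double such as fakeTurn.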
AgentOS Facade
AgentOS (api/AgentOS.ts) is the public-facing facade that manages GMI instances, streaming, and cross-cutting concerns. It exposes processRequest() as the primary entry point and coordinates:
- GMIManager -- Pool of GMI instances keyed by persona/session
- AgentOSOrchestrator -- Turn preparation and stream transformation
- StreamingManager -- WebSocket/SSE stream multiplexing
- ExtensionManager -- Tool, guardrail, and workflow extension loading
- ConversationManager -- Cross-session conversation persistence
AgentOSConfig is the comprehensive configuration object (~50 fields) that wires all subsystems together. Key optional features activated via config: ragConfig, turnPlanning, emergent, observability, standaloneMemory, workflowEngineConfig.
Request Lifecycle
A user request flows through the following stages:
1. Authentication & Rate Limiting -- Validate auth context and check rate limits.
2. Context Assembly -- Load session history, conversation context, and temporal/environmental state.
3. GMI Selection -- Get or create a GMI instance for the user/persona/session tuple.
4. Memory Retrieval -- CognitiveMemoryBridge retrieves relevant memory traces; RAG retrieval runs if configured.
5. Prompt Construction -- MetapromptExecutor assembles system, persona, memory, RAG context, and conversation history into the prompt via PromptBuilder.
6. Pre-execution Guardrails -- ParallelGuardrailDispatcher runs input guardrails (sanitizers first, classifiers in parallel).
7. Tool Orchestration -- ToolOrchestrator resolves and executes any tool calls selected by the LLM.
8. LLM Execution -- StreamingManager sends the prompt to the selected LLM provider and streams chunks.
9. Post-execution Guardrails -- Output guardrails evaluate the response (toxicity, PII, grounding).
10. Memory Update -- CognitiveMemoryBridge encodes new memory traces; MemoryObserver queues background consolidation.
11. Analytics -- Tracer records OpenTelemetry spans; cost/token metrics are tracked.
The TurnExecutionPipeline (in api/runtime/) handles steps 2-6 before handing off to the LLM. GMIChunkTransformer maps raw LLM chunks into AgentOSResponse format. ExternalToolResultHandler manages tool-result continuation loops.
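The GMI Selection stage describes a get-or-create lookup keyed by the user/persona/session tuple. The sketch below shows one way such a pool could be structured. SketchMind, PoolKey, and SketchMindPool are hypothetical names for illustration and are not the actual GMIManager API.

```typescript
// Hypothetical stand-in for a GMI instance; only the id matters here.
interface SketchMind {
  id: string;
}

interface PoolKey {
  userId: string;
  personaId: string;
  sessionId: string;
}

class SketchMindPool {
  private readonly minds = new Map<string, SketchMind>();
  private readonly create: (key: PoolKey) => SketchMind;

  constructor(create: (key: PoolKey) => SketchMind) {
    this.create = create;
  }

  // Serializing a fixed-order array avoids the collisions that plain string
  // concatenation could produce (for example "a-b" + "c" versus "a" + "b-c").
  private static keyOf(key: PoolKey): string {
    return JSON.stringify([key.userId, key.personaId, key.sessionId]);
  }

  // Return the existing instance for this tuple, or create and remember one.
  getOrCreate(key: PoolKey): SketchMind {
    const id = SketchMindPool.keyOf(key);
    let mind = this.minds.get(id);
    if (!mind) {
      mind = this.create(key);
      this.minds.set(id, mind);
    }
    return mind;
  }

  get size(): number {
    return this.minds.size;
  }
}
```

Two requests with the same tuple would receive the same instance, while changing any one of the three fields yields a separate mind with its own state.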
Sequence Diagram
The following sequence diagram traces a single request through the system:
Key Types
| Type | Module | Purpose |
|---|---|---|
| AgentOSInput | api/types/ | Normalized request envelope (text, audio, images, metadata) |
| AgentOSResponse | api/types/ | Streamed response chunks (TEXT_DELTA, TOOL_CALL, FINAL_RESPONSE, ERROR) |
| GMITurnInput | cognitive_substrate/IGMI | Internal turn representation consumed by the GMI |
| GMIOutputChunk | cognitive_substrate/IGMI | Per-chunk output from the cognitive engine |
| ConversationContext | core/conversation/ | Session state: history, active persona, user context |
Extension & Guardrail Runtime
The extension runtime is centered on three core pieces:
- ExtensionManifest / ExtensionPack -- Declarative loading of tool bundles, guardrails, and channel adapters.
- ExtensionManager -- Descriptor activation and runtime access.
- ISharedServiceRegistry -- Lazy singleton reuse across packs (for NLP pipelines, ONNX classifiers, embedding functions).
interface ExtensionPack {
  name: string;
  version?: string;
  descriptors: ExtensionDescriptor[];
  onActivate?: (context: ExtensionLifecycleContext) => Promise<void> | void;
  onDeactivate?: (context: ExtensionLifecycleContext) => Promise<void> | void;
}
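The lazy singleton reuse attributed to ISharedServiceRegistry can be illustrated with a small promise cache. This is a sketch under assumptions: SketchServiceRegistry and its getOrCreate method are hypothetical, and the real interface may expose different methods.

```typescript
// Hypothetical sketch; the real ISharedServiceRegistry interface may differ.
class SketchServiceRegistry {
  private readonly services = new Map<string, Promise<unknown>>();

  // Run the factory at most once per key. Caching the promise rather than the
  // resolved value lets concurrent callers share one in-flight initialization.
  getOrCreate<T>(key: string, factory: () => Promise<T>): Promise<T> {
    const cached = this.services.get(key);
    if (cached) {
      return cached as Promise<T>;
    }
    const created = factory().catch((err: unknown) => {
      // Forget failed initializations so a later call can retry.
      this.services.delete(key);
      throw err;
    });
    this.services.set(key, created);
    return created;
  }
}
```

With this shape, two packs that both need an expensive resource (an ONNX classifier, say) would request it under the same key and pay the load cost once.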
Creating an Extension Pack
Extension packs are the unit of distribution. Each pack bundles one or more descriptors of the same or different kinds (tool, guardrail, workflow, provenance, etc.) and can hook into the activation lifecycle to perform setup and teardown.
import type { ExtensionPack, ExtensionLifecycleContext } from '@framers/agentos/extensions';
import { EXTENSION_KIND_TOOL } from '@framers/agentos/extensions';

export function createMyExtensionPack(): ExtensionPack {
  return {
    name: 'my-custom-tools',
    version: '1.0.0',
    descriptors: [
      {
        kind: EXTENSION_KIND_TOOL,
        tool: {
          id: 'my-search-tool',
          name: 'search_documents',
          displayName: 'Document Search',
          description: 'Search internal documents by query.',
          inputSchema: {
            type: 'object',
            properties: { query: { type: 'string' } },
            required: ['query'],
          },
          execute: async (args) => {
            // searchIndex is a placeholder for your own search function; it is not part of AgentOS.
            const results = await searchIndex(args.query);
            return { success: true, output: results };
          },
        },
      },
    ],
    onActivate: async (ctx: ExtensionLifecycleContext) => {
      const apiKey = ctx.getSecret?.('MY_API_KEY');
      // Initialize resources, warm caches, etc.
    },
    onDeactivate: async () => {
      // Release resources
    },
  };
}
Packs are loaded by including them in the extensionManifest passed to AgentOS.initialize(), or by using the schema-on-demand meta-tools (extensions_list, extensions_enable) at runtime.
Descriptor Kinds
| Kind | Constant | Payload Field | Description |
|---|---|---|---|
| tool | EXTENSION_KIND_TOOL | tool: ITool | Callable tool registered in ToolOrchestrator |
| guardrail | EXTENSION_KIND_GUARDRAIL | guardrail: IGuardrailService | Input/output guardrail |
| workflow | EXTENSION_KIND_WORKFLOW | workflow: WorkflowDescriptorPayload | Reusable workflow definition |
| provenance | EXTENSION_KIND_PROVENANCE | provenance: IProvenanceProvider | Content anchoring provider |
Guardrail Dispatch Model
ParallelGuardrailDispatcher uses a two-phase execution model:
- Phase 1 (sequential sanitizers) -- Guardrails with config.canSanitize === true run in registration order and can chain SANITIZE results deterministically. A BLOCK in Phase 1 short-circuits the entire pipeline.
- Phase 2 (parallel classifiers) -- All remaining guardrails run concurrently via Promise.allSettled. A Phase 2 SANITIZE is downgraded to FLAG because concurrent sanitization would produce non-deterministic results.
The final outcome uses worst-wins aggregation: BLOCK (3) > FLAG (2) > ALLOW (0).
GuardrailOutputPayload carries ragSources?: RagRetrievedChunk[] so grounding-aware guardrails can verify claims against retrieved evidence.
Each guardrail service can also configure timeouts via config.timeoutMs. If a guardrail exceeds its timeout or throws, it fails open (returns null) rather than blocking the pipeline.
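The Phase 2 behavior described above (concurrent execution, SANITIZE downgraded to FLAG, worst-wins aggregation, fail-open on timeout or error) can be sketched as follows. SketchClassifier, withTimeout, and runClassifiers are hypothetical names for illustration; this is not the ParallelGuardrailDispatcher implementation, and the real IGuardrailService returns richer results than a bare action string.

```typescript
type Action = "ALLOW" | "FLAG" | "SANITIZE" | "BLOCK";
type Verdict = Exclude<Action, "SANITIZE">;

// Severity ranks as quoted above. SANITIZE never reaches aggregation in this
// sketch because it is downgraded to FLAG first.
const SEVERITY: Record<Verdict, number> = { ALLOW: 0, FLAG: 2, BLOCK: 3 };

// Hypothetical classifier shape, not the real IGuardrailService.
interface SketchClassifier {
  timeoutMs: number;
  evaluate: (text: string) => Promise<Action>;
}

// Resolve to null if the guardrail is slower than its timeout (fail open).
function withTimeout(classifier: SketchClassifier, text: string): Promise<Action | null> {
  return new Promise<Action | null>((resolve, reject) => {
    const timer = setTimeout(() => resolve(null), classifier.timeoutMs);
    classifier.evaluate(text).then(
      (action) => {
        clearTimeout(timer);
        resolve(action);
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

async function runClassifiers(classifiers: SketchClassifier[], text: string): Promise<Verdict> {
  const settled = await Promise.allSettled(classifiers.map((c) => withTimeout(c, text)));
  let worst: Verdict = "ALLOW";
  for (const result of settled) {
    if (result.status !== "fulfilled") continue; // guardrail threw: fail open
    const value = result.value;
    if (value === null) continue; // guardrail timed out: fail open
    const action: Verdict = value === "SANITIZE" ? "FLAG" : value;
    if (SEVERITY[action] > SEVERITY[worst]) worst = action;
  }
  return worst;
}
```

Failing open keeps a slow or broken classifier from stalling every request, at the cost of letting content through unchecked by that one guardrail; whether that trade is acceptable depends on the deployment.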
Built-in Guardrail Packs
Six built-in packs ship from packages/agentos-extensions/registry/curated/safety/:
- pii-redaction -- sanitizer; redacts personally identifiable information before tokens leave the runtime
- ml-classifiers -- toxicity / hate-speech / harm classification via on-device ONNX models
- topicality -- LLM-as-judge classifier that rejects off-topic / out-of-scope prompts
- code-safety -- static + heuristic detection of dangerous code patterns in agent-emitted snippets
- grounding-guard -- verifies output claims against retrieved RAG sources (citation faithfulness)
- content-policy-rewriter -- sanitizer; rewrites policy-violating output in-place rather than blocking
For details on writing custom guardrails, see Creating Guardrails and Guardrails Usage.
Persona System
Personas define the identity, expertise, and behavioral configuration for a GMI instance.
Key files:
- cognitive_substrate/personas/IPersonaDefinition.ts -- The IPersonaDefinition interface
- cognitive_substrate/personas/PersonaLoader.ts -- Loads persona JSON files from disk or registry
- cognitive_substrate/personas/PersonaValidation.ts -- Schema validation
- cognitive_substrate/persona_overlays/PersonaOverlayManager.ts -- Runtime persona layering
A persona definition includes:
- Identity -- Name, role, title, personality traits, expertise domains, purpose/objectives
- Cognitive config -- Memory settings (working memory capacity, decay rate, consolidation frequency), attention priorities
- Behavioral config -- Communication style, problem-solving methodology, collaboration style
- HEXACO personality traits -- Six-factor personality model that modulates memory encoding, retrieval, and cognitive mechanisms
HEXACO Trait Modulation
The HEXACO model provides six orthogonal personality dimensions. Each trait modulates specific cognitive subsystems:
| HEXACO Trait | Range | Cognitive Effect |
|---|---|---|
| Honesty-Humility | 0-1 | Source confidence skepticism. High H penalizes unverified claims. |
| Emotionality | 0-1 | Emotional drift in memory encoding. High E amplifies flashbulb memories. |
| Extraversion | 0-1 | Feeling-of-knowing threshold. High X lowers the threshold to share uncertain knowledge. |
| Agreeableness | 0-1 | Emotion regulation strategy. High A favors cooperative/supportive responses. |
| Conscientiousness | 0-1 | Retrieval-induced forgetting strength. High C enables stronger competitive suppression. |
| Openness | 0-1 | Involuntary recall sensitivity and novelty attention. High O increases creative associations. |
Persona Definition Example
const researchAssistant: IPersonaDefinition = {
  id: 'research-assistant',
  name: 'Research Assistant',
  role: 'Academic research aide',
  systemPrompt: 'You are a meticulous research assistant...',
  strengths: ['literature review', 'data analysis', 'citation management'],
  hexaco: {
    honestyHumility: 0.9,   // High source skepticism
    emotionality: 0.3,      // Low emotional bias
    extraversion: 0.5,      // Moderate sharing threshold
    agreeableness: 0.7,     // Cooperative communication
    conscientiousness: 0.9, // Strong retrieval filtering
    openness: 0.8,          // High novelty attention
  },
  memoryConfig: {
    workingMemoryCapacity: 9,
    consolidationFrequencyMinutes: 15,
    ragConfig: {
      retrievalTriggers: { onUserQuery: true },
    },
  },
  moodAdaptation: { enabled: true, defaultMood: 'NEUTRAL', sensitivityFactor: 0.3 },
  defaultModelId: 'gpt-4o',
  defaultProviderId: 'openai',
};
The PersonaOverlayManager supports runtime persona blending -- applying temporary overlays (e.g., "be more formal") on top of the base persona definition without mutating the original.
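To make the non-mutating overlay idea concrete, here is a minimal sketch that merges a temporary overlay onto a base persona and returns a new object. SketchPersona, SketchOverlay, and applyOverlay are hypothetical and heavily trimmed; they do not reflect the actual PersonaOverlayManager API or the full IPersonaDefinition shape.

```typescript
// Hypothetical, trimmed-down persona shape for illustration.
interface SketchPersona {
  name: string;
  systemPrompt: string;
  hexaco: { agreeableness: number; openness: number };
}

// An overlay may append an instruction and override individual traits.
interface SketchOverlay {
  systemPromptSuffix?: string;
  hexaco?: Partial<SketchPersona["hexaco"]>;
}

// Returns a new persona; the base object and its nested hexaco are untouched.
function applyOverlay(base: SketchPersona, overlay: SketchOverlay): SketchPersona {
  return {
    ...base,
    systemPrompt: overlay.systemPromptSuffix
      ? `${base.systemPrompt}\n${overlay.systemPromptSuffix}`
      : base.systemPrompt,
    // Spread order matters: overlay values win over base values.
    hexaco: { ...base.hexaco, ...overlay.hexaco },
  };
}
```

Because the base is never written to, dropping the overlay is as simple as going back to the original object; nothing needs to be undone.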
For preset persona definitions, see packages/wunderland/presets/.
Prompt Construction
MetapromptExecutor (cognitive_substrate/MetapromptExecutor.ts) is the prompt assembly engine. It builds the final LLM prompt from several components and supports three trigger types for metaprompt execution: turn_interval (periodic self-reflection), event_based (driven by SentimentTracker events like frustration or confusion), and manual (flags in working memory).
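The three trigger types can be pictured as a small decision function over the current turn state. The sketch below is an assumption-laden illustration: SketchTrigger, SketchTurnState, shouldFire, and all field names are hypothetical and are not taken from MetapromptExecutor.

```typescript
// Hypothetical trigger and state shapes; field names are illustrative only.
type SketchTrigger =
  | { type: "turn_interval"; everyNTurns: number }
  | { type: "event_based"; eventName: string }
  | { type: "manual"; flagKey: string };

interface SketchTurnState {
  turnCount: number; // turns completed so far
  pendingEvents: string[]; // e.g. events emitted by a sentiment tracker
  workingMemoryFlags: Set<string>; // flags set in working memory
}

// Decide whether a metaprompt with this trigger should run on the current turn.
function shouldFire(trigger: SketchTrigger, state: SketchTurnState): boolean {
  switch (trigger.type) {
    case "turn_interval":
      // Periodic: fire on every Nth completed turn.
      return (
        trigger.everyNTurns > 0 &&
        state.turnCount > 0 &&
        state.turnCount % trigger.everyNTurns === 0
      );
    case "event_based":
      // Reactive: fire when a matching event is pending.
      return state.pendingEvents.includes(trigger.eventName);
    case "manual":
      // Explicit: fire when the flag is present in working memory.
      return state.workingMemoryFlags.has(trigger.flagKey);
  }
}
```

For example, with turnCount at 10, an interval of 5 would fire and an interval of 3 would not.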
Prompt Assembly Order
The prompt is assembled in a specific order, with each section receiving a token budget allocation:
Token Budget Strategy
ConversationHistoryManager supports three overflow strategies when conversation history exceeds the allocated token budget:
- truncate -- Drop oldest messages first (lowest latency, no LLM call)
- summarize -- Use IUtilityAI.summarize() to compress older history into a summary block (triggered at summarizationTriggerTokens)
- hybrid -- Keep recent messages verbatim, summarize older ones (best quality/cost tradeoff)
The total token budget is derived from the model's context window minus reserves for system prompt and output tokens. PromptProfileRouter (structured/prompting/PromptProfileRouter.ts) can adjust the budget split based on task classification (e.g., RAG-heavy tasks get more retrieval budget).
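A minimal sketch of the budget arithmetic and the truncate strategy follows. It assumes a crude four-characters-per-token estimate in place of a real tokenizer, and the names (SketchMessage, estimateTokens, totalBudget, truncateHistory) are hypothetical rather than part of ConversationHistoryManager.

```typescript
interface SketchMessage {
  role: "user" | "assistant";
  content: string;
}

// Crude estimate (about 4 characters per token). This is an assumption for the
// sketch; a real system would use the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Context window minus the reserves for system prompt and output tokens.
function totalBudget(contextWindow: number, systemReserve: number, outputReserve: number): number {
  return Math.max(0, contextWindow - systemReserve - outputReserve);
}

// Keep the newest messages that fit in the budget, dropping the oldest first.
function truncateHistory(history: SketchMessage[], budgetTokens: number): SketchMessage[] {
  const kept: SketchMessage[] = [];
  let used = 0;
  // Walk from newest to oldest and stop at the first message that does not fit.
  for (const message of [...history].reverse()) {
    const cost = estimateTokens(message.content);
    if (used + cost > budgetTokens) break;
    used += cost;
    kept.unshift(message); // restore chronological order
  }
  return kept;
}
```

Stopping at the first message that does not fit, rather than skipping it and continuing, keeps the retained history contiguous, so the model never sees a conversation with a hole in the middle.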
Built-in Metaprompt Handlers
MetapromptExecutor includes pre-built handlers for common situations:
- Frustration recovery -- Triggered by negative sentiment events
- Confusion clarification -- When the user signals misunderstanding
- Satisfaction reinforcement -- When the user is pleased
- Error recovery -- After tool failures
- Engagement boost -- When the conversation stalls
- Trait adjustment -- Periodic self-reflection that adjusts persona parameters within bounds
See Adaptive Prompt Intelligence for the full guide: the three trigger types, the five preset templates, the state surfaces metaprompts mutate, and concrete cost numbers.
Memory System
The cognitive memory system replaces flat key-value memory with a personality-modulated, decay-aware architecture grounded in cognitive science.
Core Model
Memory traces follow the Ebbinghaus forgetting curve:
S(t) = S0 * e^(-dt / stability)
where S0 (initial encoding strength) is set by personality traits, emotional arousal, and content features, and dt is the time elapsed since the trace was last accessed. The stability time constant grows with each successful retrieval via the desirable difficulty effect -- memories that were harder to retrieve (lower current strength at retrieval time) receive a larger stability boost.
From memory/core/decay/DecayModel.ts:
// Ebbinghaus forgetting curve
function computeCurrentStrength(trace: MemoryTrace, now: number): number {
  const elapsed = Math.max(0, now - trace.lastAccessedAt);
  return trace.encodingStrength * Math.exp(-elapsed / trace.stability);
}
Traces below a configurable pruning threshold are soft-deleted (isActive = false) during consolidation.
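As a worked illustration of decay plus soft-deletion, the sketch below reuses the strength formula and marks weak traces inactive. SketchTrace, pruneWeakTraces, the millisecond units, and the 0.05 threshold are assumptions for the example; the source does not state the units or the default threshold.

```typescript
// Hypothetical, trimmed-down trace shape. Units are assumed to be milliseconds.
interface SketchTrace {
  encodingStrength: number; // S0
  stability: number; // decay time constant
  lastAccessedAt: number; // timestamp of last access
  isActive: boolean;
}

// Same formula as the forgetting curve: S0 * e^(-elapsed / stability).
function currentStrength(trace: SketchTrace, now: number): number {
  const elapsed = Math.max(0, now - trace.lastAccessedAt);
  return trace.encodingStrength * Math.exp(-elapsed / trace.stability);
}

// Soft-delete: return copies with isActive cleared instead of removing traces.
function pruneWeakTraces(traces: SketchTrace[], now: number, threshold: number): SketchTrace[] {
  return traces.map((trace) =>
    trace.isActive && currentStrength(trace, now) < threshold
      ? { ...trace, isActive: false }
      : trace,
  );
}
```

With S0 = 0.8 and a stability of one hour, a trace last accessed one hour ago has strength 0.8 * e^(-1), roughly 0.29, and survives a 0.05 threshold. The same trace left untouched for three hours decays to 0.8 * e^(-3), roughly 0.04, and would be marked inactive.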