Paracosm — AI Simulation Engine
Paracosm is a structured world model for AI agents, built on AgentOS. Start from a prompt, brief, URL, or scenario JSON draft; compile or ground it into a typed ScenarioPackage; pick leaders with different HEXACO personality profiles; and watch their decisions diverge into measurably different trajectories from an identical seed. The reference scenario ships as Mars Genesis: a 100-colonist Mars settlement running from 2035 to 2083 across six turns.
Where paracosm sits in the world-model landscape
Paracosm is a structured world model in the sense of Xing 2025 and the ACM CSUR 2025 world-model survey, and a counterfactual world simulation model in the sense of Kirfel et al., 2025. It is not a generative visual or spatial world model (Sora, Genie 3, World Labs Marble), not a JEPA-style predictive-representation model (LeCun's AMI Labs), not a multi-agent task orchestration framework (LangGraph, AutoGen, CrewAI, OpenAI Agents SDK), not a bottom-up swarm intelligence simulator (MiroFish, OASIS), and not a generative-agents library (Stanford Generative Agents, Google DeepMind Concordia). It combines five pieces: a prompt-, document-, or URL-grounded state space backed by a JSON contract; a deterministic seeded kernel; LLM-driven events and specialist analyses; leaders with HEXACO personality profiles; and a universal Zod-validated run artifact that spans turn-loop civilization simulations, batch-trajectory digital twins, and batch-point forecasts.
The important boundary: JSON is the canonical contract, not the product boundary. Today compileScenario() takes a scenario JSON draft plus optional seedText or seedUrl grounding. The next wrapper should take one prompt or document, ask an LLM to propose the same scenario contract, validate it, then compile and run it.
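The wrapper described above does not exist yet, so the following is only a sketch of its shape under stated assumptions. Every name here (`runFromPrompt`, `WrapperDeps`, `validateDraft`, the `ScenarioDraft` fields checked) is hypothetical; `compile` and `run` stand in for `compileScenario()` and `runSimulation()`, whose real signatures are not reproduced.

```typescript
// Hypothetical sketch: none of these names are paracosm exports.
type ScenarioDraft = { id: string; departments: unknown[]; metrics: unknown[] };

interface WrapperDeps<Pkg, Result> {
  propose: (prompt: string) => Promise<unknown>;   // LLM proposes a draft (assumed)
  validate: (candidate: unknown) => ScenarioDraft; // throws on a contract violation
  compile: (draft: ScenarioDraft) => Promise<Pkg>; // stand-in for compileScenario()
  run: (pkg: Pkg) => Promise<Result>;              // stand-in for runSimulation()
}

// Prompt in, result out; the JSON draft remains the contract between the steps.
async function runFromPrompt<Pkg, Result>(
  prompt: string,
  deps: WrapperDeps<Pkg, Result>,
): Promise<Result> {
  const candidate = await deps.propose(prompt);
  const draft = deps.validate(candidate);
  const pkg = await deps.compile(draft);
  return deps.run(pkg);
}

// Minimal structural check covering three fields of the draft; a real
// validator would check the whole scenario contract.
function validateDraft(candidate: unknown): ScenarioDraft {
  if (typeof candidate !== 'object' || candidate === null) {
    throw new Error('scenario draft must be an object');
  }
  const { id, departments, metrics } = candidate as Record<string, unknown>;
  if (typeof id !== 'string' || !Array.isArray(departments) || !Array.isArray(metrics)) {
    throw new Error('candidate does not satisfy the scenario contract');
  }
  return { id, departments, metrics };
}
```

Injecting the four steps keeps the sketch independent of any particular LLM provider, and the validation step sits between proposal and compilation so a malformed draft never reaches the compiler.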
Full taxonomy mapping lives at docs/positioning/world-model-mapping.md.
Live demo · GitHub · npm · API reference · Positioning map · Case study blog post
Quick Start
npm install paracosm
import { marsScenario } from 'paracosm/mars';
import { runSimulation } from 'paracosm/runtime';
// Leader: identity plus a HEXACO personality profile
const aria = {
  name: 'Aria Chen',
  archetype: 'The Visionary',
  unit: 'Colony Alpha',
  hexaco: {
    openness: 0.95, conscientiousness: 0.35, extraversion: 0.85,
    agreeableness: 0.55, emotionality: 0.30, honestyHumility: 0.65,
  },
  instructions: '',
};
// Run six turns on a fixed seed and log each event as it arrives
const result = await runSimulation(aria, [], {
  scenario: marsScenario,
  maxTurns: 6,
  seed: 950,
  onEvent: e => console.log(e.type, e.data?.title),
});
console.log(result.finalState?.metrics.population);
console.log(result.forgedTools?.length ?? 0);
Or run the hosted demo at paracosm.agentos.sh/sim with zero setup. The demo caps turns, population, and model tier so public access stays affordable; paste your own OpenAI or Anthropic key into Settings to unlock full scope.
The universal result contract
Every runSimulation() call returns a Zod-validated RunArtifact exported from the paracosm/schema subpath. One shape covers three simulation modes, discriminated on metadata.mode:
- turn-loop: civilization sims (paracosm's built-in mode). Populates trajectory.timepoints[] and decisions[] with per-turn specialist notes.
- batch-trajectory: digital-twin simulations. Labeled timepoints over a horizon, populated by external LangGraph-style executors.
- batch-point: one-shot forecasts. Overview and risk flags only, no trajectory.
import { RunArtifactSchema, type RunArtifact } from 'paracosm/schema';
import { runSimulation } from 'paracosm/runtime';
const artifact: RunArtifact = await runSimulation(leader, [], opts);
const parsed = RunArtifactSchema.parse(artifact); // optional runtime validation
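switch (artifact.metadata.mode) {
  case 'turn-loop':
    // trajectory.timepoints[] and decisions[] carry per-turn specialist notes
    break;
  case 'batch-trajectory':
    // labeled timepoints over a horizon
    break;
  case 'batch-point':
    // overview and risk flags only, no trajectory
    break;
}
artifact.trajectory?.timepoints?.forEach((tp) => {
  console.log(tp.label, tp.score?.value, tp.narrative);
});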
The schema exposes 12 content primitives (RunMetadata, WorldSnapshot, Score, HighlightMetric, Timepoint, TrajectoryPoint, Trajectory, Citation, SpecialistDetail, SpecialistNote, RiskFlag, Decision) plus operational types (Cost, ProviderError). Every primitive carries an optional scenarioExtensions?: Record<string, unknown> escape hatch for domain-specific fields that must not pollute the universal shape.
Non-TypeScript consumers generate equivalent types from JSON Schema: npm run export:json-schema emits schema/run-artifact.schema.json and schema/stream-event.schema.json. Python projects use datamodel-codegen; any ecosystem with a JSON-Schema code generator adopts cleanly.
Subjects and interventions
For simulations built around a single subject (a person, character, organism, vessel) under a counterfactual intervention, paracosm/schema exposes SubjectConfig and InterventionConfig as first-class input primitives. Pass them through RunOptions and they carry through to RunArtifact.subject and RunArtifact.intervention for downstream consumers:
import { SubjectConfigSchema, InterventionConfigSchema } from 'paracosm/schema';
const subject = SubjectConfigSchema.parse({
  id: 'user-42',
  name: 'Alice',
  profile: { age: 34, diet: 'mediterranean' },
  signals: [{ label: 'HRV', value: 48.2, unit: 'ms', recordedAt: '2026-04-21T08:00:00Z' }],
  markers: [{ id: 'rs4680', category: 'genome', value: 'AA' }],
});
const intervention = InterventionConfigSchema.parse({
  id: 'intv-1',
  name: 'Creatine + Sleep Hygiene',
  description: '5g daily + 11pm bedtime.',
  duration: { value: 12, unit: 'weeks' },
  adherenceProfile: { expected: 0.7 },
});
const artifact = await runSimulation(leader, [], { scenario, subject, intervention });
Turn-loop mode stashes both verbatim without semantic consumption; external batch-trajectory executors populate them from their own flow.
What it does
Paracosm runs two leaders through the same scenario in parallel and makes their divergence measurable. Each turn has nine stages:
| Stage | Kind | Responsibility |
|---|---|---|
| Event Director | LLM | Observes state, generates events |
| Kernel advance | det. | Aging, births, deaths, resource deltas |
| Department analysis | LLM | Each dept may forge or reuse a tool |
| Commander decision | LLM | Reads all reports, picks an option |
| Outcome | det. | Seeded RNG + option risk probability |
| Effects | det. | Colony deltas via the EffectRegistry |
| Agent reactions | LLM | Every alive agent reacts in parallel |
| Memory | det. | Short-term consolidates, stances drift |
| Personality drift | det. | HEXACO traits shift under three forces |
Two runs on the same seed produce identical deterministic stages. The LLM stages diverge because every prompt carries the leader's HEXACO profile and the accumulated state it shaped. The asymmetry is the entire point.
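To make the determinism claim concrete, here is a minimal sketch of a seeded outcome roll. It uses mulberry32, a small public-domain PRNG, purely as a stand-in: the algorithm behind paracosm's SeededRng is not described here, and `rollOutcomes` and the risk values are hypothetical.

```typescript
// mulberry32: a stand-in generator, not paracosm's SeededRng.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Hypothetical outcome stage: an option succeeds when the roll clears its risk.
function rollOutcomes(seed: number, riskPerTurn: number[]): boolean[] {
  const next = mulberry32(seed);
  return riskPerTurn.map((risk) => next() >= risk);
}

const risks = [0.2, 0.5, 0.1, 0.7, 0.3, 0.4]; // illustrative values only
const runA = rollOutcomes(950, risks);
const runB = rollOutcomes(950, risks);
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
```

The rolls repeat exactly for a given seed. In a real run the risk attached to each turn depends on which option the commander picked, which is where the LLM stages introduce divergence.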
How HEXACO drives decisions
Paracosm uses the HEXACO model (Ashton & Lee, 2007) across all six axes, with both poles producing concrete behavioral cues in the commander's decision-style block and the department analysis prompts:
- Openness. High: favor novel, untested approaches. Low: trust proven protocols.
- Conscientiousness. High: demand evidence and contingency plans. Low: move fast, accept ambiguity.
- Extraversion. High: lead from the front with public comms. Low: work through technical channels.
- Agreeableness. High: seek consensus with departments and Earth. Low: override consensus when you see a better path.
- Emotionality. High: weigh human cost heavily. Low: accept casualties for strategic gain.
- Honesty-Humility. High: report failures transparently. Low: leverage information asymmetries.
Trait thresholds are 0.7 (high) and 0.3 (low); cues only fire when a trait is meaningfully expressed. Visible in action at departments.ts:90 and commander-setup.ts:30.
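The threshold rule can be sketched as a small lookup. This is an illustration and not the code in departments.ts or commander-setup.ts: the cue strings are copied from the list above, `decisionCues` is a hypothetical name, the local `HexacoProfile` type just mirrors the Quick Start field names, and treating the thresholds as inclusive (>= 0.7, <= 0.3) is an assumption.

```typescript
type Trait =
  | 'openness' | 'conscientiousness' | 'extraversion'
  | 'agreeableness' | 'emotionality' | 'honestyHumility';
type HexacoProfile = Record<Trait, number>; // local stand-in type

const HIGH = 0.7;
const LOW = 0.3;

const CUES: Record<Trait, { high: string; low: string }> = {
  openness: { high: 'favor novel, untested approaches', low: 'trust proven protocols' },
  conscientiousness: { high: 'demand evidence and contingency plans', low: 'move fast, accept ambiguity' },
  extraversion: { high: 'lead from the front with public comms', low: 'work through technical channels' },
  agreeableness: { high: 'seek consensus with departments and Earth', low: 'override consensus when you see a better path' },
  emotionality: { high: 'weigh human cost heavily', low: 'accept casualties for strategic gain' },
  honestyHumility: { high: 'report failures transparently', low: 'leverage information asymmetries' },
};

// Emit a cue only for traits at or beyond a threshold (inclusive by assumption).
function decisionCues(profile: HexacoProfile): string[] {
  const cues: string[] = [];
  for (const trait of Object.keys(CUES) as Trait[]) {
    const value = profile[trait];
    if (value >= HIGH) cues.push(CUES[trait].high);
    else if (value <= LOW) cues.push(CUES[trait].low);
  }
  return cues;
}

const ariaProfile: HexacoProfile = {
  openness: 0.95, conscientiousness: 0.35, extraversion: 0.85,
  agreeableness: 0.55, emotionality: 0.30, honestyHumility: 0.65,
};
console.log(decisionCues(ariaProfile));
```

For the Quick Start profile this yields the high-Openness, high-Extraversion, and low-Emotionality cues. Emotionality at 0.30 qualifies only because the sketch treats the thresholds as inclusive; the mid-range traits contribute nothing to the prompt.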
Emergent tool forging + reuse
Department agents forge computational tools at runtime using AgentOS's EmergentCapabilityEngine. The forge_tool meta-tool builds, tests, and judge-reviews a new tool; the call_forged_tool meta-tool lets a later turn invoke that already-approved tool on new inputs without re-forging.
Personality drives the ratio of forging to reuse. High-Openness leaders lean exploratory and forge more novel tools. High-Conscientiousness leaders lean conservative and reuse whenever an existing tool fits. On the same seed, the Visionary ends a six-turn run with a wider toolbox; the Engineer ends with a narrower toolbox but a higher reuse count. The blog post walks through this as a case study: Inside Mars Genesis.
Cost follows. Reuse via call_forged_tool costs essentially nothing; every fresh forge costs a judge LLM call plus sandbox execution. The reuse economy is the single biggest lever on total run cost.
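The forge-versus-reuse accounting can be illustrated with a toy registry. This is a sketch under loud assumptions: `ToolRegistry`, `ToolLedger`, and the numeric tool signature are hypothetical, and the real forge_tool and call_forged_tool meta-tools run through AgentOS with judge review and sandboxing that this sketch does not model.

```typescript
// Hypothetical ledger: counts only, since real per-call prices are not given here.
interface ToolLedger {
  forges: number; // each one implies a judge LLM call plus sandbox execution
  reuses: number; // each one is close to free
}

class ToolRegistry {
  private approved = new Map<string, (input: number) => number>();
  readonly ledger: ToolLedger = { forges: 0, reuses: 0 };

  // Reuse an approved tool when one exists under this name; otherwise forge it once.
  invoke(toolName: string, input: number, forge: () => (input: number) => number): number {
    let tool = this.approved.get(toolName);
    if (tool) {
      this.ledger.reuses += 1;
    } else {
      tool = forge();
      this.approved.set(toolName, tool);
      this.ledger.forges += 1;
    }
    return tool(input);
  }
}

const registry = new ToolRegistry();
const forgeRadiationModel = () => (dose: number) => dose * 1.5; // illustrative tool body
registry.invoke('radiation-model', 2, forgeRadiationModel); // turn 1: forged
registry.invoke('radiation-model', 4, forgeRadiationModel); // turn 2: reused
console.log(registry.ledger);
```

A leader that keeps returning to the same few tools accumulates reuses while the forge count, and with it the judge and sandbox spend, stays flat.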
Scenario authoring
Any domain works. Mars colonies, submarine habitats, space stations, medieval kingdoms. The engine is domain-agnostic; the compiled scenario contract defines what gets simulated.
{
  "id": "mars-genesis",
  "labels": { "name": "Mars Genesis", "populationNoun": "colonists", "settlementNoun": "colony", "timeUnitNoun": "year", "timeUnitNounPlural": "years" },
  "setup": { "defaultTurns": 6, "defaultSeed": 950, "defaultStartTime": 2035 },
  "departments": [
    { "id": "medical", "label": "Medical", "role": "Chief Medical Officer", "instructions": "..." },
    { "id": "engineering", "label": "Engineering", "role": "Chief Engineer", "instructions": "..." }
  ],
  "metrics": [
    { "id": "population", "format": "number" },
    { "id": "morale", "format": "percent" }
  ]
}
compileScenario() turns a scenario JSON draft plus optional seedText / seedUrl grounding into a runnable ScenarioPackage by generating TypeScript hook functions via LLM calls. Compilation costs about $0.10 per scenario and caches to disk. See compileScenario for the full hook contract.
Cost safety
The hosted demo uses three layered guards so public access stays affordable:
- Demo caps when PARACOSM_HOSTED_DEMO=true: 6 turns (configurable), 30 colonists, 3 active departments, cheapest model tier. Settings UI locks the capped inputs and unlocks the moment a user pastes their own API key.
- Per-IP rate limit: one simulation per IP per day for demo-mode requests, JSON-persisted across restarts.
- Abort gates: when all SSE clients disconnect for longer than 1.5 seconds, an AbortController fires and the runtime checks it before every LLM call in the turn. At most one in-flight call completes after a tab closes.
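The abort gate in the last bullet can be sketched with the standard AbortController API. The class and function names below (`DisconnectGate`, `runStages`) are hypothetical and not paracosm internals; only the 1.5-second grace period and the check-before-every-call rule come from the description above.

```typescript
const GRACE_MS = 1500; // "longer than 1.5 seconds"

// Tracks connected SSE clients and aborts once none remain past the grace period.
class DisconnectGate {
  readonly controller = new AbortController();
  private clients = 0;
  private timer: ReturnType<typeof setTimeout> | undefined;

  connect(): void {
    this.clients += 1;
    if (this.timer !== undefined) {
      clearTimeout(this.timer); // a reconnect inside the grace period cancels the abort
      this.timer = undefined;
    }
  }

  disconnect(): void {
    this.clients = Math.max(0, this.clients - 1);
    if (this.clients === 0 && this.timer === undefined) {
      this.timer = setTimeout(() => this.controller.abort(), GRACE_MS);
    }
  }
}

// Checks the signal before each call, so only a call already in flight can finish.
async function runStages(
  signal: AbortSignal,
  stages: Array<() => Promise<void>>,
): Promise<number> {
  let completed = 0;
  for (const stage of stages) {
    if (signal.aborted) break;
    await stage();
    completed += 1;
  }
  return completed;
}
```

Checking before each call instead of cancelling mid-call is what bounds the waste at one in-flight request: the call that was already running completes, and nothing after it starts.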
Users who want more runs paste their own OpenAI or Anthropic key. The dashboard's cost modal breaks down per-stage spend (director, commander, dept-by-name, judge, reactions) so the reuse economy's impact on total cost is visible.
API surface
import type { ScenarioPackage, Agent, LeaderConfig, HexacoProfile } from 'paracosm';
import { SimulationKernel, SeededRng } from 'paracosm';
import { marsScenario } from 'paracosm/mars';
import { lunarScenario } from 'paracosm/lunar';
import { runSimulation, runBatch } from 'paracosm/runtime';
import { compileScenario } from 'paracosm/compiler';
import {
  RunArtifactSchema,
  StreamEventSchema,
  SubjectConfigSchema,
  InterventionConfigSchema,
  type RunArtifact,
  type StreamEvent,
  type SubjectConfig,
  type InterventionConfig,
} from 'paracosm/schema';
Full type reference is auto-generated from source at /paracosm. The core types:
- ScenarioPackage: domain-agnostic scenario bundle
- LeaderConfig: commander identity plus HEXACO profile
- HexacoProfile: six-axis personality vector
- SimulationKernel: deterministic state machine
- runSimulation: single-leader turn loop, returns Promise<RunArtifact>
- runBatch: parallel multi-scenario runner
- compileScenario: turns a scenario draft plus optional source grounding into a runnable ScenarioPackage
HTTP + SSE server
The dashboard server exposes a small HTTP API for driving sims from any client:
| Method | Path | Purpose |
|---|---|---|
| POST | /setup | Start a new simulation with leaders, turns, seed |
| GET | /events | SSE stream of simulation events |
| POST | /clear | Clear simulation state and chat agent pool |
| POST | /chat | Chat with a colonist agent |
| GET | /results | Full simulation results including verdict |
| GET | /rate-limit | Check rate limit status |
| POST | /compile | Compile a custom scenario draft with optional seedText / seedUrl grounding |
| GET | /admin-config | Hosted-demo flags + effective caps |
/events replays a buffered event history on reconnect (persisted to disk so restarts do not evaporate completed runs) and closes the replay with a replay_done marker so clients can distinguish historical from live events.
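On the client side, the marker makes it easy to separate replayed history from live events. The sketch below assumes a minimal envelope in which only `type` is relied on; the payload of the replay_done marker is not documented here, and `splitReplay` and `SseEnvelope` are hypothetical names. It also assumes the server sends the marker even when the replay buffer is empty.

```typescript
// Assumed minimal envelope; real events carry more fields.
interface SseEnvelope {
  type: string;
  data?: unknown;
}

// Everything before the marker is history; everything after it is live.
function splitReplay(events: SseEnvelope[]): { replayed: SseEnvelope[]; live: SseEnvelope[] } {
  const marker = events.findIndex((e) => e.type === 'replay_done');
  if (marker === -1) {
    return { replayed: events, live: [] }; // marker not seen yet: still replaying
  }
  return { replayed: events.slice(0, marker), live: events.slice(marker + 1) };
}

const received: SseEnvelope[] = [
  { type: 'turn_start' },
  { type: 'turn_done' },
  { type: 'replay_done' },
  { type: 'turn_start' },
];
const { replayed, live } = splitReplay(received);
console.log(replayed.length, live.length); // 2 1
```

A dashboard might render the replayed batch in one pass without animation and switch to incremental updates for anything that arrives after the marker.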
The SSE stream emits a 17-variant StreamEvent discriminated union (defined in paracosm/schema), every event carrying a universal e.data.summary one-liner so consumers can render cleanly without narrowing on per-event fields:
turn_start, event_start, specialist_start, specialist_done, forge_attempt,
decision_pending, decision_made, outcome, personality_drift, agent_reactions,
bulletin, turn_done, promotion, systems_snapshot, provider_error,
validation_fallback, sim_aborted
Narrow via e.type for per-event intellisense on e.data. Validate the envelope at runtime with StreamEventSchema.parse(evt) when ingesting untrusted streams.
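The two consumption styles can be sketched against a local stand-in for the union. `DemoEvent` covers only three of the 17 variants, and every payload field other than `summary` (`turn`, `optionId`) is an assumption made for illustration; the real shapes live in paracosm/schema.

```typescript
// Local stand-in for a subset of StreamEvent; non-summary fields are assumed.
type DemoEvent =
  | { type: 'turn_start'; data: { summary: string; turn: number } }
  | { type: 'decision_made'; data: { summary: string; optionId: string } }
  | { type: 'sim_aborted'; data: { summary: string } };

function renderEvent(e: DemoEvent): string {
  switch (e.type) {
    case 'decision_made':
      // Narrowed on e.type, so the variant-specific field is available here.
      return `[decision ${e.data.optionId}] ${e.data.summary}`;
    default:
      // Every variant carries data.summary, so no narrowing is needed.
      return `[${e.type}] ${e.data.summary}`;
  }
}

console.log(renderEvent({ type: 'turn_start', data: { summary: 'Turn 1 begins', turn: 1 } }));
// [turn_start] Turn 1 begins
```

Falling back to `data.summary` in the default branch means a consumer keeps rendering sensibly when it meets a variant it has no special case for.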
Related
- Emergent Capabilities: the forge + judge machinery underlying forge_tool
- HEXACO Personality: trait model, mutation system, persona overlays
- Cognitive Memory Guide: the memory pipeline colonists use as chat agents
- Inside Mars Genesis (blog): full case study
- Emergent Tools and HEXACO Leaders (blog): two-leader-one-seed comparison
- Build an AI Civilization in 5 Minutes (blog): tutorial