Multi-Agent Agency API
agency() is the high-level multi-agent factory in AgentOS. It coordinates a
named roster of sub-agents under a chosen orchestration strategy and returns a
single Agent-compatible interface so callers can swap a single agent for an
entire team without changing call sites.
Implemented features: strategy orchestration, session history, aggregate
usage/cost tracking, resource controls, HITL, guardrail evaluation, structured
Zod output, RAG context injection (v1 placeholder), listen() for voice
WebSocket transport, connect() for channel adapters, and real per-agent
streaming events on the sequential strategy.
Table of Contents
- API Hierarchy
- Minimal Example
- Orchestration Strategies
- Adaptive Mode
- Emergent Agent Creation
- Human-in-the-Loop (HITL)
- Memory and RAG
- Voice and Channels
- Guardrails and Security
- Permissions
- Resource Controls
- Observability and Callbacks
- Structured Output with Zod
- Nested Agencies
- Full-Featured Example
API Hierarchy
AgentOS exposes a layered public API. Each layer adds coordination features on top of the one below it.
generateText(): single stateless LLM call, no history
└── agent(): stateful multi-turn session, optional tools
    └── agency(): multi-agent team with orchestration strategy
        └── workflow(): imperative DAG of agency runs
            └── AgentGraph: programmatic graph builder (advanced)
Use the lowest layer that satisfies your requirements:
| Entry point | Adds over previous | Best for |
|---|---|---|
| generateText() | Nothing (raw call) | One-shot prompts, evals |
| streamText() | Streaming tokens | Chat UIs, long responses |
| generateImage() | Image generation | Visuals, multi-modal pipelines |
| agent() | Session history, tools | Single-agent assistants |
| agency() | Multi-agent orchestration, HITL, guardrails, controls | Research pipelines, content teams, autonomous workflows |
| workflow() | Imperative DAG sequencing of agencies | Multi-stage pipelines with branching logic |
| AgentGraph | Programmatic graph construction + edge callbacks | Custom topologies, dynamic routing |
Minimal Example
Create and run a two-agent research pipeline in a few lines:
import { agency } from '@framers/agentos';
const team = agency({
agents: {
researcher: { instructions: 'Find relevant facts.' },
writer: { instructions: 'Write a clear, concise summary.' },
},
strategy: 'sequential',
});
const result = await team.generate('Summarise recent advances in fusion energy.');
console.log(result.text);
Set OPENAI_API_KEY (or another provider's key) and the agency auto-detects
the provider. Pass model: 'openai:gpt-4o' or provider: 'anthropic' to
control the model explicitly.
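The 'openai:gpt-4o' form reads as a provider prefix followed by a model name. The sketch below shows one way such a string could be split; parseModelRef and ModelRef are hypothetical names used for illustration, not part of the AgentOS API, and the real resolution logic may differ.

```typescript
// Hypothetical helper (not an AgentOS export): split "provider:model".
interface ModelRef {
  provider?: string; // absent when the string has no "provider:" prefix
  model: string;
}

function parseModelRef(id: string): ModelRef {
  const idx = id.indexOf(':');
  if (idx === -1) {
    // No prefix: leave the provider to auto-detection.
    return { model: id };
  }
  return { provider: id.slice(0, idx), model: id.slice(idx + 1) };
}

const explicit = parseModelRef('openai:gpt-4o');
const bare = parseModelRef('gpt-4o');
```

Here explicit carries both a provider and a model, while bare carries only the model name.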
Orchestration Strategies
sequential (default)
Agents run one after another. Each agent receives the previous agent's output as context, forming a progressive refinement chain.
const pipeline = agency({
model: 'openai:gpt-4o',
agents: {
researcher: { instructions: 'Gather facts on the topic.' },
editor: { instructions: 'Edit for clarity and concision.' },
reviewer: { instructions: 'Check tone and factual accuracy.' },
},
strategy: 'sequential',
});
const { text, agentCalls } = await pipeline.generate('Write about quantum computing.');
console.log(agentCalls.length); // 3 — one record per agent
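The refinement chain can be pictured with stub agents in place of LLM calls. This is a minimal sketch, assuming each agent sees only the previous agent's output; runSequential, AgentFn, and the stub agents are hypothetical and say nothing about how AgentOS implements the strategy internally.

```typescript
// Stand-in for an LLM-backed agent: takes a prompt, returns text.
type AgentFn = (prompt: string) => Promise<string>;

interface AgentCall {
  agent: string;
  output: string;
}

async function runSequential(
  agents: Record<string, AgentFn>,
  userPrompt: string,
): Promise<{ text: string; agentCalls: AgentCall[] }> {
  const agentCalls: AgentCall[] = [];
  let context = userPrompt;
  for (const [agent, run] of Object.entries(agents)) {
    const output = await run(context);
    agentCalls.push({ agent, output });
    context = output; // the next agent receives this output as its input
  }
  return { text: context, agentCalls };
}

// Stub agents that wrap their input so the chain order is visible.
const stubAgents: Record<string, AgentFn> = {
  researcher: async (p) => `facts(${p})`,
  editor: async (p) => `edited(${p})`,
  reviewer: async (p) => `reviewed(${p})`,
};
```

Calling runSequential(stubAgents, 'topic') resolves with text equal to 'reviewed(edited(facts(topic)))' and three entries in agentCalls.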
parallel
All agents run concurrently. Their outputs are merged by a synthesis step that
uses the agency-level model. Requires model or provider at the agency
level.
const panel = agency({
model: 'openai:gpt-4o',
agents: {
optimist: { instructions: 'Argue in favour.' },
pessimist: { instructions: 'Argue against.' },
neutral: { instructions: 'Give a balanced view.' },
},
strategy: 'parallel',
});
const { text } = await panel.generate('Should AI systems have legal rights?');
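The fan-out and merge shape of this strategy can be sketched with Promise.all. The names below (runParallel, joinOutputs, panelists) are hypothetical, and the synthesis step is a plain string join standing in for the call to the agency-level model.

```typescript
type AgentFn = (prompt: string) => Promise<string>;

interface AgentOutput {
  agent: string;
  output: string;
}

async function runParallel(
  agents: Record<string, AgentFn>,
  synthesize: (outputs: AgentOutput[]) => Promise<string>,
  prompt: string,
): Promise<string> {
  // Start every agent at once; Promise.all keeps the results in roster order.
  const outputs = await Promise.all(
    Object.entries(agents).map(async ([agent, run]) => ({
      agent,
      output: await run(prompt),
    })),
  );
  return synthesize(outputs);
}

const panelists: Record<string, AgentFn> = {
  optimist: async (p) => `for: ${p}`,
  pessimist: async (p) => `against: ${p}`,
};

// Stand-in synthesis: a real one would prompt a model with all outputs.
const joinOutputs = async (outputs: AgentOutput[]): Promise<string> =>
  outputs.map((o) => `[${o.agent}] ${o.output}`).join('\n');
```

The synthesis function is passed in separately because it is the only step that needs a model of its own, which matches the requirement for an agency-level model or provider.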
debate
Agents argue and refine a shared answer over multiple rounds. The number of
rounds is controlled by maxRounds (default: 3). Requires an agency-level
model for the synthesis step.
const debaters = agency({
model: 'openai:gpt-4o',
agents: {
proponent: { instructions: 'Defend your position vigorously.' },
critic: { instructions: 'Challenge every claim you hear.' },
},
strategy: 'debate',
maxRounds: 4,
});
const { text } = await debaters.generate('Is remote work better than in-office?');
review-loop
One agent produces output; another reviews it and requests revisions. The loop
continues until the reviewer is satisfied or maxRounds is reached.
const loop = agency({
model: 'openai:gpt-4o-mini',
agents: {
drafter: { instructions: 'Draft a press release.' },
reviewer: { instructions: 'Review for brand voice and accuracy. Request changes if needed.' },
},
strategy: 'review-loop',
maxRounds: 3,
});
const { text } = await loop.generate('Announce our new product launch.');
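The control flow of the loop can be sketched as follows. This is an illustration under stated assumptions, not the AgentOS implementation: one round is taken to mean one review, the reviewer returns an explicit approved flag, and a draft rejected in the final round is returned as-is. reviewLoop, Drafter, Reviewer, and the stubs are hypothetical names.

```typescript
type Drafter = (task: string, feedback?: string) => Promise<string>;

interface Review {
  approved: boolean;
  feedback: string;
}
type Reviewer = (draft: string) => Promise<Review>;

async function reviewLoop(
  task: string,
  draft: Drafter,
  review: Reviewer,
  maxRounds = 3, // assumed default for this sketch
): Promise<{ text: string; rounds: number; approved: boolean }> {
  let text = await draft(task);
  for (let round = 1; round <= maxRounds; round++) {
    const verdict = await review(text);
    if (verdict.approved) return { text, rounds: round, approved: true };
    // Only redraft if another review round remains.
    if (round < maxRounds) text = await draft(task, verdict.feedback);
  }
  return { text, rounds: maxRounds, approved: false };
}

// Stubs: the drafter numbers its drafts, the reviewer accepts the second one.
let draftCount = 0;
const stubDrafter: Drafter = async (task) => `${task} v${++draftCount}`;
const stubReviewer: Reviewer = async (d) =>
  d.endsWith('v2')
    ? { approved: true, feedback: '' }
    : { approved: false, feedback: 'tighten the opening' };
```

With these stubs, reviewLoop('release', stubDrafter, stubReviewer) resolves after two rounds with the second draft approved.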
hierarchical
A coordinator agent dispatches sub-tasks to specialist agents via tool calls. The coordinator decides which agents to invoke and in what order at runtime. Emergent agent synthesis requires either this strategy or adaptive: true.
const team = agency({
model: 'openai:gpt-4o',
agents: {
researcher: { instructions: 'Find factual information.' },
coder: { instructions: 'Write and explain code.' },
writer: { instructions: 'Produce polished prose.' },
},
strategy: 'hierarchical',
});
const { text } = await team.generate('Explain and demonstrate the quicksort algorithm.');
graph
Agents declare explicit dependencies via dependsOn. The orchestrator
topologically sorts agents into tiers and runs each tier concurrently. Every
agent receives the original user prompt plus the concatenated plain-text outputs
of its direct dependencies.
Auto-detection: when any agent in the roster has a dependsOn array, the
strategy is automatically set to 'graph' — you don't need to specify it
explicitly (though doing so is fine).
Cycle detection: the orchestrator validates the dependency DAG at construction time and throws if it contains a cycle.
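Tiering, auto-detection, and cycle detection can all be illustrated with a small layered topological sort. The sketch below is one plausible way to do it, not the orchestrator's actual code; toTiers, hasDependencies, and AgentSpec are hypothetical names, and the unknown-dependency check is an added assumption.

```typescript
interface AgentSpec {
  dependsOn?: string[];
}

// True when any agent declares a dependsOn array (the auto-detection rule).
function hasDependencies(agents: Record<string, AgentSpec>): boolean {
  return Object.entries(agents).some(([, spec]) => Array.isArray(spec.dependsOn));
}

// Group agents into tiers; every agent in a tier can run concurrently.
function toTiers(agents: Record<string, AgentSpec>): string[][] {
  const entries = Object.entries(agents).map(([name, spec]) => ({
    name,
    deps: spec.dependsOn ?? [],
  }));
  const known = new Set(entries.map((e) => e.name));
  for (const { name, deps } of entries) {
    for (const dep of deps) {
      if (!known.has(dep)) {
        throw new Error(`Agent "${name}" depends on unknown agent "${dep}"`);
      }
    }
  }

  const done = new Set<string>();
  let pending = entries;
  const tiers: string[][] = [];
  while (pending.length > 0) {
    // Ready means every dependency was placed in an earlier tier.
    const ready = pending.filter((e) => e.deps.every((d) => done.has(d)));
    if (ready.length === 0) {
      // Nothing can make progress, so the remaining agents form a cycle.
      throw new Error(`Dependency cycle among: ${pending.map((e) => e.name).join(', ')}`);
    }
    for (const e of ready) done.add(e.name);
    pending = pending.filter((e) => !done.has(e.name));
    tiers.push(ready.map((e) => e.name));
  }
  return tiers;
}

const roster: Record<string, AgentSpec> = {
  researcher: {},
  writer: { dependsOn: ['researcher'] },
  illustrator: { dependsOn: ['researcher'] },
  reviewer: { dependsOn: ['writer', 'illustrator'] },
};
const tiers = toTiers(roster);
// [["researcher"], ["writer", "illustrator"], ["reviewer"]]
```

Running the check at construction time, as the orchestrator does, means a bad roster fails before any model call is made.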
Context passing: each agent's prompt is assembled as:
<original user prompt>
--- Output from <dependencyName> ---
<plain text output>
There is no expression language (no $steps.<name> references). Each agent
simply receives plain text from its predecessors.
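The template above amounts to string concatenation. A minimal sketch, assuming sections are separated by single newlines (the exact whitespace AgentOS uses is not specified here); buildGraphPrompt and DependencyOutput are hypothetical names.

```typescript
interface DependencyOutput {
  name: string; // the dependency's agent name
  text: string; // its plain-text output
}

function buildGraphPrompt(userPrompt: string, deps: DependencyOutput[]): string {
  const sections = deps.map((d) => `--- Output from ${d.name} ---\n${d.text}`);
  // Original prompt first, then one section per direct dependency.
  return [userPrompt, ...sections].join('\n');
}

const reviewerPrompt = buildGraphPrompt('Write about the James Webb Space Telescope.', [
  { name: 'writer', text: 'Draft article text.' },
  { name: 'illustrator', text: 'Three illustration ideas.' },
]);
```

A root agent with no dependencies receives the user prompt unchanged.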
Agent config — dependsOn
| Option | Type | Default | Description |
|---|---|---|---|
| dependsOn | string[] | [] | Names of agents in the same agency that must complete before this agent runs. Agents with no dependsOn are roots and execute first. |
Full example — research team
const team = agency({
model: 'openai:gpt-4o',
agents: {
// Tier 0 — no dependencies, runs first
researcher: {
instructions: 'Research the topic thoroughly. Provide facts, statistics, and sources.',
},
// Tier 1 — both depend on researcher, run concurrently
writer: {
instructions: 'Write a polished article based on the research provided.',
dependsOn: ['researcher'],
},
illustrator: {
instructions: 'Describe 3 illustrations that would complement the article.',
dependsOn: ['researcher'],
},
// Tier 2 — depends on both writer and illustrator, runs last
reviewer: {
instructions: 'Review the article and illustrations for consistency and accuracy.',
dependsOn: ['writer', 'illustrator'],
},
},
strategy: 'graph', // optional — auto-detected from dependsOn
});
const { text, agentCalls } = await team.generate('Write about the James Webb Space Telescope.');
console.log(text);
console.log(agentCalls.map(c => `${c.agent} (${c.durationMs}ms)`));
// researcher (2100ms)
// writer (1800ms) — ran concurrently with illustrator
// illustrator (1200ms) — ran concurrently with writer
// reviewer (1500ms)
Streaming
const stream = team.stream('Write about the James Webb Space Telescope.');
for await (const chunk of stream.textStream) {
process.stdout.write(chunk);
}
Adaptive Mode
Set adaptive: true to let the orchestrator choose the best strategy at
runtime based on task complexity signals. The configured strategy then acts as
a hint that the orchestrator may override.
const smart = agency({
model: 'openai:gpt-4o',
agents: {
analyst: { instructions: 'Analyse data and trends.' },
reporter: { instructions: 'Write clear reports.' },
},
strategy: 'sequential', // default hint
adaptive: true, // may switch to hierarchical if the task is complex
});
const { text } = await smart.generate('Analyse this dataset and write a report.');
Adaptive mode also unlocks emergent agent synthesis; the other way to enable
it is strategy: 'hierarchical'.
Emergent Agent Creation
When enabled, the orchestrator may synthesise new specialist agents at runtime
to handle tasks not covered by the statically defined roster. Emergent agents
are subject to HITL approval when hitl.approvals.beforeEmergent is set.
Emergent requires either strategy: 'hierarchical' or adaptive: true.
const adaptive = agency({
model: 'openai:gpt-4o',
agents: {
generalist: { instructions: 'Handle most tasks.' },
},
strategy: 'hierarchical',
emergent: {
enabled: true,
tier: 'session', // 'session' | 'agent' | 'shared'
judge: true, // a separate judge agent evaluates emergent agents before use
},
});
| tier | Lifetime of synthesised agents |
|---|---|
| "session" | Discarded when the generate() call ends |
| "agent" | Persist for the lifetime of the agency instance |
| "shared" | Persist globally across all agency instances |
Human-in-the-Loop (HITL)
Gate any lifecycle point behind an async approval handler.
Built-in handlers
import { hitl } from '@framers/agentos';
hitl.autoApprove() // always approve — use in tests / CI
hitl.autoReject('dry-run mode') // always reject with an optional reason
hitl.cli() // interactive stdin/stdout prompt
hitl.webhook('https://my-service/ok') // POST to an HTTP endpoint
hitl.slack({ channel: '#approvals', token: process.env.SLACK_BOT_TOKEN })
Approval triggers
const guarded = agency({
model: 'openai:gpt-4o',
agents: { worker: { instructions: 'Execute tasks.' } },
hitl: {
approvals: {
beforeTool: ['delete-record', 'send-email'],
beforeAgent: ['financial-agent'],
beforeEmergent: true,
beforeReturn: true,
beforeStrategyOverride: true,
},
handler: hitl.autoApprove(), // replace with hitl.cli() in production
timeoutMs: 30_000,
onTimeout: 'reject', // 'reject' | 'approve' | 'error'
},
});
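The interaction between timeoutMs and onTimeout can be sketched as a race between the handler and a timer. This is an illustration of the idea only; decideWithTimeout and the types below are hypothetical, and the handler is simplified to take no request argument.

```typescript
interface ApprovalDecision {
  approved: boolean;
  reason?: string;
}
type OnTimeout = 'reject' | 'approve' | 'error';

async function decideWithTimeout(
  handler: () => Promise<ApprovalDecision>,
  timeoutMs: number,
  onTimeout: OnTimeout,
): Promise<ApprovalDecision> {
  let cancel: () => void = () => {};
  const timedOut = new Promise<'timeout'>((resolve) => {
    const timer = setTimeout(() => resolve('timeout'), timeoutMs);
    cancel = () => clearTimeout(timer);
  });
  try {
    // Whichever settles first wins: the handler's decision or the timer.
    const outcome = await Promise.race([handler(), timedOut]);
    if (outcome !== 'timeout') return outcome;
  } finally {
    cancel(); // avoid leaving a pending timer behind
  }
  if (onTimeout === 'error') {
    throw new Error(`Approval timed out after ${timeoutMs}ms`);
  }
  return {
    approved: onTimeout === 'approve',
    reason: `timed out after ${timeoutMs}ms`,
  };
}
```

Under this sketch, 'reject' and 'approve' turn a timeout into an ordinary decision, while 'error' surfaces it as an exception for the caller to handle.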
Custom handler
const custom = agency({
agents: { worker: { instructions: 'Do work.' } },
hitl: {
approvals: { beforeReturn: true },
handler: async (request) => {
// request carries id, type, agent, action, and description.
// myApprovalDatabase is a placeholder for your own approval store.
const ok = await myApprovalDatabase.lookup(request.id);
return {
approved: ok,
reason: ok ? 'Approved by policy' : 'Blocked by policy',
modifications: ok ? undefined : { output: '[redacted]' },
};
},
},
});
Memory and RAG
Shared conversation memory
const remembering = agency({
model: 'openai:gpt-4o',
agents: {
a: { instructions: 'Agent A.' },
b: { instructions: 'Agent B.' },
},
strategy: 'sequential',
memory: {
shared: true, // all agents share one memory store
types: ['episodic', 'semantic'],
working: { enabled: true, maxTokens: 4096, strategy: 'sliding-window' },
consolidation: { enabled: true, interval: 'PT1H' },
},
});
RAG configuration
const withRag = agency({
model: 'openai:gpt-4o',
agents: {
retriever: { instructions: 'Find relevant context from the knowledge base.' },
answerer: { instructions: 'Answer based on retrieved context.' },
},
strategy: 'sequential',
rag: {
vectorStore: {
provider: 'in-memory',
embeddingModel: 'text-embedding-3-small',
},
documents: [
{ path: './docs/manual.pdf', loader: 'pdf' },
{ url: 'https://example.com/spec.html', loader: 'html' },
],
topK: 5,
minScore: 0.75,
graphRag: { enabled: true },
agentAccess: {
answerer: { topK: 10, collections: ['manuals'] },
},
},
});
Voice and Channels
Voice pipeline
When voice.enabled is true the agency exposes a listen() method that
starts a local WebSocket server. Callers receive the bound port and URL and can
connect any audio client. The full STT → LLM → TTS pipeline is provided by
src/voice-pipeline/; the agency wires generate() as the LLM backend.
const voiceAgent = agency({
model: 'openai:gpt-4o',
agents: { assistant: { instructions: 'You are a helpful voice assistant.' } },
voice: {
enabled: true,
transport: 'streaming',
stt: 'deepgram',
tts: 'elevenlabs',
ttsVoice: 'rachel',
endpointing: 'silero-vad',
bargeIn: 'threshold',
language: 'en-US',
},
});
// Bind to an OS-assigned port; connect audio clients to the returned URL.
const server = await voiceAgent.listen();
console.log(`Voice WS server ready at ${server.url}`);
// ...
await server.close();
Requires the ws package (npm install ws).
Channel adapters
When channels contains at least one entry the agency exposes a connect()
method. Calling it logs each configured channel and defers real adapter
initialisation to the runtime. Full adapter wiring (Discord, Telegram, Slack,
etc.) is handled by the channel adapter infrastructure in
src/channels/; connect() is the hook point for that wiring.
const social = agency({
model: 'openai:gpt-4o',
agents: { community: { instructions: 'Engage helpfully with community messages.' } },
channels: {
discord: { token: process.env.DISCORD_BOT_TOKEN, guildId: '...' },
telegram: { token: process.env.TELEGRAM_BOT_TOKEN },
slack: { token: process.env.SLACK_BOT_TOKEN, signingSecret: '...' },
},
});
await social.connect(); // logs each channel; real adapter connection is a follow-up
Guardrails and Security
Shorthand (applies to both input and output)
const safe = agency({
model: 'openai:gpt-4o',
agents: { assistant: { instructions: 'Be helpful.' } },
guardrails: ['pii-redaction', 'toxicity-filter', 'grounding-guard'],
});
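The shorthand array can be read as a structured config whose input and output lists are identical. A minimal sketch of that normalisation, using hypothetical names (normalizeGuardrails, GuardrailConfig) that are not part of the AgentOS API:

```typescript
interface GuardrailConfig {
  input: string[];
  output: string[];
  tier?: string;
}
type GuardrailOption = string[] | Partial<GuardrailConfig>;

function normalizeGuardrails(option: GuardrailOption): GuardrailConfig {
  if (Array.isArray(option)) {
    // Shorthand: the same guardrails run on both input and output.
    return { input: [...option], output: [...option] };
  }
  // Structured form: fill in whichever side was omitted with an empty list.
  return { ...option, input: option.input ?? [], output: option.output ?? [] };
}

const fromShorthand = normalizeGuardrails(['pii-redaction', 'toxicity-filter']);
const fromStructured = normalizeGuardrails({ input: ['injection-shield'], tier: 'strict' });
```

The shorthand copies the array for each side so that later changes to one list cannot affect the other.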
Structured guardrails config
const audited = agency({
model: 'openai:gpt-4o',
agents: { assistant: { instructions: 'Be helpful.' } },
guardrails: {
input: ['injection-shield', 'pii-redaction'],
output: ['grounding-guard', 'code-safety'],
tier: 'strict',
},
security: { tier: 'balanced' }, // 'dangerous'|'permissive'|'balanced'|'strict'|'paranoid'
});
Security tiers
| Tier | Description |
|---|---|
| "dangerous" | No restrictions; internal trusted pipelines only |
| "permissive" | Most capabilities on; network + filesystem allowed |
| "balanced" | Sensible defaults; destructive actions require approval |
| "strict" | Read-only filesystem, no shell spawn, narrow tool allow-list |
| "paranoid" | Minimal surface; all side-effecting tools blocked |
Permissions
const restricted = agency({
model: 'openai:gpt-4o',
agents: { analyst: { instructions: 'Analyse data.' } },
permissions: {
tools: ['read-file', 'query-db'], // explicit allow-list
network: false,
filesystem: true,
spawn: false,
requireApproval: ['delete-record'], // these still need HITL
},
});
Resource Controls
Hard and soft limits on token spend, cost, duration, and call counts.
const budgeted = agency({
model: 'openai:gpt-4o',
agents: {
a: { instructions: 'Step 1.' },
b: { instructions: 'Step 2.' },
},
strategy: 'sequential',
controls: {
maxTotalTokens: 50_000, // across all agents in the run
maxCostUSD: 0.50,
maxDurationMs: 30_000,
maxAgentCalls: 20,
maxStepsPerAgent: 5,
maxEmergentAgents: 3,
onLimitReached: 'warn', // 'stop' | 'warn' | 'error'
},
on: {
limitReached: (e) => {
console.warn(`Limit breached: ${e.metric} = ${e.value} (limit ${e.limit})`);
},
},
});
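The three onLimitReached values are not spelled out above, so the sketch below assumes one reasonable reading: 'warn' fires the callback and continues, 'stop' fires the callback and halts the run, and 'error' fires the callback and throws. It also assumes a limit counts as breached only when the value exceeds it. checkLimit and its types are hypothetical.

```typescript
type LimitAction = 'stop' | 'warn' | 'error';

interface LimitEvent {
  metric: string;
  value: number;
  limit: number;
}

// Returns true when the run may continue, false when it should halt.
function checkLimit(
  event: LimitEvent,
  action: LimitAction,
  notify: (e: LimitEvent) => void,
): boolean {
  if (event.value <= event.limit) return true; // within budget
  notify(event); // mirrors the on.limitReached callback
  if (action === 'error') {
    throw new Error(`${event.metric} exceeded: ${event.value} > ${event.limit}`);
  }
  return action === 'warn';
}

const breaches: LimitEvent[] = [];
const keepGoing = checkLimit(
  { metric: 'maxTotalTokens', value: 60_000, limit: 50_000 },
  'warn',
  (e) => breaches.push(e),
);
```

In this example keepGoing is true and breaches holds one event, because 'warn' reports the breach without stopping the run.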
Observability and Callbacks
Lifecycle callbacks
const observed = agency({
model: 'openai:gpt-4o',
agents: {
researcher: { instructions: 'Research.' },
writer: { instructions: 'Write.' },
},
strategy: 'sequential',
observability: {
logLevel: 'info',
traceEvents: true,
otel: { enabled: true },
},
on: {
agentStart: (e) => console.log(`[START] ${e.agent} — ${e.input.slice(0, 60)}`),
agentEnd: (e) => console.log(`[END] ${e.agent} — ${e.durationMs}ms`),
handoff: (e) => console.log(`[HANDOFF] ${e.fromAgent} -> ${e.toAgent}: ${e.reason}`),
toolCall: (e) => console.log(`[TOOL] ${e.agent} called ${e.toolName}`),
guardrailResult: (e) => console.log(`[GUARD] ${e.guardrailId}: ${e.passed ? 'pass' : 'block'}`),
emergentForge: (e) => console.log(`[FORGE] ${e.agentName} approved=${e.approved}`),
approvalRequested: (e) => console.log(`[HITL] ${e.type}: ${e.description}`),
limitReached: (e) => console.warn(`[LIMIT] ${e.metric}: ${e.value}/${e.limit}`),
error: (e) => console.error(`[ERROR] ${e.agent}: ${e.error.message}`),
},
});
Provenance / audit trail
const auditable = agency({
model: 'openai:gpt-4o',
agents: { worker: { instructions: 'Do auditable work.' } },
provenance: {
enabled: true,
hashChain: true,
record: { toolCalls: true, agentOutputs: true },
export: 'jsonl', // 'jsonl' | 'otlp' | 'solana'
},
});
Structured Output with Zod
Pass a Zod schema to output and the final agent's response is validated and
parsed against it. The result's object field carries the typed value.
import { z } from 'zod';
import { agency } from '@framers/agentos';
const schema = z.object({
title: z.string(),
summary: z.string(),
keyPoints: z.array(z.string()).min(3),
sentiment: z.enum(['positive', 'neutral', 'negative']),
});
const extractor = agency({
model: 'openai:gpt-4o',
agents: {
extractor: { instructions: 'Extract structured information from the text.' },
},
output: schema,
});
const result = await extractor.generate('...article text...');
const data = result.object as z.infer<typeof schema>;
console.log(data.title, data.keyPoints);
Nested Agencies
An agency() instance satisfies the Agent interface and can be placed
directly in another agency's agents roster. The outer strategy treats it
as a single opaque agent call.
import { agency } from '@framers/agentos';
// Inner agency — dedicated research pipeline
const researchTeam = agency({
model: 'openai:gpt-4o-mini',
agents: {
searcher: { instructions: 'Search for sources.' },
analyst: { instructions: 'Analyse and rank sources.' },
},
strategy: 'sequential',
});
// Outer agency — uses researchTeam as one of its agents
const publishingTeam = agency({
model: 'openai:gpt-4o',
agents: {
researchTeam, // nested agency
writer: { instructions: 'Write from research.' },
editor: { instructions: 'Polish and fact-check.' },
},
strategy: 'sequential',
});
const { text, agentCalls } = await publishingTeam.generate('Write about quantum computing.');
// agentCalls[0] represents the entire researchTeam run as a single call
Nesting can go arbitrarily deep. usage and agentCalls are aggregated
through all layers. close() propagates inward — the outer agency calls
close() on every nested agency in its roster.
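Aggregation and close() propagation both follow the same recursive shape. The sketch below models it with two hypothetical classes, LeafAgent and Team, that share a Member interface; it illustrates the pattern and is not the AgentOS implementation.

```typescript
interface Member {
  totalTokens(): number;
  close(): Promise<void>;
}

class LeafAgent implements Member {
  closed = false;
  private readonly tokens: number;
  constructor(tokens: number) {
    this.tokens = tokens;
  }
  totalTokens(): number {
    return this.tokens;
  }
  async close(): Promise<void> {
    this.closed = true;
  }
}

class Team implements Member {
  closed = false;
  private readonly members: Member[];
  constructor(members: Member[]) {
    this.members = members;
  }
  // Usage is the sum over members, so nested teams roll up automatically.
  totalTokens(): number {
    return this.members.reduce((sum, m) => sum + m.totalTokens(), 0);
  }
  // Closing a team closes every member first, nested teams included.
  async close(): Promise<void> {
    await Promise.all(this.members.map((m) => m.close()));
    this.closed = true;
  }
}

const innerTeam = new Team([new LeafAgent(100), new LeafAgent(50)]);
const outerTeam = new Team([innerTeam, new LeafAgent(25)]);
```

Here outerTeam.totalTokens() returns 175, and awaiting outerTeam.close() also marks innerTeam as closed.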
Full-Featured Example
The following example combines most of the major features above in one agency configuration; voice, channels, and provenance are left out.
import { z } from 'zod';
import { agency, hitl } from '@framers/agentos';
// Reusable inner team handling research
const researchTeam = agency({
model: 'openai:gpt-4o-mini',
agents: {
searcher: { instructions: 'Search for authoritative sources on the topic.' },
fact: { instructions: 'Verify claims and flag unsupported assertions.' },
},
strategy: 'sequential',
controls: { maxTotalTokens: 20_000 },
});
// Outer agency orchestrating the full content pipeline
const contentPipeline = agency({
name: 'content-pipeline',
model: 'openai:gpt-4o',
agents: {
research: researchTeam, // nested agency
writer: {
instructions: 'Write a compelling, well-structured article.',
maxSteps: 5,
},
editor: {
instructions: 'Edit for clarity, grammar, and brand voice.',
},
},
strategy: 'sequential',
adaptive: true,
emergent: {
enabled: true,
tier: 'session',
judge: true,
},
memory: {
shared: true,
types: ['episodic', 'semantic'],
working: { enabled: true, maxTokens: 8192 },
},
rag: {
vectorStore: { provider: 'in-memory', embeddingModel: 'text-embedding-3-small' },
topK: 5,
minScore: 0.7,
},
guardrails: {
input: ['injection-shield'],
output: ['grounding-guard', 'pii-redaction'],
tier: 'balanced',
},
security: { tier: 'balanced' },
permissions: {
tools: 'all',
network: true,
filesystem: false,
spawn: false,
},
hitl: {
approvals: {
beforeReturn: true,
beforeEmergent: true,
},
handler: hitl.autoApprove(), // swap for hitl.cli() or hitl.webhook() in production
timeoutMs: 60_000,
onTimeout: 'reject',
},
controls: {
maxTotalTokens: 100_000,
maxCostUSD: 2.00,
maxDurationMs: 120_000,
maxAgentCalls: 50,
onLimitReached: 'warn',
},
observability: {
logLevel: 'info',
traceEvents: true,
},
on: {
agentStart: (e) => console.log(`[>] ${e.agent}`),
agentEnd: (e) => console.log(`[<] ${e.agent} (${e.durationMs}ms)`),
limitReached: (e) => console.warn(`limit: ${e.metric} = ${e.value}`),
error: (e) => console.error(`error in ${e.agent}: ${e.error.message}`),
},
output: z.object({
title: z.string(),
body: z.string(),
wordCount: z.number(),
readingTime: z.string(),
}),
});
// Non-streaming call
const result = await contentPipeline.generate('Write an article about large language models.');
console.log(result.text);
console.log(result.agentCalls.length, 'agent calls');
console.log(result.usage.totalTokens, 'total tokens');
// Streaming call
const stream = contentPipeline.stream('Write about transformers.');
for await (const chunk of stream.textStream) {
process.stdout.write(chunk);
}
// Multi-turn session
const session = contentPipeline.session('article-001');
await session.send('Outline the article.');
await session.send('Now write the introduction.');
const history = session.messages(); // [{role:'user',content:'...'}, ...]
console.log(history.length, 'messages in session');
// Cleanup
await contentPipeline.close();
See Also
- docs/HIGH_LEVEL_API.md: generateText(), streamText(), generateImage(), single agent()
- docs/HUMAN_IN_THE_LOOP.md: HITL handler deep-dive
- docs/GUARDRAILS_USAGE.md: guardrail policies and custom guardrail authoring
- docs/RAG_MEMORY_CONFIGURATION.md: RAG and memory configuration reference
- docs/OBSERVABILITY.md: OTEL integration and trace event reference
- docs/STRUCTURED_OUTPUT.md: Zod schema output and extraction patterns
- docs/AGENT_GRAPH.md: AgentGraph programmatic graph builder (advanced)
- src/api/types.ts: canonical TypeScript type definitions
- src/api/agency.ts: agency() implementation
- src/api/hitl.ts: HITL handler factories