
Capability Discovery

See also

For discovery architecture and integration points, see Capability Discovery — Token-Efficient Tool Context.

Semantic, tiered capability discovery that replaces static tool dumps with per-turn, token-budgeted context — cutting capability context by ~90% (from ~20,000 to ~1,850 tokens) while improving retrieval accuracy.


Overview

Traditional agent frameworks dump every registered tool schema into the system prompt. At scale (20+ tools, 40 skills, 20 channels), this creates ~20,000 tokens of static context that the model must parse every turn — most of it irrelevant. Research calls this context rot (Chroma 2025): degrading output quality as irrelevant context accumulates.

The Capability Discovery Engine solves this with a three-tier context model:

| Tier | Budget | Content | When |
| --- | --- | --- | --- |
| Tier 0 | ~150 tokens | Category summaries | Always in context |
| Tier 1 | ~200 tokens | Top-5 capability summaries | Per-turn semantic retrieval |
| Tier 2 | ~1,500 tokens | Full schema/content for top-2 | Per-turn deep pull |
| Total | ~1,850 tokens | | |

Agents also get a meta-tool (discover_capabilities, ~80 tokens in tool list) for active search when passive tiers miss something.


Architecture

┌──────────────────────────────────────────────┐
│          CapabilityDiscoveryEngine           │
│                (orchestrator)                │
└────────────┬──────────┬──────────┬───────────┘
             │          │          │
             ▼          ▼          ▼
┌────────────┐ ┌──────────┐ ┌──────────────────┐
│ Capability │ │Capability│ │CapabilityContext-│
│   Index    │ │  Graph   │ │    Assembler     │
└─────┬──────┘ └──────────┘ └──────────────────┘
      │
      ▼
┌────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Embedding  │ │ ManifestScanner  │ │DiscoverCapabili- │
│  Strategy  │ │ (CAPABILITY.yaml)│ │ tiesTool (meta)  │
└────────────┘ └──────────────────┘ └──────────────────┘

Per-turn data flow:

User Message
  → CapabilityIndex.search()              // semantic vector search
  → CapabilityGraph.rerank()              // boost related capabilities via graph edges
  → CapabilityContextAssembler.assemble() // token-budgeted tier assembly
  → CapabilityDiscoveryResult             // injected into prompt

| Component | Responsibility |
| --- | --- |
| CapabilityDiscoveryEngine | Top-level orchestrator: coordinates index, graph, and assembler |
| CapabilityIndex | Normalizes sources into CapabilityDescriptor; embeds and stores in vector index |
| CapabilityGraph | Graphology relationship graph (4 edge types); provides re-ranking boosts |
| CapabilityContextAssembler | Builds Tier 0/1/2 context within hard token budgets |
| CapabilityEmbeddingStrategy | Constructs intent-oriented embedding text per capability |
| CapabilityManifestScanner | Scans for CAPABILITY.yaml manifests; hot-reload via fs.watch |
| createDiscoverCapabilitiesTool | Factory for the discover_capabilities meta-tool |

Three-Tier Context Model

Tier 0 — Always in Context (~150 tokens)

Category summaries giving the model a high-level map. Regenerated only when capabilities change (version-tracked cache).

Available capability categories:
- Communication: telegram, discord, slack, whatsapp (+16 more) (20)
- Information: web-search, news-search, web-browser (3)
- Developer-tools: github, git, cli-executor (3)
Use discover_capabilities tool to get details on any capability.
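
A summary in the shape shown above could be produced along these lines. This is a minimal sketch: `CategoryGroup`, `renderTier0`, and the `maxNames` cutoff are hypothetical names chosen for illustration, not the assembler's actual API.

```typescript
// Hypothetical sketch: `CategoryGroup`, `renderTier0`, and `maxNames` are
// illustrative names, not the assembler's real API.
interface CategoryGroup {
  category: string;
  names: string[];
}

function renderTier0(groups: CategoryGroup[], maxNames = 4): string {
  const lines: string[] = ["Available capability categories:"];
  for (const group of groups) {
    const shown = group.names.slice(0, maxNames).join(", ");
    const hidden = Math.max(0, group.names.length - maxNames);
    // Collapse the tail of a long category into "(+N more)".
    const more = hidden > 0 ? ` (+${hidden} more)` : "";
    lines.push(`- ${group.category}: ${shown}${more} (${group.names.length})`);
  }
  lines.push("Use discover_capabilities tool to get details on any capability.");
  return lines.join("\n");
}

console.log(
  renderTier0([
    { category: "Information", names: ["web-search", "news-search", "web-browser"] },
    { category: "Developer-tools", names: ["github", "git", "cli-executor"] },
  ]),
);
```

Listing only the first few names per category keeps the always-on text small while the trailing count still tells the model how much sits behind each category.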

Tier 1 — Semantic Retrieval (~200 tokens)

Per-turn top-5 retrieval as compact summaries (~30-50 tokens each). Minimum relevance threshold: 0.3.

Relevant capabilities:
1. web-search (tool). Search the web for current information. Params: query, max_results
2. news-search (tool). Search news articles by keyword. Params: query, date_range
3. web-browser (tool). Browse a URL and extract content. Params: url, selector
4. github (skill). Use the GitHub CLI for issues, PRs, repos. Requires: gh
5. summarizer (skill). Summarize long documents into key points
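
The top-5 selection with a relevance floor can be sketched as a filter, sort, and slice. `ScoredCapability` and `selectTier1` are hypothetical names, and treating the threshold as inclusive (`>=`) is an assumption.

```typescript
// Hypothetical sketch of Tier 1 selection; not the engine's real code.
interface ScoredCapability {
  id: string;
  relevance: number; // similarity on a 0-1 scale
}

function selectTier1(
  results: ScoredCapability[],
  topK = 5,
  minRelevance = 0.3,
): ScoredCapability[] {
  return results
    .filter((r) => r.relevance >= minRelevance) // assumed inclusive threshold
    .sort((a, b) => b.relevance - a.relevance)  // filter() returned a copy, so the input is untouched
    .slice(0, topK);
}

const picked = selectTier1([
  { id: "tool:web-search", relevance: 0.82 },
  { id: "channel:telegram", relevance: 0.12 },
  { id: "skill:summarizer", relevance: 0.47 },
]);
console.log(picked.map((p) => p.id));
```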

Tier 2 — Full Details (~1,500 tokens)

Full schema or SKILL.md content for the top-2 from Tier 1:

# Web Search
Kind: tool | Category: information

Search the web for current information using the Serper API.

## Input Schema
query (string, required): The search query
max_results (number): Maximum results to return (default: 5)
search_type (string): Type of search [search|news|images] (default: "search")

Required secrets: SERPER_API_KEY

Token budgets are hard-enforced by the assembler using a ~4 chars/token heuristic.
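
One way to enforce a hard budget with the 4 chars/token heuristic is sketched below. `estimateTokens` and `fitToBudget` are hypothetical helpers, and dropping whole blocks rather than truncating mid-block is an assumption about the policy, not a description of the assembler.

```typescript
// Hypothetical sketch of budget enforcement; helper names are illustrative.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Keep whole blocks in order until the next one would exceed the budget.
function fitToBudget(blocks: string[], budgetTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const block of blocks) {
    const cost = estimateTokens(block);
    if (used + cost > budgetTokens) break;
    kept.push(block);
    used += cost;
  }
  return kept;
}

const summaries = ["web-search (tool). Search the web.", "news-search (tool). Search news."];
console.log(fitToBudget(summaries, 10).length);
```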


Source Normalization

CapabilityIndex normalizes five source types into unified CapabilityDescriptor objects:

| Source | ID Convention | Kind | Example |
| --- | --- | --- | --- |
| Tools (ITool) | tool:{name} | tool | tool:web-search |
| Skills (SKILL.md) | skill:{name} | skill | skill:github |
| Extensions (catalog) | extension:{name} | extension | extension:giphy |
| Channels (platform) | channel:{platform} | channel | channel:telegram |
| Manifests (CAPABILITY.yaml) | {kind}:{name} or custom | any | tool:my-custom-tool |

Each descriptor carries: id, kind, name, displayName, description, category, tags, requiredSecrets, requiredTools, available, hasSideEffects, and lazy-load fields fullSchema (Tier 2 tool schemas) and fullContent (Tier 2 skill content). A sourceRef discriminated union points back to the original source for on-demand loading.

Normalization is deterministic and runs during initialize(). Skills derive displayName by capitalizing name segments. Channels always get category: 'communication'. Extensions inherit availability from the catalog entry.
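
The normalization rules above can be illustrated for skills and channels. This sketch uses a reduced `DescriptorSketch` type rather than the full CapabilityDescriptor, the function names are hypothetical, and splitting name segments on `-` or `_` is an assumption.

```typescript
// Hypothetical sketch; a reduced stand-in for CapabilityDescriptor.
interface DescriptorSketch {
  id: string;
  kind: "tool" | "skill" | "extension" | "channel";
  name: string;
  displayName: string;
  description: string;
  category: string;
}

// "web-search" -> "Web Search" (assumes segments are separated by - or _).
function toDisplayName(name: string): string {
  return name
    .split(/[-_]/)
    .filter((segment) => segment.length > 0)
    .map((segment) => segment.charAt(0).toUpperCase() + segment.slice(1))
    .join(" ");
}

function normalizeSkill(name: string, description: string, category: string): DescriptorSketch {
  return { id: `skill:${name}`, kind: "skill", name, displayName: toDisplayName(name), description, category };
}

function normalizeChannel(platform: string, description: string): DescriptorSketch {
  // Channels always land in the communication category.
  return {
    id: `channel:${platform}`,
    kind: "channel",
    name: platform,
    displayName: toDisplayName(platform),
    description,
    category: "communication",
  };
}

console.log(normalizeSkill("github", "Use the GitHub CLI for issues, PRs, repos.", "developer-tools"));
```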


Embedding Strategy

CapabilityEmbeddingStrategy constructs a concise text per capability (100-300 tokens) optimized for semantic matching against user intents. The design is informed by ToolLLM (NDCG@5 of 84.9 on 16K+ APIs) and MCP-RAG parameter-level decomposition.

| Field | Why | Example |
| --- | --- | --- |
| Name + display name | Exact-match queries | "Web Search (web-search)" |
| Description | Core semantic content | "Search the web for current information" |
| Category | Categorical queries | "Category: information" |
| Tags | Use-case queries | "Use cases: search, api, news" |
| Parameter names (tools) | Action queries | "Parameters: query, max_results" |
| Dependencies | Composition queries | "Requires: gh, git" |

Fields not embedded: fullSchema, fullContent, requiredSecrets — these are Tier 2 data loaded on demand to keep embedding text lean.
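
Assembling the embedding text from the fields in the table might look like the following. `EmbeddableCapability` and `buildEmbeddingText` are hypothetical names, and joining the parts with newlines is an assumption; only the per-field string formats come from the table.

```typescript
// Hypothetical sketch of embedding-text construction.
interface EmbeddableCapability {
  name: string;
  displayName: string;
  description: string;
  category: string;
  tags: string[];
  parameterNames?: string[]; // tools only
  requiredTools?: string[];  // dependencies
}

function buildEmbeddingText(cap: EmbeddableCapability): string {
  const parts: string[] = [
    `${cap.displayName} (${cap.name})`,
    cap.description,
    `Category: ${cap.category}`,
  ];
  if (cap.tags.length > 0) parts.push(`Use cases: ${cap.tags.join(", ")}`);
  if (cap.parameterNames && cap.parameterNames.length > 0) {
    parts.push(`Parameters: ${cap.parameterNames.join(", ")}`);
  }
  if (cap.requiredTools && cap.requiredTools.length > 0) {
    parts.push(`Requires: ${cap.requiredTools.join(", ")}`);
  }
  // fullSchema, fullContent, and requiredSecrets are deliberately left out.
  return parts.join("\n");
}

console.log(
  buildEmbeddingText({
    name: "web-search",
    displayName: "Web Search",
    description: "Search the web for current information",
    category: "information",
    tags: ["search", "api", "news"],
    parameterNames: ["query", "max_results"],
  }),
);
```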


Graph Relationships

CapabilityGraph uses graphology (shared with GraphRAGEngine) for O(1) neighbor lookups and sub-millisecond traversal. All edges are built deterministically from metadata.

Edge Types

| Edge Type | Source | Weight | Example |
| --- | --- | --- | --- |
| DEPENDS_ON | requiredTools frontmatter | 1.0 | skill:github -> tool:cli-executor |
| COMPOSED_WITH | Preset co-occurrence | 0.5 | tool:web-search <-> skill:summarizer |
| TAGGED_WITH | Shared tags (>= 2 overlap) | 0.3/tag | tool:news-search <-> tool:web-search |
| SAME_CATEGORY | Same kind:category (2-8 group) | 0.1 | tool:web-search <-> tool:web-browser |
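
As an illustration of how a deterministic edge weight could be derived from metadata, the TAGGED_WITH rule is sketched here. Reading "0.3/tag" as 0.3 per shared tag is an assumption, and `taggedWithWeight` is a hypothetical helper.

```typescript
// Hypothetical sketch of the TAGGED_WITH rule: an edge exists only when at
// least two tags overlap, weighted at 0.3 per shared tag (assumed reading).
const TAG_WEIGHT = 0.3;
const MIN_SHARED_TAGS = 2;

function taggedWithWeight(tagsA: string[], tagsB: string[]): number {
  const shared: string[] = [];
  for (const tag of tagsA) {
    // Count each overlapping tag once, even if it is repeated in the input.
    if (tagsB.indexOf(tag) !== -1 && shared.indexOf(tag) === -1) shared.push(tag);
  }
  return shared.length >= MIN_SHARED_TAGS ? TAG_WEIGHT * shared.length : 0;
}

// A return value of 0 means "no edge".
console.log(taggedWithWeight(["search", "api", "news"], ["search", "news", "web"]));
```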

Re-Ranking Algorithm

CapabilityGraph.rerank() runs after semantic search:

  1. For each result, look up 1-hop graph neighbors.
  2. If a neighbor is also in results, mutual boost: score += graphBoostFactor * edgeWeight.
  3. If a neighbor is not in results but has DEPENDS_ON or COMPOSED_WITH, pull it in: score = parentScore * graphBoostFactor * edgeWeight.
  4. Re-sort by score descending.

If a user asks about GitHub issues and skill:github ranks high, tool:cli-executor (its dependency) gets pulled in even without a direct query match.
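
The four steps can be sketched without a graph library. This version scans a plain edge list instead of using graphology's neighbor lookups, treats every edge as undirected, and keeps the highest score when a capability is pulled in by more than one parent; all three are simplifying assumptions, and `rerankSketch` is a hypothetical name.

```typescript
// Hypothetical sketch of graph re-ranking over a plain edge list.
type EdgeType = "DEPENDS_ON" | "COMPOSED_WITH" | "TAGGED_WITH" | "SAME_CATEGORY";
interface Edge { from: string; to: string; type: EdgeType; weight: number; }
interface Scored { id: string; score: number; }

function rerankSketch(results: Scored[], edges: Edge[], graphBoostFactor = 0.15): Scored[] {
  const original: Record<string, number> = {};
  const scores: Record<string, number> = {};
  for (const r of results) {
    original[r.id] = r.score;
    scores[r.id] = r.score;
  }
  for (const r of results) {
    for (const e of edges) {
      // Step 1: find 1-hop neighbors (edges treated as undirected here).
      let neighbor: string | undefined;
      if (e.from === r.id) neighbor = e.to;
      else if (e.to === r.id) neighbor = e.from;
      if (neighbor === undefined) continue;

      if (neighbor in original) {
        // Step 2: both ends were retrieved, so boost this result.
        scores[r.id] += graphBoostFactor * e.weight;
      } else if (e.type === "DEPENDS_ON" || e.type === "COMPOSED_WITH") {
        // Step 3: pull in an unretrieved neighbor at a discounted score.
        const pulled = r.score * graphBoostFactor * e.weight;
        const previous = neighbor in scores ? scores[neighbor] : 0;
        scores[neighbor] = Math.max(previous, pulled);
      }
    }
  }
  // Step 4: re-sort by score, highest first.
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] }))
    .sort((a, b) => b.score - a.score);
}

const reranked = rerankSketch(
  [{ id: "skill:github", score: 0.8 }, { id: "tool:git", score: 0.5 }],
  [
    { from: "skill:github", to: "tool:cli-executor", type: "DEPENDS_ON", weight: 1.0 },
    { from: "skill:github", to: "tool:git", type: "TAGGED_WITH", weight: 0.6 },
  ],
);
console.log(reranked.map((r) => r.id));
```

With the default boost factor of 0.15, a pulled-in dependency always scores well below its parent, so it tends to fill a spare Tier 1 slot rather than displace a direct match.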


Meta-Tool: discover_capabilities

When passive tiers miss something, agents actively search via the discover_capabilities tool (~80 tokens in tool list):

discover_capabilities({ query: "send a message on Discord", kind: "channel" })
// → { capabilities: [{ id: "channel:discord", relevance: 0.91, ... }], totalIndexed: 66 }

The tool runs through the same discover() pipeline and returns Tier 1 results. It is always included in listDiscoveredTools() output regardless of query relevance.

import { createDiscoverCapabilitiesTool } from '@framers/agentos/discovery';
const metaTool = createDiscoverCapabilitiesTool(discoveryEngine);
toolOrchestrator.registerTool(metaTool);

File-Based Discovery

Custom capabilities are defined in CAPABILITY.yaml manifests, which CapabilityManifestScanner scans.

Scan directories (priority order):

  1. ~/.agentos/capabilities/ (user-global)
  2. ./.agentos/capabilities/ (workspace-local)
  3. $AGENTOS_CAPABILITY_DIRS (env var, colon-separated)

Directory structure:

~/.agentos/capabilities/my-custom-tool/
  CAPABILITY.yaml   # required
  SKILL.md          # optional (loaded as fullContent)
  schema.json       # optional (loaded as fullSchema)

CAPABILITY.yaml format:

id: tool:my-custom-tool
kind: tool
name: my-custom-tool
displayName: My Custom Tool
description: Searches a proprietary API for internal documents
category: information
tags: [search, internal, documents]
requiredSecrets: [INTERNAL_API_KEY]
hasSideEffects: false
inputSchema: { type: object, properties: { query: { type: string } } }
skillContent: ./SKILL.md

Required fields: name, kind, description. The id defaults to ${kind}:${name}.
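
The required-field rule and the id default could be checked as follows once the YAML has been parsed into an object. `ManifestInput`, `ValidManifest`, and `validateManifest` are hypothetical names, and throwing on a missing field is an assumed policy; the scanner may instead skip or log invalid manifests.

```typescript
// Hypothetical sketch of manifest validation on an already-parsed object.
interface ManifestInput {
  id?: string;
  kind?: string;
  name?: string;
  description?: string;
}

interface ValidManifest {
  id: string;
  kind: string;
  name: string;
  description: string;
}

function validateManifest(raw: ManifestInput): ValidManifest {
  const missing: string[] = [];
  if (!raw.name) missing.push("name");
  if (!raw.kind) missing.push("kind");
  if (!raw.description) missing.push("description");
  if (!raw.name || !raw.kind || !raw.description) {
    throw new Error(`CAPABILITY.yaml is missing required fields: ${missing.join(", ")}`);
  }
  return {
    // The id falls back to `${kind}:${name}` when not given explicitly.
    id: raw.id ? raw.id : `${raw.kind}:${raw.name}`,
    kind: raw.kind,
    name: raw.name,
    description: raw.description,
  };
}

console.log(validateManifest({ kind: "tool", name: "my-custom-tool", description: "Searches internal documents" }).id);
```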

Hot-reload via fs.watch with debouncing (default 500ms):

const scanner = new CapabilityManifestScanner();
scanner.watch(scanner.getDefaultDirs(), async (descriptors) => {
  await discoveryEngine.refreshIndex({ manifests: descriptors });
});

Integration Points

PromptBuilder — render discovery result into system prompt text:

const result = await discoveryEngine.discover(userMessage);
const contextText = discoveryEngine.renderForPrompt(result);

ToolOrchestrator — filter tool schemas to only discovered tools via listDiscoveredTools():

const toolSchemas = await toolOrchestrator.listDiscoveredTools(discoveryResult);
// Returns only Tier 1/2 tools + discover_capabilities meta-tool

Chat runtime — wire before prompt composition in the GMI loop:

const discoveryResult = await discoveryEngine.discover(userMessage);
const tools = await toolOrchestrator.listDiscoveredTools(discoveryResult);
const capabilityContext = discoveryEngine.renderForPrompt(discoveryResult);
const prompt = promptBuilder.build({ capabilityContext, ...otherInputs });
const response = await provider.complete({ prompt, tools });

AgentOS Turn Planner (Core Integration)

AgentOS now supports a first-class turn planner that runs before each GMI turn:

  • Sets tool failure policy (fail_open or fail_closed)
  • Applies dynamic tool selection (discovered or all)
  • Injects discovery context into prompt metadata

Default behavior is success-rate optimized:

  • defaultToolFailureMode: "fail_open"
  • discovery enabled
  • fallback to full toolset when discovery fails or yields no tool matches

await agentos.initialize({
  // ...
  turnPlanning: {
    enabled: true,
    defaultToolFailureMode: 'fail_open',
    allowRequestOverrides: true,
    discovery: {
      enabled: true,
      autoInitializeEngine: true,
      registerMetaTool: true,
      onlyAvailable: true,
      defaultToolSelectionMode: 'discovered',
      includePromptContext: true,
      maxRetries: 1,
      retryBackoffMs: 150,
    },
  },
});
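
The fallback behavior described above can be sketched as a small decision function. `PlanInput`, `PlanSketch`, and `planTools` are hypothetical names, and throwing under fail_closed when discovery yields nothing is an assumption; the text only specifies the fail_open fallback.

```typescript
// Hypothetical sketch of per-turn tool selection with fail-open fallback.
type ToolSelectionMode = "discovered" | "all";
type ToolFailureMode = "fail_open" | "fail_closed";

interface PlanInput {
  selectionMode: ToolSelectionMode;
  failureMode: ToolFailureMode;
  allTools: string[];
  discoveredTools: string[] | null; // null stands for "discovery failed"
}

interface PlanSketch {
  tools: string[];
  degraded: boolean; // true when the full-toolset fallback was applied
}

function planTools(input: PlanInput): PlanSketch {
  if (input.selectionMode === "all") {
    return { tools: input.allTools, degraded: false };
  }
  const found = input.discoveredTools;
  if (found !== null && found.length > 0) {
    return { tools: found, degraded: false };
  }
  if (input.failureMode === "fail_open") {
    // Discovery failed or matched nothing: fall back to every tool.
    return { tools: input.allTools, degraded: true };
  }
  throw new Error("Discovery produced no tools and the failure mode is fail_closed");
}

console.log(
  planTools({
    selectionMode: "discovered",
    failureMode: "fail_open",
    allTools: ["web-search", "github"],
    discoveredTools: [],
  }),
);
```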

Per-request overrides can be provided via options.customFlags:

  • toolFailureMode: fail_open | fail_closed
  • toolSelectionMode: all | discovered
  • capabilityDiscoveryKind: tool | skill | extension | channel | any

Runtime metadata updates now also include executionLifecycle transitions:

  • planned -> executing
  • degraded (if discovery fallback is applied in fail-open mode)
  • recovered (optional, when turn completes successfully after degradation)
  • completed or errored

AgentOS also emits taskOutcome in metadata updates at the end of each turn:

  • status: success | partial | failed
  • score: normalized score in [0, 1]
  • source: heuristic or request_override

When task outcome telemetry is enabled (default), AgentOS also emits taskOutcomeKpi as a rolling stream payload (windowed success stats):

  • scopeKey: aggregation key (global / org / org+persona)
  • sampleCount, successCount, partialCount, failedCount
  • successRate
  • averageScore / weightedSuccessRate

taskOutcome can be overridden per request via options.customFlags:

  • taskOutcome: success | partial | failed | numeric 0..1
  • taskSuccess: boolean

Task outcome telemetry can be configured under orchestratorConfig.taskOutcomeTelemetry:

  • enabled (default true)
  • rollingWindowSize (default 100)
  • scope: global | organization | organization_persona (default)
  • emitAlerts (default true)
  • alertBelowWeightedSuccessRate (default 0.55)
  • alertMinSamples (default 8)
  • alertCooldownMs (default 60000)

When alerting is enabled and KPI degrades, metadata updates include taskOutcomeAlert with severity/reason/threshold/value so clients can trigger automated remediation.
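
A rolling window with an alert check might be computed along these lines. All names here are hypothetical, and the sketch uses the mean score as a stand-in for weightedSuccessRate, whose exact definition is not given above.

```typescript
// Hypothetical sketch of rolling KPI computation and alert gating.
type OutcomeStatus = "success" | "partial" | "failed";
interface OutcomeEntry { status: OutcomeStatus; score: number; }
interface KpiSketch {
  sampleCount: number;
  successCount: number;
  partialCount: number;
  failedCount: number;
  successRate: number;
  averageScore: number; // assumed stand-in for weightedSuccessRate
}

// Append an outcome and keep only the newest `rollingWindowSize` entries.
function pushOutcome(window: OutcomeEntry[], entry: OutcomeEntry, rollingWindowSize = 100): OutcomeEntry[] {
  const next = window.concat([entry]);
  return next.slice(Math.max(0, next.length - rollingWindowSize));
}

function computeKpi(window: OutcomeEntry[]): KpiSketch {
  const count = (status: OutcomeStatus) => window.filter((e) => e.status === status).length;
  const sampleCount = window.length;
  const successCount = count("success");
  const totalScore = window.reduce((sum, e) => sum + e.score, 0);
  return {
    sampleCount,
    successCount,
    partialCount: count("partial"),
    failedCount: count("failed"),
    successRate: sampleCount === 0 ? 0 : successCount / sampleCount,
    averageScore: sampleCount === 0 ? 0 : totalScore / sampleCount,
  };
}

function shouldAlert(
  kpi: KpiSketch,
  nowMs: number,
  lastAlertMs: number | null,
  threshold = 0.55,   // alertBelowWeightedSuccessRate
  minSamples = 8,     // alertMinSamples
  cooldownMs = 60000, // alertCooldownMs
): boolean {
  if (kpi.sampleCount < minSamples) return false;
  if (lastAlertMs !== null && nowMs - lastAlertMs < cooldownMs) return false;
  return kpi.averageScore < threshold;
}
```

The minimum-sample and cooldown gates keep a handful of early failures, or one sustained dip, from producing a stream of repeated alerts.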

To persist KPI windows across restarts, provide taskOutcomeTelemetryStore in orchestrator dependencies. The store contract is:

  • loadWindows(): Promise<Record<string, TaskOutcomeKpiWindowEntry[]>>
  • saveWindow(scopeKey, entries): Promise<void>
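
For tests or local experiments, the two-method contract above could be satisfied by an in-memory store, sketched here. The field layout of `TaskOutcomeKpiWindowEntry` is a hypothetical stand-in (the real type is exported by AgentOS), and this class does not survive restarts, so it only illustrates the shape of the contract.

```typescript
// Hypothetical stand-in for the real TaskOutcomeKpiWindowEntry type.
interface TaskOutcomeKpiWindowEntry {
  status: "success" | "partial" | "failed";
  score: number;
  timestamp: number;
}

// In-memory sketch of the store contract; nothing is written to disk.
class InMemoryTaskOutcomeTelemetryStore {
  private windows: Record<string, TaskOutcomeKpiWindowEntry[]> = {};

  async loadWindows(): Promise<Record<string, TaskOutcomeKpiWindowEntry[]>> {
    // Return copies so callers cannot mutate the stored windows.
    const copy: Record<string, TaskOutcomeKpiWindowEntry[]> = {};
    for (const key of Object.keys(this.windows)) {
      copy[key] = this.windows[key].slice();
    }
    return copy;
  }

  async saveWindow(scopeKey: string, entries: TaskOutcomeKpiWindowEntry[]): Promise<void> {
    this.windows[scopeKey] = entries.slice();
  }
}

const memoryStore = new InMemoryTaskOutcomeTelemetryStore();
memoryStore
  .saveWindow("global", [{ status: "success", score: 1, timestamp: 0 }])
  .then(() => memoryStore.loadWindows())
  .then((windows) => console.log(Object.keys(windows)));
```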

AgentOS includes a built-in SQL implementation:

import { SqlTaskOutcomeTelemetryStore } from '@framers/agentos';

const taskOutcomeTelemetryStore = new SqlTaskOutcomeTelemetryStore({
  // Uses @framers/sql-storage-adapter resolution rules.
  priority: ['better-sqlite3', 'sqljs'],
  database: './data/agentos_task_outcomes.db',
});

Adaptive recovery can be configured under orchestratorConfig.adaptiveExecution:

  • enabled (default true)
  • minSamples (default 5)
  • minWeightedSuccessRate (default 0.7)
  • forceAllToolsWhenDegraded (default true)
  • forceFailOpenWhenDegraded (default true)

When enabled, if rolling task KPI degrades below threshold, AgentOS can automatically switch turn policy from toolSelectionMode=discovered to toolSelectionMode=all to recover task success rate. It can also force toolFailureMode=fail_open unless the request explicitly pinned toolFailureMode=fail_closed via options.customFlags.
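
The recovery rule in the paragraph above can be written as a pure function over the current turn policy. `AdaptiveConfigSketch`, `TurnPolicySketch`, and `applyAdaptiveRecovery` are hypothetical names; the defaults mirror the values listed above, and the `failureModePinned` flag is an assumed way of representing a request-level override.

```typescript
// Hypothetical sketch of adaptive recovery; not the orchestrator's real code.
interface AdaptiveConfigSketch {
  enabled: boolean;
  minSamples: number;
  minWeightedSuccessRate: number;
  forceAllToolsWhenDegraded: boolean;
  forceFailOpenWhenDegraded: boolean;
}

interface TurnPolicySketch {
  toolSelectionMode: "discovered" | "all";
  toolFailureMode: "fail_open" | "fail_closed";
  failureModePinned: boolean; // true when the request set toolFailureMode itself
}

const ADAPTIVE_DEFAULTS: AdaptiveConfigSketch = {
  enabled: true,
  minSamples: 5,
  minWeightedSuccessRate: 0.7,
  forceAllToolsWhenDegraded: true,
  forceFailOpenWhenDegraded: true,
};

function applyAdaptiveRecovery(
  policy: TurnPolicySketch,
  kpi: { sampleCount: number; weightedSuccessRate: number },
  cfg: AdaptiveConfigSketch = ADAPTIVE_DEFAULTS,
): TurnPolicySketch {
  const degraded =
    cfg.enabled &&
    kpi.sampleCount >= cfg.minSamples &&
    kpi.weightedSuccessRate < cfg.minWeightedSuccessRate;
  if (!degraded) return policy;

  const next: TurnPolicySketch = { ...policy };
  if (cfg.forceAllToolsWhenDegraded) next.toolSelectionMode = "all";
  // A request that explicitly pinned fail_closed keeps it.
  const pinnedClosed = policy.failureModePinned && policy.toolFailureMode === "fail_closed";
  if (cfg.forceFailOpenWhenDegraded && !pinnedClosed) next.toolFailureMode = "fail_open";
  return next;
}
```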


Configuration

interface CapabilityDiscoveryConfig {
  tier0TokenBudget: number;   // Default: 200
  tier1TokenBudget: number;   // Default: 800
  tier2TokenBudget: number;   // Default: 2000
  tier1TopK: number;          // Default: 5
  tier2TopK: number;          // Default: 2
  tier1MinRelevance: number;  // Default: 0.3 (0-1 scale)
  useGraphReranking: boolean; // Default: true
  collectionName: string;     // Default: 'capability_index'
  embeddingModelId?: string;  // Default: undefined (use system default)
  graphBoostFactor: number;   // Default: 0.15 (0-1 scale)
}

Override at construction time or per-query:

const engine = new CapabilityDiscoveryEngine(embeddingManager, vectorStore, {
  tier1TopK: 8, tier2TopK: 3, graphBoostFactor: 0.2,
});

const result = await engine.discover("search the web", {
  config: { tier2TokenBudget: 3000 },
  kind: 'tool',
  onlyAvailable: true,
});

Performance

| Metric | Value | Notes |
| --- | --- | --- |
| Index build (initialize) | ~3s | One-time; embedding API calls for ~100 capabilities |
| Per-turn discover() cold | ~50ms | Embedding generation for the query |
| Per-turn discover() warm | ~5ms | Embedding cache hit (LRU) |
| Graph re-ranking | <1ms | Sub-millisecond for ~100 nodes |
| Memory overhead | ~2MB | Descriptor map + graphology graph + embedding cache |
| Context tokens (static) | ~20,000 | All tools + skills + channels dumped |
| Context tokens (discovery) | ~1,850 | Tier 0 + Tier 1 + Tier 2 combined |
| Token reduction | ~90% | 20,000 -> 1,850 |
| Meta-tool cost | ~80 tokens | discover_capabilities in tool list |

Usage Example

import {
  CapabilityDiscoveryEngine,
  CapabilityManifestScanner,
  createDiscoverCapabilitiesTool,
} from '@framers/agentos/discovery';

// Stand-ins. Replace each with the runtime-supplied instance you already
// have (embeddingManager / vectorStore from your memory wiring; the four
// catalog managers from your AgentOS instance).
declare const embeddingManager: any;
declare const vectorStore: any;
declare const toolOrchestrator: any;
declare const skillRegistry: any;
declare const extensionCatalog: any;
declare const channelRouter: any;

// --- Initialization (once at startup) ---

const engine = new CapabilityDiscoveryEngine(embeddingManager, vectorStore);
const scanner = new CapabilityManifestScanner();
const manifests = await scanner.scan();

await engine.initialize({
  tools: toolOrchestrator.getToolDescriptors(),
  skills: skillRegistry.getSkillEntries(),
  extensions: extensionCatalog.listAll(),
  channels: channelRouter.listPlatforms(),
  manifests,
});

toolOrchestrator.registerTool(createDiscoverCapabilitiesTool(engine));

scanner.watch(scanner.getDefaultDirs(), async (descs) => {
  await engine.refreshIndex({ manifests: descs });
});

console.log(engine.getStats());
// { capabilityCount: 66, graphNodes: 66, graphEdges: 142, indexVersion: 1 }

// --- Per-turn discovery (in GMI loop) ---

const discoveryResult = await engine.discover("Search the web for AI news and summarize it");
// discoveryResult.tokenEstimate.totalTokens ≈ 1,850

const tools = await toolOrchestrator.listDiscoveredTools(discoveryResult);
// tools.length ≈ 3 (web-search, news-search, discover_capabilities)

const capabilityContext = engine.renderForPrompt(discoveryResult);
// Inject into system prompt via PromptBuilder

Source Files

All source lives in packages/agentos/src/discovery/:

| File | Export |
| --- | --- |
| types.ts | All types, DEFAULT_DISCOVERY_CONFIG |
| CapabilityDiscoveryEngine.ts | CapabilityDiscoveryEngine |
| CapabilityIndex.ts | CapabilityIndex |
| CapabilityGraph.ts | CapabilityGraph |
| CapabilityContextAssembler.ts | CapabilityContextAssembler |
| CapabilityEmbeddingStrategy.ts | CapabilityEmbeddingStrategy |
| CapabilityManifestScanner.ts | CapabilityManifestScanner |
| DiscoverCapabilitiesTool.ts | createDiscoverCapabilitiesTool() |
| index.ts | Barrel re-exports for @framers/agentos/discovery |