Skip to main content

Observability (OpenTelemetry)

AgentOS provides opt-in OpenTelemetry (OTEL) spans, metrics, and log correlation/export hooks. AgentOS itself does not start an OTEL SDK. Your host application owns exporters, sampling, and context propagation.


Table of Contents

  1. Overview
  2. Opt-In Policy
  3. Enable via AgentOS Config (Recommended)
  4. Enable via Environment Variables
  5. Host Setup (Node.js Example)
  6. What AgentOS Emits
  7. Logging (Pino + OTEL Logs)
  8. Privacy & Cardinality
  9. Performance Notes
  10. SOTA Techniques (TypeScript Agentic AI)

Overview

Defaults (all OFF):

  • Manual AgentOS spans
  • AgentOS metrics
  • Trace IDs in streamed responses
  • Log correlation (trace_id, span_id)
  • OTEL LogRecord export

When enabled, AgentOS emits:

  • Spans around turns and tool-result handling
  • Metrics for turn/tool counters + histograms
  • Optional: trace correlation in logs and streamed response metadata
  • Optional: OTEL LogRecords (exported by your host, via OTLP)

Opt-In Policy

There are two layers:

  1. Host OTEL SDK (required for export)

    • In Node: @opentelemetry/sdk-node + exporters/instrumentations.
    • In browsers: the web OTEL SDK (if you choose to export from the client).
  2. AgentOS instrumentation toggles (controls what AgentOS emits)

    • Env flags (global defaults)
    • AgentOSConfig.observability (per-agent control)

Precedence:

  • AgentOSConfig.observability.enabled = false hard-disables all AgentOS observability helpers (even if env is set).
  • Otherwise, config fields override env fields, and env provides defaults.

import { AgentOS } from '@framers/agentos';

const agentos = new AgentOS();
await agentos.initialize({
// ...your normal config...
observability: {
// Master switch: when true, defaults to enabling tracing/metrics + log correlation.
// Keep explicit per-signal toggles if you want a tighter blast radius.
// enabled: true,

tracing: {
enabled: true,
includeTraceInResponses: true, // adds metadata.trace to select streamed chunks
},
metrics: {
enabled: true,
},
logging: {
includeTraceIds: true, // adds trace_id/span_id to pino log meta
exportToOtel: false, // keep OFF unless you want OTLP log export
// otelLoggerName: '@framers/agentos',
},
},
});

Enable via Environment Variables

# Master switch: enables tracing + metrics + log trace_id/span_id injection defaults
AGENTOS_OBSERVABILITY_ENABLED=true

# Optional fine-grained toggles
AGENTOS_TRACING_ENABLED=true
AGENTOS_METRICS_ENABLED=true
AGENTOS_TRACE_IDS_IN_RESPONSES=true
AGENTOS_LOG_TRACE_IDS=true

# Optional: emit OTEL LogRecords (still requires a host SDK + logs exporter)
AGENTOS_OTEL_LOGS_ENABLED=true

# Names (advanced; usually keep defaults)
AGENTOS_OTEL_TRACER_NAME=@framers/agentos
AGENTOS_OTEL_METER_NAME=@framers/agentos
AGENTOS_OTEL_LOGGER_NAME=@framers/agentos

Host Setup (Node.js Example)

AgentOS only uses OTEL APIs; your host must install/start an OTEL SDK to export anything.

Typical Node setup:

  1. Install dependencies:
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node
  1. Configure env (OTLP/HTTP collector example):
OTEL_SERVICE_NAME=my-agent-host
OTEL_TRACES_EXPORTER=otlp
OTEL_METRICS_EXPORTER=otlp
# OTEL_LOGS_EXPORTER=otlp # keep explicit opt-in for log export
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

# Optional sampling
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
  1. Start the SDK early (before most imports) so auto-instrumentation can patch libraries:
// otel.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
instrumentations: [getNodeAutoInstrumentations()],
});

export async function startOtel(): Promise<void> {
await sdk.start();
}

export async function shutdownOtel(): Promise<void> {
await sdk.shutdown();
}

In this monorepo, the backend bootstrap lives at backend/src/observability/otel.ts.


What AgentOS Emits

Spans

When tracing is enabled, AgentOS emits spans such as:

  • agentos.turn
  • agentos.gmi.get_or_create
  • agentos.gmi.process_turn_stream
  • agentos.tool_result
  • agentos.gmi.handle_tool_result
  • agentos.conversation.save (stage-tagged)

Metrics

When metrics are enabled, AgentOS records:

  • agentos.turns (counter)
  • agentos.turn.duration_ms (histogram)
  • agentos.turn.tokens.total|prompt|completion (histograms; only when usage is available)
  • agentos.turn.cost.usd (histogram; only when cost is available)
  • agentos.tool_results (counter)
  • agentos.tool_result.duration_ms (histogram)

Trace IDs in Streamed Responses

When enabled, AgentOS attaches trace metadata to select streamed chunks:

{
"metadata": {
"trace": {
"traceId": "...",
"spanId": "...",
"traceparent": "00-...-...-01"
}
}
}

Logging (Pino + OTEL Logs)

Stdout Logs (Default)

AgentOS uses pino for structured logs. When includeTraceIds is enabled and an active span exists, AgentOS adds:

  • trace_id
  • span_id

to log metadata to make correlation easy in any log backend.

OTEL LogRecord Export (Optional)

If you want logs to flow through the OTLP pipeline (instead of, or in addition to, stdout shipping):

  1. Enable AgentOS OTEL log emission:

    • AGENTOS_OTEL_LOGS_ENABLED=true, or
    • observability.logging.exportToOtel = true
  2. Enable a host logs exporter (Node example):

OTEL_LOGS_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

Recommendation:

  • Keep OTEL log export OFF by default.
  • Use stdout logs + trace correlation for most deployments.
  • Turn on OTEL log export when you explicitly want one unified OTLP pipeline for traces/metrics/logs.

Privacy & Cardinality

Defaults are conservative:

  • Prompts, model outputs, and tool arguments are not recorded by default.
  • Prefer safe metadata only (durations, status, tool names, model/provider ids, token usage, cost).

Cardinality guidance:

  • Avoid labeling metrics/spans with high-cardinality values (user ids, conversation ids, raw URLs, prompt text).
  • Keep attributes stable and low-cardinality (e.g. status, tool_name, persona_id).

Performance Notes

  • AgentOS observability helpers are a safe no-op when disabled.
  • Traces/metrics are typically low overhead when sampling is enabled.
  • OTEL log export can add noticeable CPU/network overhead at debug volume; use it intentionally.

SOTA Techniques (TypeScript Agentic AI)

Patterns that work well in practice:

  • Structured event stream: emit agent lifecycle events (turn started, tool called, tool returned, policy decision, final output) as strongly-typed records (AgentOS already streams chunks; persist them if you need audits).
  • W3C context propagation: propagate traceparent across inbound HTTP, SSE/WebSocket streaming, and tool calls; use OTEL context managers (AsyncLocalStorage) in Node.
  • GenAI semantic conventions: add gen_ai.* attributes/events to spans when instrumenting model calls, tool calls, and token usage; keep raw content behind explicit opt-in and redaction.
  • Redaction and data classification: treat prompt/tool args/output as sensitive by default; add allowlists + hashing for debugging without content exfiltration.

Common library choices:

  • Telemetry: @opentelemetry/sdk-node, @opentelemetry/auto-instrumentations-node
  • Logging: pino (+ @opentelemetry/instrumentation-pino if you want automatic injection everywhere)
  • Agent/LLM observability layers: Langfuse, Helicone, Sentry AI monitoring, OpenLIT, OpenLLMetry-js