LLM Providers

A model's price going down today does not save you when the model itself goes down tomorrow. The right question for any production agent runtime is not which provider, but what happens when this provider isn't available. Provider outages happen. Rate limits hit at the worst moment. A model deprecates. A region throttles. A subscription lapses. None of these stop being true because the agent is mid-conversation.

AgentOS abstracts every LLM behind a single IProvider interface. Eleven providers are wired in directly — nine via API key, two via local CLI bridges that ride your existing Claude Max or Google account subscription. OpenRouter, included in the eleven, fans out to 200+ additional models from the same set of vendors. Every provider speaks the same streaming protocol, supports the same tool-call shape (with the documented exceptions below), and participates in the same cost ledger. Failover is automatic by default. The fallback chain is auto-built from whichever keys you've set, and is overridable per agent.

The point isn't that AgentOS hides the provider. It's that the provider stops being a load-bearing decision. Pick a primary, set one fallback key, and the runtime handles the rest.


Table of Contents

  1. Overview
  2. Provider Matrix
  3. Quick Start
  4. Auto-Detection Order
  5. Provider Configuration
  6. Fallback Behavior
  7. Cost Tiers
  8. Provider Details
  9. Programmatic Configuration
  10. Adding a Custom Provider
  11. Provider Capabilities Detail
  12. Related Documentation

Overview

AgentOS abstracts LLM access behind a unified IProvider interface. You configure providers via environment variables, and AgentOS handles model selection, streaming, tool calling, retries, and fallback routing.

Key features:

  • 11 providers supported out of the box (9 API-key + 2 CLI-based)
  • CLI providers: Use your Claude Max or Google account subscription via local CLI — no API key needed
  • Auto-detection: Set an API key or install a CLI and the provider is available
  • Fallback: Automatic retry with alternate providers on failure (fallbackProviders)
  • Cost-aware caps: Per-run cost ceilings via controls.maxCostUSD; route requests to cheaper models with a custom router
  • Streaming: All providers support streaming with a unified async iterator
  • Tool calling: Unified function/tool calling across providers that support it
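
The unified streaming contract can be pictured as an async iterable of text chunks that you consume with one `for await` loop regardless of provider. A minimal mock of that shape (the chunk field name here is an assumption for illustration, not the real AgentOS type):

```typescript
// Mock of the unified streaming shape: an async iterable of text deltas.
// The real AgentOS chunk type may differ; this only illustrates the pattern.
type StreamChunk = { text: string };

async function* mockStream(parts: string[]): AsyncIterable<StreamChunk> {
  for (const text of parts) yield { text };
}

async function consume(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let out = '';
  for await (const chunk of stream) out += chunk.text; // same loop for every provider
  return out;
}

consume(mockStream(['Hello', ', ', 'world'])).then((s) => console.log(s));
```

Because every provider yields the same iterable shape, swapping providers never changes the consumption code.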

Provider Matrix

| Provider | Env Var | Default Model | Streaming | Tool Calling | Vision | Embedding | Cost Tier |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OpenAI | OPENAI_API_KEY | gpt-4o | Yes | Yes | Yes | Yes | $$$ |
| Anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-5-20250929 | Yes | Yes | Yes | No | $$$ |
| Gemini | GEMINI_API_KEY | gemini-2.5-flash | Yes | Yes | Yes | Yes | $$ |
| Groq | GROQ_API_KEY | llama-3.3-70b-versatile | Yes | Yes | No | No | $ |
| Together | TOGETHER_API_KEY | meta-llama/Llama-3.3-70B-Instruct-Turbo | Yes | Yes | No | Yes | $ |
| Mistral | MISTRAL_API_KEY | mistral-large-latest | Yes | Yes | No | Yes | $$ |
| xAI | XAI_API_KEY | grok-2 | Yes | Yes | Yes | No | $$ |
| OpenRouter | OPENROUTER_API_KEY | openai/gpt-4o | Yes | Yes | Yes* | Yes* | Varies |
| Ollama | OLLAMA_BASE_URL | llama3.2 | Yes | Partial | Model-dep. | Yes | Free |
| Claude Code CLI | (PATH detection) | claude-sonnet-4-5-20250929 | Yes | Yes | Yes | No | Free† |
| Gemini CLI | (PATH detection) | gemini-2.5-flash | Yes | Partial‡ | Yes | No | Free† |

*OpenRouter capabilities depend on the underlying model selected.
†CLI providers use your existing subscription — $0 per token.
‡Gemini CLI tool calling uses XML prompt-based parsing (less reliable than native API tool calling).

Gemini CLI ToS Warning: Google's Gemini CLI ToS may prohibit third-party subprocess invocation with OAuth auth. Use gemini with API key for production. See CLI Providers for details.


Quick Start

Option 1: Environment Variable (Simplest)

Set one API key and start using AgentOS:

export OPENAI_API_KEY=sk-...

import { agent } from '@framers/agentos';

const myAgent = agent({}); // Auto-detects from env (OpenAI here)
const result = await myAgent.generate('Hello, world!');
console.log(result.text);

Option 2: Programmatic

import { agent } from '@framers/agentos';

const myAgent = agent({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
});

The agent() factory is synchronous — it does not return a Promise. The first network call happens on generate() / stream() / session().send().


Auto-Detection Order

When neither provider nor model is set, AgentOS checks for API keys in this order and uses the first one found:

  1. OPENROUTER_API_KEY → OpenRouter
  2. OPENAI_API_KEY → OpenAI
  3. ANTHROPIC_API_KEY → Anthropic
  4. GEMINI_API_KEY → Google Gemini
  5. GROQ_API_KEY → Groq
  6. TOGETHER_API_KEY → Together AI
  7. MISTRAL_API_KEY → Mistral
  8. XAI_API_KEY → xAI
  9. which claude → Claude Code CLI (PATH detection — no API key, uses Max subscription)
  10. which gemini → Gemini CLI (PATH detection — no API key, uses Google account)
  11. OLLAMA_BASE_URL → Ollama
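
The order above amounts to a first-match scan over the environment. A simplified sketch of that logic (names illustrative, not the runtime's actual internals):

```typescript
// First-match provider detection over env vars, mirroring the documented order.
// Simplified illustration only, not AgentOS source.
const DETECTION_ORDER: Array<[envVar: string, provider: string]> = [
  ['OPENROUTER_API_KEY', 'openrouter'],
  ['OPENAI_API_KEY', 'openai'],
  ['ANTHROPIC_API_KEY', 'anthropic'],
  ['GEMINI_API_KEY', 'gemini'],
  ['GROQ_API_KEY', 'groq'],
  ['TOGETHER_API_KEY', 'together'],
  ['MISTRAL_API_KEY', 'mistral'],
  ['XAI_API_KEY', 'xai'],
  // (The real chain also checks PATH for the claude/gemini CLIs here.)
  ['OLLAMA_BASE_URL', 'ollama'],
];

function detectProvider(env: Record<string, string | undefined>): string | undefined {
  for (const [key, provider] of DETECTION_ORDER) {
    if (env[key]) return provider;
  }
  return undefined;
}
```

So with both OPENAI_API_KEY and GROQ_API_KEY set, OpenAI wins because it appears earlier in the chain.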

You can override auto-detection in four ways, highest priority first:

  1. Inline: agent({ provider: '...', apiKey: '...' }) on a single call.
  2. Module-level default: setDefaultProvider({ provider, apiKey }) once at boot. Every subsequent call inherits it; inline opts still win when supplied. Useful when credentials live in a secrets manager rather than .env.
  3. Reorder the auto-detect chain: setProviderPriority(['anthropic', 'openai', ...]) to change which env-var keys are preferred when multiple are set, without forcing a single provider. Empty array disables auto-detect entirely.
  4. CLI flag: for the Wunderland CLI, pass --provider <name>.

import { setDefaultProvider, generateText, agent } from '@framers/agentos';

setDefaultProvider({
  provider: 'openai',
  apiKey: process.env.MY_OWN_KEY,
  // optional: model: 'gpt-4o-mini', baseUrl: '...'
});

// No env vars, no inline opts — just works:
const { text } = await generateText({ prompt: 'hello' });
const bot = agent({ instructions: '...' });

// Inline still wins:
generateText({ apiKey: 'sk-tenant-scoped', prompt: 'isolated call' });

Provider Configuration

Each provider is configured via environment variables. You can set them in your shell or .env file:

# .env

# Primary provider
OPENAI_API_KEY=sk-...

# Fallback provider
OPENROUTER_API_KEY=sk-or-...

# Local provider (no API key needed)
OLLAMA_BASE_URL=http://localhost:11434

Per-Agent Override

Individual agents pick their provider/model directly in the agent({ ... }) config:

import { agent } from '@framers/agentos';

const writer = agent({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_API_KEY, // optional override
});

Fallback Behavior

AgentOS supports automatic fallback when a provider request fails on a retryable error (HTTP 402/429/5xx, network errors). Fallback is on by default with an auto-built chain — to disable it, pass an empty array.

Primary Provider (e.g., Anthropic)
↓ fails (rate limit, timeout, error)
OpenRouter Fallback (if OPENROUTER_API_KEY is set)
↓ fails
Ollama Local Fallback (if OLLAMA_BASE_URL is set)
↓ fails
Error returned to caller
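
Fallback only triggers on the retryable failures listed above (HTTP 402/429/5xx, network errors). That classification can be sketched as a small predicate (illustrative only, not the runtime's actual code):

```typescript
// Decide whether a failed request should trigger fallback.
// Mirrors the documented retryable set: HTTP 402, 429, 5xx, and network errors.
// Illustrative sketch, not AgentOS source.
function isRetryable(status: number | undefined): boolean {
  if (status === undefined) return true; // no HTTP status: treat as network error
  if (status === 402 || status === 429) return true; // billing / rate limit
  return status >= 500 && status < 600; // provider-side failure
}
```

A 401 (bad key) or 400 (malformed request) is not retryable: switching providers would not fix it, so the error surfaces to the caller immediately.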

Configuring Fallback

import { agent } from '@framers/agentos';

const myAgent = agent({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  // Ordered fallback chain — each entry can override the model.
  fallbackProviders: [
    { provider: 'openrouter', model: 'anthropic/claude-sonnet-4-5-20250929' },
    { provider: 'ollama', model: 'llama3.2' },
  ],
  onFallback: (err, next) => {
    console.warn(`Falling back to ${next}: ${err.message}`);
  },
});

// Disable fallback entirely:
const strict = agent({ provider: 'anthropic', fallbackProviders: [] });

OpenRouter as Universal Fallback

Setting OPENROUTER_API_KEY automatically enables it as a fallback for any primary provider in the auto-built chain. OpenRouter routes to 200+ models across all major providers.

# Primary: Anthropic. Fallback: OpenRouter (automatic)
export ANTHROPIC_API_KEY=sk-ant-...
export OPENROUTER_API_KEY=sk-or-...

Cost Tiers

AgentOS tracks token usage and cost across all providers:

| Tier | Providers | Approximate Cost (1M tokens) |
| --- | --- | --- |
| $ (Budget) | Groq, Together, Ollama (free) | $0.00–$0.60 |
| $$ (Standard) | Gemini, Mistral, xAI, OpenRouter (varies) | $0.50–$3.00 |
| $$$ (Premium) | OpenAI, Anthropic | $3.00–$15.00 |
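
The cost ledger is ordinary arithmetic: tokens times per-million-token price, summed for input and output. A toy helper (the prices in the example call are placeholders, not AgentOS's real pricing table):

```typescript
// Compute run cost in USD from token counts and per-million-token prices.
// Prices passed in are placeholders for illustration.
function runCostUSD(
  inputTokens: number,
  outputTokens: number,
  pricePerMInput: number,
  pricePerMOutput: number,
): number {
  return (inputTokens / 1_000_000) * pricePerMInput +
         (outputTokens / 1_000_000) * pricePerMOutput;
}

// e.g. 10K input + 2K output at $3/M in, $15/M out:
// 0.01 * 3 + 0.002 * 15 = 0.06 USD
```

This is the quantity the runtime compares against the per-run cap described below.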

Cost-Aware Caps

Per-run hard cost caps live on controls:

import { agent } from '@framers/agentos';

const myAgent = agent({
  provider: 'anthropic',
  controls: {
    maxCostUSD: 0.05, // Stop the run if total cost exceeds $0.05
    maxTotalTokens: 50_000, // Stop on token cap
    maxDurationMs: 30_000, // Wall-clock cap
    onLimitReached: 'stop', // 'stop' | 'warn' | 'error'
  },
});

For cheap-first routing across multiple models, attach a custom IModelRouter via agent({ router }) — the router decides which provider/model to call per request. See Cost Optimization for the full guide.
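
A cheap-first routing decision can be sketched as picking the lowest-priced candidate per request. The shape below is an assumption made for illustration; the actual IModelRouter contract is documented in the Cost Optimization guide:

```typescript
// Cheap-first routing sketch. The candidate shape and route function are
// assumptions for illustration; consult the real IModelRouter contract.
interface Candidate {
  provider: string;
  model: string;
  pricePerMTokens: number; // blended per-million-token price (illustrative)
}

function routeCheapest(candidates: Candidate[]): Candidate {
  return candidates.reduce((best, c) =>
    c.pricePerMTokens < best.pricePerMTokens ? c : best,
  );
}

const pick = routeCheapest([
  { provider: 'anthropic', model: 'claude-sonnet-4-5-20250929', pricePerMTokens: 9 },
  { provider: 'groq', model: 'llama-3.3-70b-versatile', pricePerMTokens: 0.6 },
]);
// pick.provider → 'groq'
```

A production router would also weigh capability (vision, tool calling, context size), not price alone.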


Provider Details

OpenAI

export OPENAI_API_KEY=sk-...

| Model | Context | Vision | Tool Calling | Notes |
| --- | --- | --- | --- | --- |
| gpt-4o | 128K | Yes | Yes | Best all-around |
| gpt-4o-mini | 128K | Yes | Yes | Fast, cheap |
| o1 | 200K | Yes | Yes | Reasoning model |
| o3-mini | 200K | No | Yes | Fast reasoning |
| gpt-image-1 | n/a | n/a | n/a | Image generation only |

OAuth support: Use your ChatGPT subscription instead of an API key via the device code flow. See OAuth Auth for details.

Anthropic

export ANTHROPIC_API_KEY=sk-ant-...

| Model | Context | Vision | Tool Calling | Notes |
| --- | --- | --- | --- | --- |
| claude-opus-4-20250514 | 200K | Yes | Yes | Most capable |
| claude-sonnet-4-5-20250929 | 200K | Yes | Yes | Best value |
| claude-haiku-3-5-20241022 | 200K | Yes | Yes | Fastest |

Google Gemini

export GEMINI_API_KEY=AIza...

| Model | Context | Vision | Tool Calling | Notes |
| --- | --- | --- | --- | --- |
| gemini-2.5-pro | 1M | Yes | Yes | Largest context |
| gemini-2.5-flash | 1M | Yes | Yes | Fast, large context |
| gemini-2.0-flash | 1M | Yes | Yes | Previous gen |

Groq

export GROQ_API_KEY=gsk_...

| Model | Context | Vision | Tool Calling | Notes |
| --- | --- | --- | --- | --- |
| llama-3.3-70b-versatile | 128K | No | Yes | Best Groq model |
| llama-3.1-8b-instant | 128K | No | Yes | Ultra-fast |
| mixtral-8x7b-32768 | 32K | No | Yes | Mixtral on Groq |

Groq provides extremely fast inference (~500 tok/s) via custom LPU hardware.

Together AI

export TOGETHER_API_KEY=...

| Model | Context | Vision | Tool Calling | Notes |
| --- | --- | --- | --- | --- |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | 128K | No | Yes | Default |
| meta-llama/Llama-3.1-405B-Instruct-Turbo | 128K | No | Yes | Largest open model |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | 64K | No | Yes | Mixtral |

Mistral AI

export MISTRAL_API_KEY=...

| Model | Context | Vision | Tool Calling | Notes |
| --- | --- | --- | --- | --- |
| mistral-large-latest | 128K | No | Yes | Best Mistral model |
| codestral-latest | 32K | No | Yes | Code-optimized |
| mistral-small-latest | 32K | No | Yes | Fast, cheap |

xAI (Grok)

export XAI_API_KEY=xai-...

| Model | Context | Vision | Tool Calling | Notes |
| --- | --- | --- | --- | --- |
| grok-2 | 128K | Yes | Yes | Default |
| grok-2-mini | 128K | No | Yes | Faster |

OpenRouter

export OPENROUTER_API_KEY=sk-or-...

OpenRouter is a multi-provider proxy that routes to 200+ models. Specify the model using the provider/model format:

import { agent } from '@framers/agentos';

const myAgent = agent({
  provider: 'openrouter',
  model: 'anthropic/claude-sonnet-4-5-20250929',
});

Popular OpenRouter models:

  • openai/gpt-4o
  • anthropic/claude-sonnet-4-5-20250929
  • google/gemini-2.5-flash
  • meta-llama/llama-3.3-70b-instruct

Ollama

export OLLAMA_BASE_URL=http://localhost:11434

Run any open model locally. No API key, no cost, full privacy.

# Pull models manually
ollama pull llama3.2
ollama pull codellama
ollama pull dolphin-mixtral

| Model | Parameters | Context | Tool Calling | Notes |
| --- | --- | --- | --- | --- |
| llama3.2 | 3B/8B | 128K | Partial | General-purpose |
| codellama | 7B/13B/34B | 16K | No | Code-optimized |
| dolphin-mixtral | 8x7B | 32K | No | Uncensored |
| mistral | 7B | 32K | Partial | Fast |
| phi3 | 3.8B | 128K | No | Small, fast |

Programmatic Configuration

Provider + Model + Auth

The agent factory accepts provider, model, apiKey, and baseUrl directly. There is no separate LLMProviderConfig type — these fields live on AgentOptions (and on BaseAgentConfig, so every sub-agent in an agency() roster takes the same fields).

import { agent } from '@framers/agentos';

const myAgent = agent({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_API_KEY, // optional override
  baseUrl: undefined, // optional custom base URL
});

Per-Call Overrides

generate() and stream() accept the same provider/model fields as a per-call override on top of the agent's base config — useful for sending one specific question through a different provider:

const result = await myAgent.generate(
  'Run this complex analysis as a one-off.',
  {
    provider: 'openai',
    model: 'gpt-4o',
  },
);

Adding a Custom Provider

Implement the IProvider interface from @framers/agentos to add a custom LLM provider. Provider registration today is wired up via AIModelProviderManager — there is no public registerLLMProvider() shortcut yet; instead, instantiate your provider and inject it via the manager surfaced on AgentOSConfig.dependencies when constructing the runtime.

import type { IProvider } from '@framers/agentos';

class MyProvider implements IProvider {
  readonly id = 'my-provider';
  readonly name = 'My Custom LLM';

  // ... implement generateCompletion / streamCompletion / listModels / etc.
  // See packages/agentos/src/core/llm/providers/IProvider.ts for the full
  // contract; the existing OpenAI / Anthropic / Ollama implementations are
  // good references.
}

Look at any class under src/core/llm/providers/implementations/ for a complete reference — the OpenAI and Anthropic providers are the most fully exercised paths.


Provider Capabilities Detail

Tool Calling Support

| Provider | Parallel Tools | Structured Output | Tool Choice | Notes |
| --- | --- | --- | --- | --- |
| OpenAI | Yes | Yes (strict mode) | auto/none/required/specific | Gold standard |
| Anthropic | Yes | Yes | auto/any/specific | Strong tool use |
| Gemini | Yes | Yes | auto/none/any | Good support |
| Groq | Yes | Partial | auto/none | Fast but basic |
| Together | Yes | No | auto/none | Model-dependent |
| Mistral | Yes | No | auto/none/any | Good support |
| xAI | Yes | No | auto/none | Basic tool use |
| OpenRouter | Model-dependent | Model-dependent | Model-dependent | Pass-through |
| Ollama | Partial | No | auto/none | Model-dependent |

Embedding Support

| Provider | Models | Dimensions | Batch Size |
| --- | --- | --- | --- |
| OpenAI | text-embedding-3-small, text-embedding-3-large | 256–3072 | 2048 |
| Gemini | text-embedding-004 | 768 | 2048 |
| Together | togethercomputer/m2-bert-80M-* | 768 | 512 |
| Mistral | mistral-embed | 1024 | 512 |
| Ollama | nomic-embed-text, mxbai-embed-large | 768–1024 | 512 |
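
The dimension column matters because embeddings are only comparable within one model's vector space. Comparing two vectors is plain math, shown here as a standalone cosine-similarity helper (generic linear algebra, not an AgentOS API):

```typescript
// Cosine similarity between two equal-length embedding vectors.
// Vectors from different models (different dimensions) cannot be compared.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

Identical directions score 1, orthogonal vectors score 0; mixing, say, a 768-dim Gemini vector with a 1024-dim Mistral vector throws immediately.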