Cost Optimization
A comprehensive guide to optimizing LLM costs, configuring performance tiers, and applying sensible defaults across the AgentOS framework.
Table of Contents
- Overview
- Cost Factors
- Optimization Strategies
- Performance Tiers
- Model Selection
- RAG Cost Optimization
- Storage Cost Optimization
- Configuration Reference
- Monitoring & Budgets
Overview
AgentOS is designed to be cost-conscious by default while allowing fine-grained control for users who need it. This guide covers:
- LLM Costs: Token usage, model selection, caching
- RAG Costs: Embedding generation, vector storage, retrieval
- Storage Costs: Database operations, sync bandwidth
- Compute Costs: Tool execution, streaming overhead
Key Principles
- Sensible Defaults: The out-of-the-box configuration minimizes cost while maintaining quality
- Configurable Tradeoffs: Choose between speed, cost, and accuracy
- Transparency: Built-in metrics for cost tracking
- Graceful Degradation: Falls back to cheaper options when possible
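The Configurable Tradeoffs and Graceful Degradation principles can be illustrated with a minimal sketch of budget-aware model fallback. This is not the AgentOS API: `Candidate`, `select_model`, the preference ordering, and the "use the cheapest model if nothing fits" policy are all hypothetical, and the prices are the approximate per-1K figures listed under Cost Factors.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical model option with approximate prices in USD per 1,000 tokens."""
    name: str
    input_per_1k: Decimal
    output_per_1k: Decimal

    def cost(self, input_tokens: int, output_tokens: int) -> Decimal:
        # Input and output tokens are billed at separate per-1K rates.
        return (self.input_per_1k * input_tokens + self.output_per_1k * output_tokens) / 1000


def select_model(
    candidates: Sequence[Candidate],
    input_tokens: int,
    output_tokens: int,
    budget_usd: Decimal,
) -> Candidate:
    """Return the most preferred candidate whose estimated cost fits the budget.

    Candidates are ordered from most to least preferred. If none fits, degrade to
    the cheapest candidate instead of failing (hypothetical policy).
    """
    if not candidates:
        raise ValueError("at least one candidate model is required")
    for candidate in candidates:
        if candidate.cost(input_tokens, output_tokens) <= budget_usd:
            return candidate
    return min(candidates, key=lambda c: c.cost(input_tokens, output_tokens))


# Approximate prices from the Cost Factors table; check provider pricing before relying on them.
GPT_4O = Candidate("GPT-4o", Decimal("0.005"), Decimal("0.015"))
CLAUDE_35_SONNET = Candidate("Claude 3.5 Sonnet", Decimal("0.003"), Decimal("0.015"))
GPT_4O_MINI = Candidate("GPT-4o-mini", Decimal("0.00015"), Decimal("0.0006"))

# Illustrative preference order: pricier models first, the cheapest last.
FALLBACK_CHAIN = [GPT_4O, CLAUDE_35_SONNET, GPT_4O_MINI]

chosen = select_model(FALLBACK_CHAIN, 10_000, 1_000, budget_usd=Decimal("0.05"))
print(chosen.name)  # Claude 3.5 Sonnet
```

With a $0.05 per-request budget, the 10K-input/1K-output request would cost about $0.065 on GPT-4o, so the sketch steps down to Claude 3.5 Sonnet (about $0.045). Ordering the chain by preference keeps the cost/quality tradeoff in configuration rather than in code.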
Cost Factors
LLM Token Costs (Approximate)
| Model | Input (USD per 1K tokens) | Output (USD per 1K tokens) | Context Window (tokens) |
|---|---|---|---|
| GPT-4o | $0.005 | $0.015 | 128K |
| GPT-4o-mini | $0.00015 | $0.0006 | 128K |
| Claude 3.5 Sonnet | $0.003 | $0.015 | 200K |
| Claude 3 Haiku | $0.00025 | $0.00125 | 200K |
| Gemini 1.5 Pro | $0.00125 | $0.005 | 1M |
| Gemini 1.5 Flash | $0.000075 | $0.0003 | 1M |
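To turn the table into per-request numbers, multiply each token count by its per-1K rate and divide by 1,000. The sketch below does this with `Decimal` to avoid floating-point rounding in currency math; `ModelPricing`, `PRICING`, `estimate_cost`, and the dictionary keys are illustrative names, not AgentOS APIs or official provider model IDs, and the prices are the approximate figures above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ModelPricing:
    name: str
    input_per_1k: Decimal   # USD per 1,000 input tokens
    output_per_1k: Decimal  # USD per 1,000 output tokens


# Approximate figures copied from the table; keys are illustrative labels.
PRICING = {
    "gpt-4o": ModelPricing("GPT-4o", Decimal("0.005"), Decimal("0.015")),
    "gpt-4o-mini": ModelPricing("GPT-4o-mini", Decimal("0.00015"), Decimal("0.0006")),
    "claude-3.5-sonnet": ModelPricing("Claude 3.5 Sonnet", Decimal("0.003"), Decimal("0.015")),
    "claude-3-haiku": ModelPricing("Claude 3 Haiku", Decimal("0.00025"), Decimal("0.00125")),
    "gemini-1.5-pro": ModelPricing("Gemini 1.5 Pro", Decimal("0.00125"), Decimal("0.005")),
    "gemini-1.5-flash": ModelPricing("Gemini 1.5 Flash", Decimal("0.000075"), Decimal("0.0003")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request; input and output are billed at different rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    p = PRICING[model]
    # Prices are per 1,000 tokens, so scale the weighted token counts down by 1,000.
    return (p.input_per_1k * input_tokens + p.output_per_1k * output_tokens) / 1000


# Compare the same request (10K input tokens, 1K output tokens) across models.
for key, pricing in PRICING.items():
    print(f"{pricing.name:<18} ${estimate_cost(key, 10_000, 1_000):.5f}")
```

For that request, GPT-4o comes to about $0.065 and GPT-4o-mini to about $0.0021, roughly a 30x difference. In every row, output tokens cost 3 to 5 times as much as input tokens, so capping response length pays off alongside trimming prompts.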