Client-side agent optimization
Choosing the right models for your agents can cut cost and time by up to 100x¶
AgentOpt evaluates model combinations across your full agent pipeline and converges on the Pareto frontier of accuracy, cost, and latency.
The Problem¶
Your agent has multiple steps — planning, reasoning, tool use, synthesis. Each step could use a different model. With 5 candidate models across 3 steps, that's 125 combinations. Testing them manually is impractical. Picking blindly leaves performance (and money) on the table.
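The search space grows as the product of per-step candidate counts. A quick sketch makes the arithmetic concrete (step names and model labels here are illustrative, not part of any API):

```python
from itertools import product

# Illustrative pipeline: 3 steps, 5 candidate models per step.
steps = {
    "planner":     ["m1", "m2", "m3", "m4", "m5"],
    "reasoner":    ["m1", "m2", "m3", "m4", "m5"],
    "synthesizer": ["m1", "m2", "m3", "m4", "m5"],
}

# Every assignment of one model per step is one combination to evaluate.
combos = list(product(*steps.values()))
print(len(combos))  # 5 ** 3 = 125
```

Add a fourth step or a sixth candidate model and the count multiplies again, which is why exhaustive manual testing stops scaling almost immediately.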
The Solution¶
Give AgentOpt your agent and a small evaluation dataset (~100 samples). It efficiently searches the model combination space and reports the Pareto-optimal tradeoffs — so you can choose the right balance of accuracy, cost, and latency for your use case.
```python
from agentopt import BruteForceModelSelector

selector = BruteForceModelSelector(
    agent=MyAgent,
    models={
        "planner": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],
        "solver": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],
    },
    eval_fn=eval_fn,
    dataset=dataset,
)

results = selector.select_best(parallel=True)
results.print_summary()
```
```text
Model Selection Results
--------------------------------------------------------------------------
     Rank  Model                                       Accuracy  Latency      Price
--------------------------------------------------------------------------
>>>     1  planner=gpt-4.1-nano + solver=gpt-4.1-nano   100.00%    0.85s  $0.000420
        2  planner=gpt-4o-mini + solver=gpt-4o-mini     100.00%    1.20s  $0.002372
        3  planner=gpt-4o + solver=gpt-4o               100.00%    2.70s  $0.014355
...
```
Why AgentOpt¶
- **Non-Intrusive**: Define your agent as a class. No framework adapters, no SDK wrappers, no code changes to your agent internals.
- **Framework-Agnostic**: Works with OpenAI, LangChain, LangGraph, CrewAI, LlamaIndex, AG2, or any framework that calls LLMs over HTTP.
- **Smart Search**: Six algorithms, from brute force to Bayesian optimization. Search spaces with thousands of combinations without evaluating them all.
- **Automatic Tracking**: Transparently intercepts all LLM calls to measure tokens, latency, and cost. No manual instrumentation needed.
- **Response Caching**: Identical LLM calls are cached in memory and on disk (SQLite). Re-running experiments is instant and free.
- **Parallel Evaluation**: Evaluate model combinations concurrently with configurable concurrency limits. Get results faster.
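The two-tier response cache described above can be approximated with a request-keyed store. This is an illustrative sketch, not AgentOpt's actual implementation: the cache key is a hash of the canonicalized request payload, so identical model + messages + parameters hit the same entry:

```python
import hashlib
import json
import sqlite3

class ResponseCache:
    """Two-tier cache: dict in memory, SQLite on disk."""

    def __init__(self, path=":memory:"):
        self.mem = {}
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )

    @staticmethod
    def key(payload: dict) -> str:
        # sort_keys makes the hash independent of dict insertion order.
        blob = json.dumps(payload, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def get(self, payload: dict):
        k = self.key(payload)
        if k in self.mem:                      # fast path: in-memory hit
            return self.mem[k]
        row = self.db.execute(
            "SELECT value FROM cache WHERE key = ?", (k,)
        ).fetchone()
        if row:                                # slow path: disk hit, promote to memory
            self.mem[k] = json.loads(row[0])
            return self.mem[k]
        return None                            # miss: caller must make the real LLM call

    def put(self, payload: dict, response: dict):
        k = self.key(payload)
        self.mem[k] = response
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (k, json.dumps(response)),
        )
        self.db.commit()
```

With a persistent path instead of `:memory:`, re-running an experiment replays every previously seen call from disk, which is what makes repeat runs effectively free.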
How It Works¶
```mermaid
graph LR
    A["Your Agent"] --> B["httpx layer"]
    B --> C["LLM API"]
    B --> D["AgentOpt Interceptor"]
    D --> E["Track tokens, latency, cost"]
    D --> F["Cache responses"]
    E --> G["Evaluate & Rank"]
    G --> H["Pareto-optimal results"]
```
AgentOpt patches httpx at the transport level — the same HTTP library used by every major LLM SDK. Your agent code stays untouched. AgentOpt silently records every LLM call, caches responses, and aggregates metrics per model combination.
Learn more about the architecture
Selection Algorithms¶
| Algorithm | Strategy | Best For |
|---|---|---|
| Brute Force | Evaluate all combinations | Small spaces (< 50 combos) |
| Random Search | Random sampling | Quick baselines |
| Hill Climbing | Greedy + restarts | Medium spaces with local structure |
| Arm Elimination | Progressive pruning | Statistical early stopping |
| LM Proposal | LLM-guided shortlist | Leveraging model knowledge |
| Bayesian Optimization | Gaussian Process | Expensive evaluations |
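As a sketch of the simplest non-exhaustive strategy in the table, random search samples a fixed budget of combinations instead of evaluating the whole grid. Everything here is illustrative: the function name, the stand-in evaluator, and the tiny model grid are not part of AgentOpt's API:

```python
import random
from itertools import product

def random_search(models: dict, eval_fn, budget: int, seed: int = 0):
    """Sample up to `budget` distinct combinations; return the best by score."""
    rng = random.Random(seed)
    all_combos = list(product(*models.values()))
    sampled = rng.sample(all_combos, min(budget, len(all_combos)))
    scored = [
        (dict(zip(models, combo)), eval_fn(dict(zip(models, combo))))
        for combo in sampled
    ]
    return max(scored, key=lambda pair: pair[1])

# Stand-in evaluator: prefers the "large" model at every step.
models = {"planner": ["small", "large"], "solver": ["small", "large"]}
best, score = random_search(
    models,
    lambda c: sum(m == "large" for m in c.values()),
    budget=4,
)
# With a budget covering all 4 combos, the all-"large" assignment wins.
```

Real evaluations are expensive LLM runs, so the budget, not the grid size, bounds the cost; the smarter algorithms in the table spend the same budget less uniformly.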
Get Started¶
- **Install**
- **Quick Start**: Build and optimize your first agent in 5 minutes.
- **Examples**: Framework-specific examples for OpenAI, LangChain, CrewAI, and LlamaIndex.
- **API Reference**: Full reference for selectors, results, and the tracker.