Client-side agent optimization
Choosing the right models for your agents can cut cost and time by up to 100x¶
AgentOpt evaluates model combinations across your full agent pipeline and converges on the Pareto frontier of accuracy, cost, and latency.
The Problem¶
Your agent has multiple steps — planning, reasoning, tool use, synthesis. Each step could use a different model. With 5 candidate models across 3 steps, that's 125 combinations. Testing them manually is impractical. Picking blindly leaves performance (and money) on the table.
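The search space grows as the product of per-step candidate counts. A quick sketch makes the arithmetic concrete (step names and model labels here are illustrative, not part of any API):

```python
from itertools import product

# Illustrative pipeline: 3 steps, 5 candidate models per step.
steps = {
    "planner":     ["m1", "m2", "m3", "m4", "m5"],
    "reasoner":    ["m1", "m2", "m3", "m4", "m5"],
    "synthesizer": ["m1", "m2", "m3", "m4", "m5"],
}

# Every assignment of one model per step is one combination to evaluate.
combos = list(product(*steps.values()))
print(len(combos))  # 5 ** 3 = 125
```

Add a fourth step or a sixth candidate model and the count multiplies again, which is why exhaustive manual testing stops scaling almost immediately.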
The Solution¶
Give AgentOpt your agent and a small evaluation dataset (~100 samples). It efficiently searches the model combination space and reports the Pareto-optimal tradeoffs — so you can choose the right balance of accuracy, cost, and latency for your use case.
```python
from agentopt import BruteForceModelSelector

selector = BruteForceModelSelector(
    agent=MyAgent,
    models={
        "planner": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],
        "solver": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],
    },
    eval_fn=eval_fn,
    dataset=dataset,
)

results = selector.select_best(parallel=True)
results.print_summary()
```
```text
Model Selection Results
--------------------------------------------------------------------------
     Rank  Model                                       Accuracy  Latency      Price
--------------------------------------------------------------------------
>>>     1  planner=gpt-4.1-nano + solver=gpt-4.1-nano   100.00%    0.85s  $0.000420
        2  planner=gpt-4o-mini + solver=gpt-4o-mini     100.00%    1.20s  $0.002372
        3  planner=gpt-4o + solver=gpt-4o               100.00%    2.70s  $0.014355
...
```
Why AgentOpt¶
- **Non-Intrusive**: Define your agent as a class. No framework adapters, no SDK wrappers, no code changes to your agent internals.
- **Framework-Agnostic**: Works with OpenAI, LangChain, LangGraph, CrewAI, LlamaIndex, AG2, or any framework that calls LLMs over HTTP.
- **Smart Search**: Six algorithms, from brute force to Bayesian optimization. Search spaces with thousands of combinations without evaluating them all.
- **Automatic Tracking**: Transparently intercepts all LLM calls to measure tokens, latency, and cost. No manual instrumentation needed.
- **Response Caching**: Identical LLM calls are cached in memory and on disk (SQLite). Re-running experiments is instant and free.
- **Parallel Evaluation**: Evaluate model combinations concurrently with configurable concurrency limits. Get results faster.
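The two-tier response cache described above can be approximated with a request-keyed store. This is an illustrative sketch, not AgentOpt's actual implementation: the cache key is a hash of the canonicalized request payload, so identical model + messages + parameters hit the same entry:

```python
import hashlib
import json
import sqlite3

class ResponseCache:
    """Two-tier cache: dict in memory, SQLite on disk."""

    def __init__(self, path=":memory:"):
        self.mem = {}
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )

    @staticmethod
    def key(payload: dict) -> str:
        # sort_keys makes the hash independent of dict insertion order.
        blob = json.dumps(payload, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def get(self, payload: dict):
        k = self.key(payload)
        if k in self.mem:                      # fast path: in-memory hit
            return self.mem[k]
        row = self.db.execute(
            "SELECT value FROM cache WHERE key = ?", (k,)
        ).fetchone()
        if row:                                # slow path: disk hit, promote to memory
            self.mem[k] = json.loads(row[0])
            return self.mem[k]
        return None                            # miss: caller must make the real LLM call

    def put(self, payload: dict, response: dict):
        k = self.key(payload)
        self.mem[k] = response
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (k, json.dumps(response)),
        )
        self.db.commit()
```

With a persistent path instead of `:memory:`, re-running an experiment replays every previously seen call from disk, which is what makes repeat runs effectively free.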
How It Works¶
```mermaid
graph LR
    A["Your Agent"] --> B["httpx layer"]
    B --> C["LLM API"]
    B --> D["AgentOpt Interceptor"]
    D --> E["Track tokens, latency, cost"]
    D --> F["Cache responses"]
    E --> G["Evaluate & Rank"]
    G --> H["Pareto-optimal results"]
```
AgentOpt patches httpx at the transport level — the same HTTP library used by every major LLM SDK. Your agent code stays untouched. AgentOpt silently records every LLM call, caches responses, and aggregates metrics per model combination.
Learn more about the architecture
Selection Algorithms¶
| Algorithm | Strategy | Best For |
|---|---|---|
| Brute Force | Evaluate all combinations | Small spaces (< 50 combos) |
| Random Search | Random sampling | Quick baselines |
| Hill Climbing | Greedy + restarts | Medium spaces with local structure |
| Arm Elimination | Progressive pruning | Statistical early stopping |
| LM Proposal | LLM-guided shortlist | Leveraging model knowledge |
| Bayesian Optimization | Gaussian Process | Expensive evaluations |
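As a sketch of the simplest non-exhaustive strategy in the table, random search samples a fixed budget of combinations instead of evaluating the whole grid. Everything here is illustrative: the function name, the stand-in evaluator, and the tiny model grid are not part of AgentOpt's API:

```python
import random
from itertools import product

def random_search(models: dict, eval_fn, budget: int, seed: int = 0):
    """Sample up to `budget` distinct combinations; return the best by score."""
    rng = random.Random(seed)
    all_combos = list(product(*models.values()))
    sampled = rng.sample(all_combos, min(budget, len(all_combos)))
    scored = [
        (dict(zip(models, combo)), eval_fn(dict(zip(models, combo))))
        for combo in sampled
    ]
    return max(scored, key=lambda pair: pair[1])

# Stand-in evaluator: prefers the "large" model at every step.
models = {"planner": ["small", "large"], "solver": ["small", "large"]}
best, score = random_search(
    models,
    lambda c: sum(m == "large" for m in c.values()),
    budget=4,
)
# With a budget covering all 4 combos, the all-"large" assignment wins.
```

Real evaluations are expensive LLM runs, so the budget, not the grid size, bounds the cost; the smarter algorithms in the table spend the same budget less uniformly.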
Get Started¶
- **Install**
- **Quick Start**: Build and optimize your first agent in 5 minutes.
- **Examples**: Framework-specific examples for OpenAI, LangChain, CrewAI, and LlamaIndex.
- **API Reference**: Full reference for selectors, results, and the tracker.