Client-side agent optimization

Choosing the right models for your agents can cut cost and time by up to 100x

AgentOpt evaluates model combinations across your full agent pipeline and converges on the Pareto frontier of accuracy, cost, and latency.

  • Exponential combo space
  • Smart search for the best model combination
  • 0 code changes
  • Any agent framework

The Problem

Your agent has multiple steps — planning, reasoning, tool use, synthesis. Each step could use a different model. With 5 candidate models across 3 steps, that's 125 combinations. Testing them manually is impractical. Picking blindly leaves performance (and money) on the table.
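The growth is just a Cartesian product of per-step candidates. A quick sketch with Python's `itertools.product` (the step names and the extra model names below are illustrative):

```python
from itertools import product

# Illustrative candidates per pipeline step (names are made up for the example).
candidates = {
    "planner":     ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano", "model-4", "model-5"],
    "reasoner":    ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano", "model-4", "model-5"],
    "synthesizer": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano", "model-4", "model-5"],
}

# The combination space is the Cartesian product of every step's candidates.
combos = list(product(*candidates.values()))
print(len(combos))  # 5 models ** 3 steps = 125 combinations
```

Add a fourth step or a sixth model and the space multiplies again, which is why exhaustive manual testing stops scaling almost immediately.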

The Solution

Give AgentOpt your agent and a small evaluation dataset (~100 samples). It efficiently searches the model combination space and reports the Pareto-optimal tradeoffs — so you can choose the right balance of accuracy, cost, and latency for your use case.

from agentopt import BruteForceModelSelector

selector = BruteForceModelSelector(
    agent=MyAgent,
    models={
        "planner": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],
        "solver":  ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],
    },
    eval_fn=eval_fn,
    dataset=dataset,
)

results = selector.select_best(parallel=True)
results.print_summary()
    Model Selection Results
    ------------------------------------------------------------------------------
    Rank  Model                                       Accuracy  Latency      Price
    ------------------------------------------------------------------------------
>>>    1  planner=gpt-4.1-nano + solver=gpt-4.1-nano   100.00%    0.85s  $0.000420
       2  planner=gpt-4o-mini + solver=gpt-4o-mini     100.00%    1.20s  $0.002372
       3  planner=gpt-4o + solver=gpt-4o               100.00%    2.70s  $0.014355
    ...
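The snippet assumes `MyAgent`, `eval_fn`, and `dataset` already exist. Their exact contract is defined by AgentOpt; one hypothetical shape, purely for orientation (all names and signatures below are illustrative, not AgentOpt's actual API), might look like this:

```python
# Hypothetical shapes for the names used above -- AgentOpt's real
# interfaces may differ.
class MyAgent:
    def __init__(self, planner: str, solver: str):
        self.planner = planner   # model name for the planning step
        self.solver = solver     # model name for the solving step

    def run(self, question: str) -> str:
        # ... call the planner model, then the solver model ...
        return "answer"

def eval_fn(prediction: str, expected: str) -> float:
    """Return 1.0 for a correct answer, 0.0 otherwise."""
    return float(prediction.strip() == expected.strip())

dataset = [
    {"input": "What is 2 + 2?", "expected": "4"},
    # ~100 samples in practice
]
```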

Why AgentOpt

  • Non-Intrusive

    Define your agent as a class. No framework adapters, no SDK wrappers, no code changes to your agent internals.

  • Framework-Agnostic

    Works with OpenAI, LangChain, LangGraph, CrewAI, LlamaIndex, AG2, or any other framework that calls LLMs over HTTP.

  • Smart Search

    Six algorithms, from brute force to Bayesian optimization, let you search spaces with thousands of combinations without evaluating them all.

  • Automatic Tracking

    Transparently intercepts all LLM calls to measure tokens, latency, and cost. No manual instrumentation needed.

  • Response Caching

    Identical LLM calls are cached in memory and on disk (SQLite), so re-running experiments is instant and free.

  • Parallel Evaluation

    Evaluate model combinations concurrently, with configurable concurrency limits, to get results faster.


How It Works

graph LR
    A["Your Agent"] --> B["httpx layer"]
    B --> C["LLM API"]
    B --> D["AgentOpt Interceptor"]
    D --> E["Track tokens, latency, cost"]
    D --> F["Cache responses"]
    E --> G["Evaluate & Rank"]
    G --> H["Pareto-optimal results"]

AgentOpt patches httpx at the transport level — the same HTTP library used by every major LLM SDK. Your agent code stays untouched. AgentOpt silently records every LLM call, caches responses, and aggregates metrics per model combination.

Learn more about the architecture


Selection Algorithms

    Algorithm              Strategy                   Best For
    --------------------------------------------------------------------------
    Brute Force            Evaluate all combinations  Small spaces (< 50 combos)
    Random Search          Random sampling            Quick baselines
    Hill Climbing          Greedy + restarts          Medium spaces with model topology
    Arm Elimination        Progressive pruning        Statistical early stopping
    LM Proposal            LLM-guided shortlist       Leveraging model knowledge
    Bayesian Optimization  Gaussian Process           Expensive evaluations
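To make the cheap end of the table concrete, random search just samples combinations under a fixed budget and keeps the best score. A sketch with a stand-in evaluation function (the cost numbers are invented for illustration; AgentOpt's algorithms are more involved):

```python
import random
from itertools import product

def random_search(candidates: dict[str, list[str]], evaluate, budget: int, seed: int = 0):
    """Sample up to `budget` combinations uniformly; return the best-scoring one."""
    rng = random.Random(seed)
    space = list(product(*candidates.values()))
    steps = list(candidates)
    best_combo, best_score = None, float("-inf")
    for combo in rng.sample(space, min(budget, len(space))):
        assignment = dict(zip(steps, combo))
        score = evaluate(assignment)
        if score > best_score:
            best_combo, best_score = assignment, score
    return best_combo, best_score

# Stand-in evaluation: prefer the cheapest models (illustrative only).
cost = {"gpt-4o": 3, "gpt-4o-mini": 2, "gpt-4.1-nano": 1}
models = ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"]
combo, score = random_search(
    {"planner": models, "solver": models},
    evaluate=lambda a: -sum(cost[m] for m in a.values()),
    budget=5,
)
```

A real evaluation would run the agent over the dataset and score accuracy, cost, and latency; smarter algorithms differ mainly in how they pick the next combination to try.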

Compare algorithms in detail


Get Started