Selectors¶

A ModelSelector picks the best model combination for an agent against a dataset: it instantiates the agent with each candidate dict, evaluates it on the dataset, and ranks the combinations by their eval_fn scores. All selectors share one constructor surface and one entry point, select_best(), and differ only in their search algorithm.

from agentopt import ModelSelector

selector = ModelSelector(
    agent=MyAgent,
    models={"planner": ["gpt-4o", "gpt-4o-mini"], "solver": ["gpt-4o-mini"]},
    eval_fn=lambda expected, actual: float(actual == expected),
    dataset=[(inp, expected), ...],
    method="auto",                # arm_elimination — strong + cheap
)
results = selector.select_best(parallel=True, max_concurrent=20)
results.print_summary()
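To make "each candidate dict" concrete, the sketch below enumerates the search space implied by a models mapping as the Cartesian product of the per-node candidate lists. The helper name enumerate_combinations is hypothetical and is not part of agentopt; how the library enumerates combinations internally is not documented here.

```python
from itertools import product


def enumerate_combinations(models):
    """Yield one {node: model} dict per point in the Cartesian product."""
    nodes = list(models)
    for choice in product(*(models[node] for node in nodes)):
        yield dict(zip(nodes, choice))


models = {"planner": ["gpt-4o", "gpt-4o-mini"], "solver": ["gpt-4o-mini"]}
combos = list(enumerate_combinations(models))
for combo in combos:
    print(combo)
```

The number of combinations is the product of the list lengths, so it grows quickly as nodes or candidates are added. That growth is the reason the non-exhaustive methods exist.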

Common parameters¶

Parameter	Type	Description
`agent`	`type`	Agent class with `__init__(self, models)` and `run(self, input_data)`. Duck-typed — no base class required.
`models`	`Dict[str, List]`	Maps node names to candidate model lists (e.g. `{"planner": ["gpt-4o", "gpt-4o-mini"]}`).
`eval_fn`	`Callable`	`(expected, actual) -> float` score (higher is better).
`dataset`	`Sequence[Tuple]`	`[(input_data, expected_answer), ...]`.
`model_prices`	`Dict`, optional	Custom pricing overrides: `{"model": {"input_price": x, "output_price": y}}` in $/MTok. Required for cost terms when `lambda_cost > 0`.
`lambda_cost`	`float`, optional	Weight on normalized per-sample cost in the combined objective. Default `0.0` (disabled). See Combined objective below.
`lambda_latency`	`float`, optional	Weight on normalized per-sample latency in the combined objective. Default `0.0` (disabled).
`node_descriptions`	`Dict[str, str]`, optional	Human-readable descriptions per node — surfaced in `LMProposalModelSelector`.
`tracker`	`LLMTracker`, optional	Bring your own. Defaults to a fresh `LLMTracker()` started in the constructor. Pass one in to share a cache across runs, route via a daemon (`AGENTOPT_GATEWAY_URL`), or post-process records after `select_best()` returns.

The selector calls tracker.start() in the constructor and tracker.stop() when select_best() returns or raises. Record queries on the tracker remain valid after stop(), so post-run analysis works:

tracker = LLMTracker(cache_dir="./shared_cache")
selector = ModelSelector(..., tracker=tracker)
selector.select_best()
print(tracker.get_usage())          # tracker.stop() already called; records still here

See tracker.md for the full tracker surface.
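The agent, eval_fn, and dataset contracts from the table can be illustrated without calling any LLM. The sketch below is a toy: EchoAgent and exact_match are hypothetical names, and the agent normalizes strings instead of calling the models it was given.

```python
class EchoAgent:
    """Toy agent: no base class, only the two methods the table lists."""

    def __init__(self, models):
        # models is one candidate dict, e.g. {"planner": "gpt-4o", "solver": "gpt-4o-mini"}
        self.models = models

    def run(self, input_data):
        # A real agent would call the model named in self.models for each node.
        return input_data.strip().lower()


def exact_match(expected, actual):
    """eval_fn: 1.0 on an exact match, else 0.0 (higher is better)."""
    return float(actual == expected)


# dataset is a sequence of (input_data, expected_answer) tuples.
dataset = [("  Paris ", "paris"), ("ROME", "rome"), ("Oslo", "olso")]

agent = EchoAgent({"planner": "gpt-4o", "solver": "gpt-4o-mini"})
scores = [exact_match(expected, agent.run(inp)) for inp, expected in dataset]
mean_score = sum(scores) / len(scores)
print(scores, mean_score)
```

Note the argument order of eval_fn: expected first, actual second.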

Combined objective (optional cost/latency weights)¶

By default, selectors optimize eval_fn score only (typically accuracy) and break ties with latency, then price. To trade accuracy against cost and latency in one scalar reward, pass optional weights on the constructor (or via ModelSelector(..., **kwargs)):

Parameter	Default	Effect
`lambda_cost`	`0.0`	Penalizes normalized per-sample token cost (USD from the tracker, or `model_prices`).
`lambda_latency`	`0.0`	Penalizes normalized per-sample wall-clock latency (seconds).

Omit both parameters (or leave them at 0.0) for the original accuracy-centric behavior. Set one or both when you want multi-metric selection.

Formula¶

For each datapoint, after observations are recorded:

combined = score
         - lambda_cost    * norm(cost)
         - lambda_latency * norm(latency)

score — return value of eval_fn (higher is better).
norm(·) — min–max scale to [0, 1] using running min/max over all samples seen during that selector run (updated as more combos are evaluated).
Per combination — mean of per-datapoint combined values → ModelResult.combined_objective (see results.md).

This is a linear scalarization, not Pareto exploration. Larger lambda_* penalize cost/latency more strongly relative to score.
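The formula can be sketched in a few lines. This is an illustration under stated assumptions, not agentopt's implementation: RunningMinMax and combined_objective are hypothetical names, the observations are invented, and a degenerate range (min equal to max) is assumed to normalize to 0.0. The normalizers are fitted over every sample in the run before per-combination means are taken.

```python
class RunningMinMax:
    """Tracks the min and max seen so far and scales values to [0, 1]."""

    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def update(self, value):
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def norm(self, value):
        # Assumption: with no spread yet, every value normalizes to 0.0.
        if self.hi <= self.lo:
            return 0.0
        return (value - self.lo) / (self.hi - self.lo)


def combined_objective(samples, cost_norm, latency_norm,
                       lambda_cost=0.0, lambda_latency=0.0):
    """Mean of per-datapoint combined values for one combination.

    samples: list of (score, cost_usd, latency_s) tuples.
    """
    values = [
        score
        - lambda_cost * cost_norm.norm(cost)
        - lambda_latency * latency_norm.norm(latency)
        for score, cost, latency in samples
    ]
    return sum(values) / len(values)


# Hypothetical observations for two combinations.
runs = {
    "cheap": [(1.0, 0.001, 1.0), (1.0, 0.001, 1.0), (0.0, 0.001, 1.0)],
    "strong": [(1.0, 0.005, 3.0), (1.0, 0.005, 3.0), (1.0, 0.005, 3.0)],
}

# Fit the normalizers over every sample seen in the run.
cost_norm, latency_norm = RunningMinMax(), RunningMinMax()
for samples in runs.values():
    for _, cost, latency in samples:
        cost_norm.update(cost)
        latency_norm.update(latency)

accuracy_only = {name: combined_objective(s, cost_norm, latency_norm)
                 for name, s in runs.items()}
weighted = {name: combined_objective(s, cost_norm, latency_norm, 0.3, 0.2)
            for name, s in runs.items()}
print(accuracy_only)
print(weighted)
```

With both weights at 0.0 the more accurate combination ranks first. With lambda_cost=0.3 and lambda_latency=0.2 the penalties on the expensive, slow combination outweigh its accuracy advantage in this invented data, so the ranking flips.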

Example¶

selector = ModelSelector(
    agent=MyAgent,
    models=models,
    eval_fn=eval_fn,
    dataset=dataset,
    method="matrix_ucb",
    lambda_cost=0.3,      # optional — omit for accuracy-only
    lambda_latency=0.2,
    model_prices={        # recommended when lambda_cost > 0
        "gpt-4o": {"input_price": 2.5, "output_price": 10.0},
        "gpt-4o-mini": {"input_price": 0.15, "output_price": 0.6},
    },
)
results = selector.select_best(parallel=True)
results.print_summary()   # ranks by combined_objective when lambdas are set

How each method uses the weights¶

Methods	During search	Final `is_best`
`matrix_ucb`, `matrix_ucb_lrf`	UCB rewards use per-cell combined objective	`_find_best` on `combined_objective`
`arm_elimination`, `epsilon_lucb`, `threshold`	Elimination / LUCB stats on combined per-sample objectives	same
`hill_climbing`, `bayesian`	Move / surrogate target uses combined objective	same
`brute_force`, `random`	Does not steer which combos to try	same
`lm_proposal`	Proposer uses `objective=` text, not these lambdas	`combined_objective` on the one evaluated combo only

After select_best(), a final pass recomputes every result’s combined_objective against the full-run normalizer so rankings are comparable.

lm_proposal vs lambdas

LMProposalModelSelector(objective="...") is a natural-language hint to the proposer LLM. It is separate from lambda_cost / lambda_latency, which only affect the scalar reward used for ranking and bandit methods.

`select_best()`¶

results = selector.select_best(
    parallel=False,        # If True, evaluate combos concurrently with asyncio
    max_concurrent=20,     # Total concurrent API-call budget across all combos
)

Returns a SelectionResults. parallel=True requires agent.run to be either async or thread-safe. The selector splits max_concurrent between the outer loop (combos) and the inner loop (datapoints) based on dataset size.
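A shared concurrency budget of this kind is commonly enforced with a semaphore. The sketch below uses the standard library's asyncio.Semaphore to cap in-flight calls and records the peak concurrency it reached. It is a generic illustration, not how agentopt splits max_concurrent internally; run_with_budget and fake_agent_run are hypothetical names.

```python
import asyncio


async def run_with_budget(calls, max_concurrent):
    """Run (coroutine_fn, arg) pairs with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def guarded(fn, arg):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fn(arg)
            finally:
                in_flight -= 1

    # gather preserves the order of its inputs in the result list.
    results = await asyncio.gather(*(guarded(fn, arg) for fn, arg in calls))
    return results, peak


async def fake_agent_run(x):
    # Stand-in for an async agent.run that awaits an API call.
    await asyncio.sleep(0.01)
    return x * 2


results, peak = asyncio.run(
    run_with_budget([(fake_agent_run, i) for i in range(10)], max_concurrent=3)
)
print(results, peak)
```

An agent whose run method is synchronous would instead need to be safe to call from several threads at once, for example by avoiding shared mutable state between calls.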

Automatic cleanup

select_best() calls tracker.stop() on return or exception — caches flush to disk, masters tear down. The tracker remains queryable; only close() (which select_best does not call) drops the remote backend's HTTP client.

Choosing a method¶

`method`	Algorithm	When to use
`"auto"` (default)	Arm elimination	Strong best-arm identification at lower search cost than brute force. Same impl as `"arm_elimination"`.
`"brute_force"`	Evaluate every combo on the full dataset	Small search space; ground-truth comparison.
`"random"`	Random search	Cheap baseline.
`"hill_climbing"`	Greedy per-node	Large combinatorial spaces with weak coupling between nodes.
`"arm_elimination"`	Successive elimination	Best-arm identification with PAC-style guarantees.
`"epsilon_lucb"`	LUCB with tolerance	Stop once a combo is within ε of the best.
`"matrix_ucb"` / `"matrix_ucb_lrf"`	UCB exploiting cross-combo structure	Large combination × datapoint matrices; `lrf` adds low-rank factorization.
`"threshold"`	Threshold bandit successive elimination	"Find all combos above accuracy θ" rather than the single best.
`"lm_proposal"`	LM-guided	Uses `node_descriptions` to propose combinations.
`"bayesian"`	Bayesian optimization	Optional extra: `pip install "agentopt-py[bayesian]"`.

Selector Classes¶

`agentopt.model_selection.brute_force.BruteForceModelSelector` ¶

Selects the best model combination by evaluating all combinations.

Supports sequential and async-parallel evaluation via select_best().

`agentopt.model_selection.random_search.RandomSearchModelSelector` ¶

Selects the best model combination from a random subset of candidates.

`agentopt.model_selection.hill_climbing.HillClimbingModelSelector` ¶

Select models via stochastic hill climbing with random restarts.

`agentopt.model_selection.arm_elimination.ArmEliminationModelSelector` ¶

Select models via successive arm elimination.
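For orientation, here is a textbook successive-elimination loop. It is a sketch under assumptions, not the ArmEliminationModelSelector implementation: rewards are assumed to lie in [0, 1], the confidence radius is a standard Hoeffding bound with a union bound over arms and rounds, and the function name and reward callables are hypothetical.

```python
import math


def successive_elimination(reward_fns, n_rounds, delta=0.05):
    """Pull every surviving arm once per round and drop arms that are
    confidently worse than the current leader.

    reward_fns: one callable per arm, mapping a round index to a reward in [0, 1].
    Returns (index of the best surviving arm, set of surviving arms).
    """
    n_arms = len(reward_fns)
    active = set(range(n_arms))
    sums = [0.0] * n_arms
    rounds_done = 0

    for t in range(1, n_rounds + 1):
        for arm in active:
            sums[arm] += reward_fns[arm](t - 1)
        rounds_done = t

        # Hoeffding radius; every surviving arm has exactly t pulls.
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        means = {arm: sums[arm] / t for arm in active}
        best_lower = max(means.values()) - radius

        # Keep an arm only while its upper bound reaches the leader's lower bound.
        active = {arm for arm in active if means[arm] + radius >= best_lower}
        if len(active) == 1:
            break

    best = max(active, key=lambda arm: sums[arm] / rounds_done)
    return best, active


# Deterministic toy arms: always right, always wrong, right half the time.
arms = [
    lambda i: 1.0,
    lambda i: 0.0,
    lambda i: 1.0 if i % 2 == 0 else 0.0,
]
best, survivors = successive_elimination(arms, n_rounds=200)
print(best, survivors)
```

In the selector setting an arm corresponds to a model combination and a pull to evaluating it on one more datapoint, so weak combinations stop consuming evaluations early.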

`agentopt.model_selection.epsilon_lucb.EpsilonLUCBModelSelector` ¶

Select models via epsilon-optimal LUCB.

`agentopt.model_selection.matrix_ucb.MatrixUCBModelSelector` ¶

UCB on the full combination × datapoint matrix (row means + exploration bonus).

Selection always proceeds in batches of matrix cells; only select_best(..., max_concurrent=...) matters (parallel is ignored).

observation_budget_fraction (or its alias sample_fraction) caps evaluations. It has the same meaning as in RandomSearchModelSelector and the Bayesian selector: a fraction of the search budget, which here is the fraction of matrix cells to observe. 1.0 fills the full grid; 0.1 stops after about 10% of cells. If both are passed, sample_fraction wins.
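The idea of row means plus an exploration bonus under a cell budget can be sketched as follows. This is a simplified illustration and not the MatrixUCBModelSelector implementation: it observes one cell at a time rather than in batches, the bonus form and the constant c are assumptions, the budget rounding (ceiling, with at least one cell per combination) is an assumption, and matrix_ucb_sketch is a hypothetical name.

```python
import math


def matrix_ucb_sketch(reward, fraction=0.5, c=1.0):
    """Observe cells of a combination x datapoint reward matrix until a
    fraction of the grid is filled, choosing rows by mean + exploration bonus.

    reward: list of rows, one per combination, each a list of per-datapoint rewards.
    Returns (index of the row with the best observed mean, cells observed per row).
    """
    n_combos, n_points = len(reward), len(reward[0])
    # Assumption: round the budget up and observe every row at least once.
    budget = max(n_combos, math.ceil(fraction * n_combos * n_points))

    sums = [0.0] * n_combos
    counts = [0] * n_combos
    observed = 0

    def ucb(i):
        if counts[i] == 0:
            return float("inf")   # unobserved rows are tried first
        if counts[i] == n_points:
            return float("-inf")  # fully observed rows cannot be picked
        bonus = c * math.sqrt(math.log(observed + 1) / counts[i])
        return sums[i] / counts[i] + bonus

    while observed < budget:
        i = max(range(n_combos), key=ucb)
        if counts[i] == n_points:
            break  # defensive: the whole grid is filled
        sums[i] += reward[i][counts[i]]  # observe the next cell in row i
        counts[i] += 1
        observed += 1

    best = max(range(n_combos), key=lambda i: sums[i] / counts[i])
    return best, counts


# Three hypothetical combinations evaluated on ten datapoints.
grid = [
    [1.0] * 10,
    [0.0] * 10,
    [1.0 if j % 2 == 0 else 0.0 for j in range(10)],
]
best, counts = matrix_ucb_sketch(grid, fraction=0.5)
print(best, counts, sum(counts))
```

With fraction=0.5 on a 3 x 10 grid the loop stops after 15 of the 30 cells, and the observations concentrate on the rows whose means look most promising.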

`agentopt.model_selection.matrix_ucb.MatrixUCBLRFModelSelector` ¶

Matrix UCB with low-rank factorization uncertainty (ensemble ALS), per banditeval.

Warmup and main-phase cell batches use select_best(..., max_concurrent=...). Cell budget: observation_budget_fraction or sample_fraction (see MatrixUCBModelSelector). Warmup threshold: warmup_percentage or warmup_fraction. Random probes run until this fraction of the full grid is observed, then the selector switches to LRF+UCB (banditeval-style).

`agentopt.model_selection.threshold_successive_elimination.ThresholdBanditSEModelSelector` ¶

Select models via threshold-based successive elimination.

`agentopt.model_selection.lm_proposal.LMProposalModelSelector` ¶

Model selector where an LLM proposes the single best combination.

`agentopt.model_selection.bayesian_optimization.BayesianOptimizationModelSelector` ¶

Select models via Bayesian optimization.
