Selection Algorithms

AgentOpt provides 8 selection algorithms. Choose based on your search space size and evaluation budget.

At a Glance

Algorithm Strategy Evaluations Best For
Brute Force Exhaustive All Small spaces (< 50 combos)
Random Search Sampling Configurable fraction Quick baselines
Hill Climbing Greedy + restarts Guided neighbors Medium spaces
Arm Elimination Progressive pruning Adaptive Statistical early stopping
Epsilon LUCB ε-optimal LUCB Adaptive Cost savings when ε-optimal is enough
Threshold SE Threshold classification Adaptive Filtering above/below a performance target
LM Proposal LLM-guided Shortlist Leveraging model knowledge
Bayesian Optimization GP surrogate Sequential Expensive evaluations

Common interface

All selectors share the same constructor and select_best() method. Switching algorithms is a one-line change.

selector = AnySelector(
    agent=MyAgent,
    models=models,
    eval_fn=eval_fn,
    dataset=dataset,
)
results = selector.select_best(parallel=True, max_concurrent=20)

Brute Force

Evaluates every combination in the Cartesian product.

from agentopt import BruteForceModelSelector

selector = BruteForceModelSelector(
    agent=MyAgent,
    models=models,
    eval_fn=eval_fn,
    dataset=dataset,
)

When to use

Small search spaces where you can afford to evaluate everything. Guarantees finding the true optimum.

Complexity

Evaluations grow as the product of the candidate-list sizes across nodes: 5 candidate models at each of 3 nodes gives 5^3 = 125 combinations.
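
The count can be sketched with itertools.product, assuming (hypothetically) that models maps each agent node to its candidate list:

```python
from itertools import product

# Hypothetical search space: 3 agent nodes, 5 candidate models each.
models = {
    "planner":  ["m1", "m2", "m3", "m4", "m5"],
    "executor": ["m1", "m2", "m3", "m4", "m5"],
    "critic":   ["m1", "m2", "m3", "m4", "m5"],
}

# Brute force evaluates every element of the Cartesian product.
combos = list(product(*models.values()))
print(len(combos))  # 5 * 5 * 5 = 125
```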


Random Search

Samples a random fraction of all combinations.

from agentopt import RandomSearchModelSelector

selector = RandomSearchModelSelector(
    agent=MyAgent,
    models=models,
    eval_fn=eval_fn,
    dataset=dataset,
    sample_fraction=0.25,  # evaluate 25% of combinations
    seed=42,
)
Parameter Default Description
sample_fraction 0.25 Fraction of combinations to evaluate
seed None Random seed for reproducibility

When to use

Quick exploration to establish a baseline before committing to a thorough search.
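
The sampling step can be sketched as follows; the three-node space here is hypothetical, and the selector's internals may differ:

```python
import random
from itertools import product

# Hypothetical candidate lists per agent node.
models = {"planner": ["a", "b"], "executor": ["a", "b", "c"], "critic": ["a", "b"]}

combos = list(product(*models.values()))        # 2 * 3 * 2 = 12 combinations
sample_fraction = 0.25
rng = random.Random(42)                          # seed for reproducibility
k = max(1, int(len(combos) * sample_fraction))   # evaluate 25% of 12 -> 3 combos
sampled = rng.sample(combos, k)
print(len(sampled))  # 3
```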


Hill Climbing

Greedy local search with random restarts. Defines "neighbors" using model quality and speed rankings, so each step is an informed single-model swap.

from agentopt import HillClimbingModelSelector

selector = HillClimbingModelSelector(
    agent=MyAgent,
    models=models,
    eval_fn=eval_fn,
    dataset=dataset,
    max_iterations=20,
    num_restarts=3,
    patience=3,
)
Parameter Default Description
max_iterations 20 Max steps per restart
num_restarts 3 Number of random restarts
patience 3 Steps without improvement before restart

When to use

Medium-sized spaces where you want to exploit model topology — cheaper models are neighbors of expensive ones.
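
The neighbor relation can be illustrated with a small sketch; the ranking list and two-node layout are hypothetical, not the library's actual topology:

```python
# A neighbor differs in exactly one node, and that node's model moves one
# step along a quality/speed ranking (cheapest to strongest).
ranking = ["cheap", "mid", "strong"]  # hypothetical model ranking

def neighbors(combo):
    """Yield combos that swap a single node's model to an adjacent rank."""
    for i, model in enumerate(combo):
        idx = ranking.index(model)
        for step in (-1, 1):
            j = idx + step
            if 0 <= j < len(ranking):
                yield combo[:i] + (ranking[j],) + combo[i + 1:]

print(sorted(neighbors(("mid", "cheap"))))
```

Hill climbing repeatedly moves to the best-scoring neighbor, restarting from a random combination when no neighbor improves for patience steps.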


Arm Elimination

Progressively eliminates statistically dominated combinations. Starts with a small batch of datapoints, then grows the batch while eliminating underperformers.

from agentopt import ArmEliminationModelSelector

selector = ArmEliminationModelSelector(
    agent=MyAgent,
    models=models,
    eval_fn=eval_fn,
    dataset=dataset,
    growth_factor=2.0,
    confidence=1.0,
)
Parameter Default Description
n_initial None Initial batch size. Default: 10% of dataset (max(1, len(dataset)//10))
growth_factor 2.0 Batch size multiplier per round
confidence 1.0 Elimination confidence threshold

When to use

When bad combinations should be eliminated early to save budget. Particularly effective when there are clearly weak options.
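
The batch-growth schedule can be sketched using the defaults above on a hypothetical 100-datapoint dataset:

```python
# Batch size grows geometrically each round; between rounds, arms whose mean
# trails the leader by more than a confidence margin are eliminated.
dataset_size = 100
n_initial = max(1, dataset_size // 10)   # default: 10% of the dataset
growth_factor = 2.0

batch, schedule = n_initial, []
while batch < dataset_size:
    schedule.append(batch)
    batch = int(batch * growth_factor)
schedule.append(dataset_size)            # final round covers the full dataset
print(schedule)  # [10, 20, 40, 80, 100]
```

Weak arms eliminated in the early, cheap rounds never consume the larger later batches, which is where the budget savings come from.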


Epsilon LUCB

Identifies an ε-optimal best arm using Lower and Upper Confidence Bounds. Each round, it compares the current leader's lower confidence bound against the best challenger's upper bound. When the gap closes below epsilon, the algorithm stops with statistical confidence that the selected arm is within epsilon of optimal.

from agentopt import EpsilonLUCBModelSelector

selector = EpsilonLUCBModelSelector(
    agent=MyAgent,
    models=models,
    eval_fn=eval_fn,
    dataset=dataset,
    epsilon=0.01,
    confidence=1.0,
)
Parameter Default Description
epsilon 0.01 Acceptable gap from the true best
n_initial 1 Initial datapoints per combination
confidence 1.0 Confidence level for bound computation

When to use

When finding the exact best combo isn't necessary and you can tolerate a small accuracy gap (epsilon) in exchange for significant cost savings. Particularly effective when many combos are close in performance.
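
The stopping rule can be sketched assuming Hoeffding-style confidence radii (the library's exact bound may differ); the means and evaluation counts below are made up:

```python
import math

def bounds(mean, n, confidence=1.0):
    """Hypothetical symmetric confidence interval around an arm's mean."""
    radius = confidence * math.sqrt(1.0 / (2 * n))
    return mean - radius, mean + radius

# Leader and best challenger after 200 evaluations each (made-up stats).
leader_lcb, _ = bounds(mean=0.85, n=200)
_, challenger_ucb = bounds(mean=0.75, n=200)

epsilon = 0.01
# Stop once the challenger cannot beat the leader by more than epsilon.
stop = leader_lcb >= challenger_ucb - epsilon
print(stop)  # True
```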


Threshold Successive Elimination

Instead of finding the single best combination, Threshold SE classifies each combination as above or below a user-defined performance threshold. Each round, it evaluates all surviving combos on one more datapoint and checks their confidence intervals. Once a combo's interval no longer straddles the threshold (entirely above or entirely below), it's classified and removed from the active set.

from agentopt import ThresholdBanditSEModelSelector

selector = ThresholdBanditSEModelSelector(
    agent=MyAgent,
    models=models,
    eval_fn=eval_fn,
    dataset=dataset,
    threshold=0.75,
    confidence=1.0,
)
Parameter Default Description
threshold 0.75 Performance threshold to classify against
confidence 1.0 Confidence level for bound computation

When to use

When you have a minimum acceptable accuracy in mind (e.g., "I need at least 75%") and want to quickly identify which combinations meet it. Useful for filtering rather than ranking.
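
The classification test can be sketched as follows, assuming a symmetric confidence interval around each combo's running mean:

```python
def classify(mean, radius, threshold=0.75):
    """Return 'above', 'below', or None while the interval straddles the threshold."""
    lcb, ucb = mean - radius, mean + radius
    if lcb > threshold:
        return "above"
    if ucb < threshold:
        return "below"
    return None  # still active: this combo needs more datapoints

print(classify(0.85, 0.05))  # above  -> classified, removed from active set
print(classify(0.60, 0.05))  # below
print(classify(0.74, 0.05))  # None   -> interval [0.69, 0.79] straddles 0.75
```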


LM Proposal

Uses a proposer LLM to shortlist promising combinations before evaluation. The proposer sees the candidate models and a dataset preview, then suggests which combinations to try.

from agentopt import LMProposalModelSelector

selector = LMProposalModelSelector(
    agent=MyAgent,
    models=models,
    eval_fn=eval_fn,
    dataset=dataset,
    proposer_model="gpt-4.1",
    objective="maximize accuracy and then minimize latency and cost",
    dataset_preview_size=10,
)
Parameter Default Description
proposer_model "gpt-4.1" Model used for proposal generation
proposer_client None Custom OpenAI-compatible client; auto-creates OpenAI() if omitted
objective "maximize accuracy and then minimize latency and cost" Natural-language objective passed to the proposer
dataset_preview_size 10 Number of dataset examples shown to the proposer

When to use

When you want to leverage an LLM's knowledge about model capabilities to skip obviously bad combinations.
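
To make the idea concrete, here is an illustrative prompt of the kind a proposer might see; the actual prompt is internal to LMProposalModelSelector, and the candidate lists and dataset examples below are hypothetical:

```python
# The proposer sees the objective, the per-node candidates, and a dataset
# preview, then returns a shortlist of combinations to evaluate.
models = {"planner": ["gpt-4.1", "gpt-4.1-mini"], "executor": ["gpt-4.1-mini"]}
objective = "maximize accuracy and then minimize latency and cost"
preview = [{"question": "2+2?", "answer": "4"}]  # first dataset_preview_size examples

prompt = (
    f"Objective: {objective}\n"
    f"Candidate models per node: {models}\n"
    f"Dataset preview: {preview}\n"
    "Propose a shortlist of model combinations to evaluate."
)
print(prompt.splitlines()[0])
```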


Bayesian Optimization

Uses a Gaussian Process surrogate to predict accuracy for unevaluated combinations, then selects the most promising one via Expected Improvement.

from agentopt import BayesianOptimizationModelSelector

selector = BayesianOptimizationModelSelector(
    agent=MyAgent,
    models=models,
    eval_fn=eval_fn,
    dataset=dataset,
    batch_size=1,
    sample_fraction=0.25,
)
Parameter Default Description
batch_size 1 Combinations to evaluate per GP iteration
sample_fraction 0.25 Fraction of dataset to use per evaluation

Extra dependency

Requires PyTorch and BoTorch:

pip install "agentopt-py[bayesian]"

When to use

When each evaluation is expensive (large dataset, slow models) and you want to minimize total evaluations. The GP learns from past results to pick the most informative next combination.
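
Expected Improvement itself is simple to state; here is a sketch under a Gaussian posterior (BoTorch computes this internally, so this is purely illustrative):

```python
import math

def expected_improvement(mu, sigma, best):
    """EI of a candidate with posterior mean mu and std sigma vs. the best score so far."""
    if sigma == 0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # standard normal PDF
    return (mu - best) * cdf + sigma * pdf

# A combo with high uncertainty can out-score one with a slightly higher mean,
# which is how the GP balances exploration against exploitation.
print(expected_improvement(mu=0.80, sigma=0.10, best=0.78) >
      expected_improvement(mu=0.81, sigma=0.01, best=0.78))
```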