Parallel Evaluation¶
When you call select_best(parallel=True), AgentOpt evaluates model combinations concurrently. The max_concurrent parameter controls the total number of in-flight API calls across all combinations and datapoints.
The Problem¶
A naive approach would fire all combinations and all their datapoints at once. With 9 combinations and 100 datapoints each, that's 900 concurrent API calls — enough to hit rate limits and get throttled.
Two-Level Concurrency¶
AgentOpt splits max_concurrent into two tiers:
```mermaid
graph TD
    subgraph "max_concurrent = 20"
        subgraph "Combo Semaphore (n_combo = 4)"
            C1["Combo 1"] --> D1["dp 1..5"]
            C2["Combo 2"] --> D2["dp 1..5"]
            C3["Combo 3"] --> D3["dp 1..5"]
            C4["Combo 4"] --> D4["dp 1..5"]
        end
        C5["Combo 5..N"] -.->|"waiting"| C1
    end
```
| Tier | Controls | Semaphore |
|---|---|---|
| Outer (combo) | How many combos run simultaneously | asyncio.Semaphore(n_combo) |
| Inner (datapoint) | How many datapoints run per combo | asyncio.Semaphore(dp_concurrent) |
The invariant: n_combo * dp_concurrent <= max_concurrent.
How Slots Are Allocated¶
The algorithm prioritizes datapoint parallelism — it's better to finish one combo quickly than to start many combos slowly:
| max_concurrent | batch_size | n_combo | dp_concurrent | Behavior |
|---|---|---|---|---|
| 20 | 5 | 4 | 5 | 4 combos, each with 5 dp in parallel |
| 20 | 100 | 1 | 20 | 1 combo at a time, 20 dp in parallel |
| 20 | 1 | 20 | 1 | 20 combos, 1 dp each (bandit algorithms) |
| 10 | 10 | 1 | 10 | All slots to a single combo's datapoints |
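A split that reproduces the table above can be written in a few lines. This is a plausible sketch of the priority rule, assuming datapoint parallelism is capped by `batch_size` first and leftover slots go to combos; AgentOpt's actual allocator may differ in edge cases:

```python
def allocate_slots(max_concurrent: int, batch_size: int) -> tuple[int, int]:
    """Split max_concurrent into (n_combo, dp_concurrent).

    Datapoint parallelism gets priority: dp_concurrent is the batch
    size, clamped to the total budget; remaining slots run combos
    in parallel. The invariant n_combo * dp_concurrent <= max_concurrent
    holds because n_combo uses integer division.
    """
    dp_concurrent = min(batch_size, max_concurrent)
    n_combo = max(1, max_concurrent // dp_concurrent)
    return n_combo, dp_concurrent

# Reproduce the table rows:
print(allocate_slots(20, 5))    # (4, 5)
print(allocate_slots(20, 100))  # (1, 20)
print(allocate_slots(20, 1))    # (20, 1)
print(allocate_slots(10, 10))   # (1, 10)
```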
Bandit algorithms
Bandit-style selectors (Arm Elimination, Threshold SE, Epsilon-LUCB) often evaluate one datapoint at a time per round (batch_size=1). In this case, all max_concurrent slots go to running combos in parallel — which is exactly what you want for round-by-round elimination.
Per-Algorithm Behavior¶
Each selector recomputes concurrency limits at the appropriate granularity:
| Selector | When recomputed | Notes |
|---|---|---|
| Brute Force | Once | Full dataset, fixed batch size |
| Random Search | Once | Same as brute force on sampled combos |
| Hill Climbing | Per iteration | Recomputed for each neighbor batch |
| Arm Elimination | Per round | Batch size grows each round |
| Threshold SE | Init + per round | Init batch, then batch_size=1 |
| Epsilon-LUCB | Init + per round | Same pattern as Threshold SE |
| Bayesian Optimization | Per BO batch | Recomputed for each acquisition batch |
Usage¶
```python
# Default: 20 total concurrent API calls
results = selector.select_best(parallel=True)

# Increase for high rate limits
results = selector.select_best(parallel=True, max_concurrent=50)

# Conservative: avoid rate limits
results = selector.select_best(parallel=True, max_concurrent=5)
```
Choosing max_concurrent
Start with the default (20). If you hit rate limits, lower it. If you have high rate limits (e.g., tier 4+ OpenAI), increase it. Whatever value you pick, the two-level split ensures you won't accidentally fire combos * datapoints calls simultaneously.