# Results

## SelectionResults

Returned by `selector.select_best()`. Contains all evaluation results.

| Method | Returns | Description |
| --- | --- | --- |
| `print_summary()` | `None` | Print a formatted table with rank, accuracy, latency, and price |
| `get_best()` | `ModelResult` | Highest-accuracy result (ties broken by latency) |
| `get_best_combo()` | `Dict[str, str]` | Best combination as `{"node": "model_name"}` |
| `to_csv(path)` | `None` | Export all results to CSV |
| `export_config(path)` | `None` | Export the best combination as a YAML config |

### Example

```python
results = selector.select_best(parallel=True)

# Print ranked table
results.print_summary()

# Access the winner
best = results.get_best()
print(f"Best: {best.model_name}, accuracy={best.accuracy:.1%}")

# Export
results.to_csv("all_results.csv")
results.export_config("optimized_config.yaml")
```
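
`get_best_combo()` returns the winning assignment as a plain node-to-model mapping, which is handy for feeding back into your own config handling. A minimal sketch; the node names shown (`planner`, `solver`) are illustrative and depend on your pipeline definition:

```python
combo = results.get_best_combo()
# e.g. {"planner": "gpt-4o", "solver": "gpt-4o-mini"}  (hypothetical node names)
for node, model in combo.items():
    print(f"{node}: {model}")
```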

## ModelResult

Each evaluated combination produces a `ModelResult`:

| Field | Type | Description |
| --- | --- | --- |
| `model_name` | `str` | Combination label (e.g., `"planner=gpt-4o + solver=gpt-4o-mini"`) |
| `accuracy` | `float` | Mean eval score across all datapoints |
| `latency_seconds` | `float` | Mean latency per datapoint |
| `input_tokens` | `Dict[str, int]` | Input tokens by model |
| `output_tokens` | `Dict[str, int]` | Output tokens by model |
| `price` | `Optional[float]` (property) | Per-sample cost in USD, or `None` if pricing is unavailable |
| `is_best` | `bool` | Whether this is the top-ranked combination |
| `datapoint_results` | `List[DatapointResult]` | Per-datapoint breakdown |
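
The token fields are keyed by model, so a combination that calls several models reports usage for each one separately. A hedged sketch of summarizing a single result, using only the fields listed above:

```python
best = results.get_best()

# Sum token usage across every model in the combination.
total_in = sum(best.input_tokens.values())
total_out = sum(best.output_tokens.values())
print(f"{best.model_name}: {total_in} input / {total_out} output tokens")

# `price` may be None when pricing data is unavailable, so guard before formatting.
if best.price is not None:
    print(f"Estimated cost per sample: ${best.price:.4f}")
```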

## DatapointResult

Per-datapoint evaluation detail:

| Field | Type | Description |
| --- | --- | --- |
| `datapoint_index` | `int` | Index in the dataset |
| `score` | `float` | Eval score for this datapoint |
| `latency_seconds` | `float` | Latency for this datapoint |
| `input_tokens` | `Dict[str, int]` | Input tokens by model |
| `output_tokens` | `Dict[str, int]` | Output tokens by model |
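
The per-datapoint breakdown is useful for error analysis, for example surfacing the datapoints where the winning combination scored lowest. A minimal sketch using only the fields above:

```python
best = results.get_best()

# Sort the winner's per-datapoint results by score, worst first.
worst = sorted(best.datapoint_results, key=lambda dp: dp.score)[:5]
for dp in worst:
    print(f"datapoint {dp.datapoint_index}: score={dp.score:.2f}, "
          f"latency={dp.latency_seconds:.2f}s")
```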