Response Caching

AgentOpt caches LLM responses at the HTTP level to avoid redundant API calls during model selection.

How It Works

graph LR
    A[LLM Call] --> B{In cache?}
    B -->|Yes| C[Return cached response]
    B -->|No| D[Call API]
    D --> E[Store in memory]
    E --> F[Background flush to SQLite]
| Property | Detail |
| --- | --- |
| Cache key | SHA-256 of the request body (model + messages + params), excluding `stream` |
| In-memory | Thread-safe dict — always active when caching is on |
| On disk | Optional SQLite database (`cache.db`), flushed every 10 seconds by a background thread |

Cached responses include the original latency measurement, so cost and latency comparisons remain fair.
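A minimal sketch of how such a key could be derived — this is not AgentOpt's actual implementation, just an illustration of hashing a canonical form of the request body with `stream` excluded, so streaming and non-streaming variants of the same request share one cache entry:

```python
import hashlib
import json

def cache_key(body: dict) -> str:
    """Illustrative only: SHA-256 over canonical JSON of the
    request body, with the `stream` flag dropped."""
    payload = {k: v for k, v in body.items() if k != "stream"}
    # sort_keys makes the serialization independent of dict insertion order
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

req = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.0,
    "stream": True,
}
# Same request with and without streaming hashes to the same key
assert cache_key(req) == cache_key({**req, "stream": False})
```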

Why It Matters

During model selection, many LLM calls are identical:

Shared model calls

If two combinations use the same planner model, the planner call for each datapoint is identical. With 9 combinations and 3 distinct planners, you pay for 3 unique planner calls per datapoint — not 9.
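The deduplication arithmetic can be sketched with a toy memoizing wrapper (model names and the `cached_call` helper are hypothetical, not AgentOpt's API):

```python
api_calls = 0
cache = {}

def cached_call(model: str, prompt: str) -> str:
    """Toy memoized LLM call: identical (model, prompt) pairs
    hit the cache instead of the API."""
    global api_calls
    key = (model, prompt)
    if key not in cache:
        api_calls += 1  # only a cache miss costs an API call
        cache[key] = f"response from {model}"
    return cache[key]

planners = ["planner-1", "planner-2", "planner-3"]
executors = ["exec-a", "exec-b", "exec-c"]
for planner in planners:
    for executor in executors:  # 3 x 3 = 9 combinations
        cached_call(planner, "plan the task")

assert api_calls == 3  # 3 distinct planners -> 3 unique calls, not 9
```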

Re-runs are free

Tweak your eval function and re-run? Every LLM call hits the cache. Zero API cost, instant results.

Crash recovery

If a long run is interrupted, cached responses survive on disk. Resume without re-calling the API.

Enabling Disk Cache

By default, caching is in-memory only (lost when the process exits). To persist:

from agentopt.proxy import LLMTracker

tracker = LLMTracker(cache_dir="./llm_cache")
selector = BruteForceModelSelector(
    ...,
    tracker=tracker,
)
results = selector.select_best()
# Cache automatically flushed to ./llm_cache/cache.db

On subsequent runs with the same cache_dir, entries are loaded from disk at startup.
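The startup load step can be sketched as follows — a hypothetical reimplementation assuming a simple key/value schema, not AgentOpt's actual code or schema:

```python
import json
import sqlite3

def load_cache(db_path: str) -> dict:
    """Illustrative sketch: read all persisted entries into a dict
    at startup, creating the table if the DB is new."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
    )
    entries = {
        key: json.loads(value)
        for key, value in conn.execute("SELECT key, value FROM cache")
    }
    conn.close()
    return entries
```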

Cache Lifecycle

| Event | What Happens |
| --- | --- |
| `LLMTracker(cache_dir=...)` | Creates DB if needed, loads existing entries into memory |
| LLM call (cache miss) | Response stored in memory, marked dirty |
| Background flush (every 10s) | Dirty entries written to SQLite |
| `tracker.stop()` / `select_best()` returns | Final flush to disk |
| `tracker.clear_cache()` | Clears memory and deletes all DB rows |
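The lifecycle above follows a standard write-behind pattern. A self-contained sketch of that pattern (class name, schema, and method names are hypothetical, not AgentOpt's internals):

```python
import json
import sqlite3
import threading

class FlushingCache:
    """Illustrative write-behind cache: a lock-protected in-memory
    dict, with a background thread that periodically writes dirty
    entries to SQLite."""

    def __init__(self, db_path: str, flush_interval: float = 10.0):
        self._db_path = db_path
        self._lock = threading.Lock()
        self._data: dict = {}
        self._dirty: set = set()
        self._stop_event = threading.Event()
        self._interval = flush_interval
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def put(self, key: str, value) -> None:
        with self._lock:
            self._data[key] = value
            self._dirty.add(key)  # mark for the next flush

    def flush(self) -> None:
        # Snapshot dirty entries under the lock, write outside it
        with self._lock:
            pending = {k: self._data[k] for k in self._dirty}
            self._dirty.clear()
        if not pending:
            return
        conn = sqlite3.connect(self._db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            [(k, json.dumps(v)) for k, v in pending.items()],
        )
        conn.commit()
        conn.close()

    def _loop(self) -> None:
        # wait() returns True when stop() is called, ending the loop
        while not self._stop_event.wait(self._interval):
            self.flush()

    def stop(self) -> None:
        self._stop_event.set()
        self._thread.join()
        self.flush()  # final flush on shutdown
```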

Disabling Cache

tracker = LLMTracker(cache=False)

Inspecting the Cache

The cache is a standard SQLite database:

sqlite3 ./llm_cache/cache.db "SELECT COUNT(*) FROM cache"
sqlite3 ./llm_cache/cache.db "SELECT DISTINCT json_extract(value, '$.body.model') FROM cache"
ls -lh ./llm_cache/cache.db