Skip to main content

4 posts tagged with "stability"

View All Tags

Incident Report: Cache Eviction Closes In-Use httpx Clients

Ryan Crabbe
Performance Engineer, LiteLLM
Ishaan Jaff
CTO, LiteLLM
Krrish Dholakia
CEO, LiteLLM

Date: February 27, 2026 Duration: ~6 days (Feb 21 merge -> Feb 27 fix) Severity: High Status: Resolved

Note: This fix is available starting from LiteLLM v1.81.14.rc.2 or higher.

Summary​

A change to improve Redis connection pool cleanup introduced a regression that closed httpx clients that were still actively being used by the proxy. The LLMClientCache (an in-memory TTL cache) stores both Redis clients and httpx clients under the same eviction policy. When a cache entry expired or was evicted, the new cleanup code called aclose()/close() on the evicted value which worked correctly for Redis clients, but destroyed httpx clients that other parts of the system still held references to and were actively using for LLM API calls.

Impact: Any proxy instance that hit the cache TTL (default 10 minutes) or capacity limit (200 entries) would have its httpx clients closed out from under it, causing requests to LLM providers to fail with connection errors.

Incident Report: SERVER_ROOT_PATH regression broke UI routing

Yuneng Jiang
SWE @ LiteLLM (Full Stack)
Ishaan Jaff
CTO, LiteLLM
Krrish Dholakia
CEO, LiteLLM

Date: January 22, 2026 Duration: ~4 days (until fix merged January 26, 2026) Severity: High Status: Resolved

Note: This fix is available starting from LiteLLM v1.81.3.rc.6 or higher.

Summary​

A PR (#19467) accidentally removed the root_path=server_root_path parameter from the FastAPI app initialization in proxy_server.py. This caused the proxy to ignore the SERVER_ROOT_PATH environment variable when serving the UI. Users who deploy LiteLLM behind a reverse proxy with a path prefix (e.g., /api/v1 or /llmproxy) found that all UI pages returned 404 Not Found.

  • LLM API calls: No impact. API routing was unaffected.
  • UI pages: All UI pages returned 404 for deployments using SERVER_ROOT_PATH.
  • Swagger/OpenAPI docs: Broken when accessed through the configured root path.

Incident Report: Invalid beta headers with Claude Code

Sameer Kankute
SWE @ LiteLLM (LLM Translation)
Ishaan Jaff
CTO, LiteLLM
Krrish Dholakia
CEO, LiteLLM

Date: February 13, 2026 Duration: ~3 hours Severity: High Status: Resolved

Note: This fix will be available starting from v1.81.13-nightly or higher of LiteLLM.

Summary​

Claude Code began sending unsupported Anthropic beta headers to non-Anthropic providers (Bedrock, Azure AI, Vertex AI), causing invalid beta flag errors. LiteLLM was forwarding all beta headers without provider-specific validation. Users experienced request failures when routing Claude Code requests through LiteLLM to these providers.

  • LLM calls to Anthropic: No impact.
  • LLM calls to Bedrock/Azure/Vertex: Failed with invalid beta flag errors when unsupported headers were present.
  • Cost tracking and routing: No impact.

Incident Report: Invalid model cost map on main

Ishaan Jaffer
CTO, LiteLLM

Date: January 27, 2026 Duration: ~20 minutes Severity: Low Status: Resolved

Summary​

A malformed JSON entry in model_prices_and_context_window.json was merged to main (562f0a0). This caused LiteLLM to silently fall back to a stale local copy of the model cost map. Users on older package versions lost cost tracking for newer models only (e.g. azure/gpt-5.2). No LLM calls were blocked.

  • LLM calls and proxy routing: No impact.
  • Cost tracking: Impacted for newer models not present in the local backup. Older models were unaffected. The incident lasted ~20 minutes until the commit was reverted.