Health-aware routing
Use provider health and routing policy to avoid unhealthy upstreams while preserving one local gateway endpoint.
Route OpenAI-compatible LLM traffic around upstream quota, timeout, and outage signals while keeping provider keys, fallback policy, telemetry, config publishing, and rollback inside your own environment.
Failure modes
LLM upstreams fail through quota exhaustion, 429 bursts, slow endpoints, transient 5xx responses, model-level outages, and bad routing changes. AI Model Gateway keeps the resulting routing decisions observable and reversible in the gateway layer.
Use provider health and routing policy to avoid unhealthy upstreams while preserving one local gateway endpoint.
Record route mode and provider behavior so operators can see when traffic used a fallback path.
Keep degraded providers out of rotation long enough for incidents and quota windows to clear.
Preview, diff, publish, audit, and roll back routing changes when a fallback policy behaves badly.
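The cooldown behavior described above can be illustrated with a minimal sketch. This is not the gateway's implementation: the Provider type, the ShouldCoolDown and Pick helpers, the 60-second window, and the "primary"/"fallback" labels are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Provider is a hypothetical upstream entry; the real config schema may differ.
type Provider struct {
	Name          string
	CooldownUntil time.Time // zero value means the provider is in rotation
}

// Healthy reports whether the provider may receive traffic at time now.
func (p *Provider) Healthy(now time.Time) bool {
	return !now.Before(p.CooldownUntil)
}

// ShouldCoolDown treats 429 and 5xx statuses as reasons to park a provider.
func ShouldCoolDown(status int) bool {
	return status == http.StatusTooManyRequests || status >= 500
}

// Pick returns the first healthy provider in priority order and a route mode
// label for telemetry. The labels here are illustrative only.
func Pick(providers []*Provider, now time.Time) (*Provider, string, bool) {
	for i, p := range providers {
		if p.Healthy(now) {
			mode := "primary"
			if i > 0 {
				mode = "fallback"
			}
			return p, mode, true
		}
	}
	return nil, "", false
}

func main() {
	now := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	primary := &Provider{Name: "primary"}
	backup := &Provider{Name: "backup"}
	pool := []*Provider{primary, backup}

	// Simulate a 429 from the primary and park it for an assumed 60 seconds.
	if ShouldCoolDown(http.StatusTooManyRequests) {
		primary.CooldownUntil = now.Add(60 * time.Second)
	}

	p, mode, _ := Pick(pool, now.Add(10*time.Second))
	fmt.Println(p.Name, mode) // backup fallback

	p, mode, _ = Pick(pool, now.Add(61*time.Second))
	fmt.Println(p.Name, mode) // primary primary
}
```

Passing the current time into Pick, instead of calling time.Now inside it, keeps the selection logic deterministic and easy to test.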
Executable proof
The provider fallback demo starts two fake OpenAI-compatible upstreams. The primary returns 429; the gateway then serves the request through the fallback provider, rewrites the model name in the forwarded request, and records route_mode=model_fallback.
go test ./examples/provider-fallback -run TestProviderFallbackDemo -v
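For readers who want to see the shape of that scenario before running the test, here is a self-contained sketch using only the Go standard library. It is not the repository's demo code: the upstream type, the forward and runDemo functions, the model names, and the "primary" label are hypothetical. Only the 429 response and the model_fallback label come from the description above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// upstream is a hypothetical provider entry: base URL plus the model to forward.
type upstream struct {
	URL   string
	Model string
}

// forward tries each upstream in order, rewriting the "model" field per upstream.
// It returns the response body and a route mode label for telemetry.
func forward(client *http.Client, ups []upstream, req map[string]any) (string, string, error) {
	for i, u := range ups {
		body := map[string]any{}
		for k, v := range req {
			body[k] = v
		}
		body["model"] = u.Model // rewrite the forwarded model for this upstream
		buf, err := json.Marshal(body)
		if err != nil {
			return "", "", err
		}
		resp, err := client.Post(u.URL+"/v1/chat/completions", "application/json", bytes.NewReader(buf))
		if err != nil {
			continue // transport error: try the next upstream
		}
		data, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		if resp.StatusCode == http.StatusTooManyRequests || resp.StatusCode >= 500 {
			continue // quota or server failure: try the next upstream
		}
		mode := "primary"
		if i > 0 {
			mode = "model_fallback"
		}
		return string(data), mode, nil
	}
	return "", "", fmt.Errorf("all upstreams failed")
}

// runDemo starts two fake upstreams and sends one request through forward.
func runDemo() (string, string, error) {
	primary := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "quota exceeded", http.StatusTooManyRequests)
	}))
	defer primary.Close()

	fallback := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// encoding/json matches the "model" key to this field case-insensitively.
		var in struct{ Model string }
		json.NewDecoder(r.Body).Decode(&in)
		fmt.Fprintf(w, "served by fallback with model=%s", in.Model)
	}))
	defer fallback.Close()

	ups := []upstream{{primary.URL, "model-a"}, {fallback.URL, "model-b"}}
	return forward(http.DefaultClient, ups, map[string]any{"model": "model-a"})
}

func main() {
	body, mode, err := runDemo()
	fmt.Println(body, mode, err)
}
```

The sketch copies the request map before rewriting the model so the caller's original request is never mutated between attempts.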
Open the fallback demo
Runbook shape
Fit check
Review evidence
Try the packaged v1.4.4 runtime with checksum verification, local config, runtime directories, and supervised startup commands.
Open release install path
Review CI gates, local reproduction commands, runtime smoke checks, feature proof points, and current capability boundaries.
Open quality evidence
Inspect admin auth, same-origin browser writes, provider-key handling, SSRF defenses, telemetry sensitivity, and update trust.
Open security model
Next step
Start with the executable demo and operations guide. If the project matches your self-hosted LLM failover needs, starring the repository helps other operators find it.