llm-gateway
One endpoint. Every model.
llm-gateway is the LLM proxy that fronts every model call in basecamp. Uniform request shape (OpenAI-compatible), per-caller rate limiting, per-model cost tracking, and pluggable routing to any provider or a local vLLM instance.
What you can do
OpenAI-compatible interface
Any OpenAI SDK client works. No provider-specific SDKs needed.
Multi-provider routing
Anthropic, OpenAI, Cohere, local vLLM — pick per request or per caller.
Rate limits per caller
Prevent one runaway agent from burning through the org's daily budget.
Cost tracking
Every token attributed. Slack alert when a caller crosses a threshold.
Streaming everywhere
Server-sent events end-to-end. No buffering, no head-of-line blocking.
Prompt caching
Anthropic-style prompt caching passed through for the providers that support it.
Explore the rest of /root
The 9-tier K8s-native platform
Reproducible experiments
Observability & tracing
On-call alerting
Cloud infrastructure as code
Platform control CLI
Lakehouse & data platform
MCP tool server plans
AI-augmented operations
Weekly logs + runbooks
Get started with llm-gateway
Clone the repo, read the plan, and start building your own version.
All projects