LLM Routing

llm-gateway

One endpoint. Every model.

llm-gateway is the LLM proxy that fronts every model call in basecamp. Uniform request shape (OpenAI-compatible), per-caller rate limiting, per-model cost tracking, and pluggable routing to any provider or a local vLLM instance.

View on GitHub Read the docs

Capabilities

What you can do

OpenAI-compatible interface

Any OpenAI SDK client works. No provider-specific SDKs needed.

Multi-provider routing

Anthropic, OpenAI, Cohere, local vLLM — pick per request or per caller.

Rate limits per caller

Prevent one runaway agent from burning through the org's daily budget.

Cost tracking

Every token attributed. Slack alert when a caller crosses a threshold.

Streaming everywhere

Server-sent events end-to-end. No buffering, no head-of-line blocking.

Prompt caching

Anthropic-style prompt caching passed through for the providers that support it.

Explore the rest of /root

basecamp

The 9-tier K8s-native platform

rxp

Reproducible experiments

pulse

Observability & tracing

triage

On-call alerting

terralabs

Cloud infrastructure as code

platform-ctl

Platform control CLI

data-tier

Lakehouse & data platform

mcp-servers

MCP tool server plans

aiops

AI-augmented operations

ops-handbook

Weekly logs + runbooks

Get started with llm-gateway

Clone the repo, read the plan, and start building your own version.

View on GitHub Read the docs

All projects