Model Registry

Versioned store of ML models with stage- or alias-based promotion, lineage, and CI-driven promotion gates. Examples: MLflow, SageMaker Model Registry, Vertex AI Model Registry.

Every model in production is versioned. Promotion is a deliberate event. Rollback is one alias swap. Lineage is queryable. Status: STUB, to be promoted to OUTLINE in Y5 Phase 39.

What this pattern is

A model registry is the versioned source of truth for ML model artifacts and their metadata. Every model in production has a registry entry. Every promotion (None → Staging → Production under legacy stages, or an alias reassignment under aliases) is a deliberate, audited event. Every promoted model has queryable lineage: which Git SHA, which dataset snapshot, which hyperparameters, which eval results. Modern MLflow uses aliases (@champion, @challenger) instead of legacy stages; a model can carry several aliases pointing at different versions, which supports A/B and canary patterns. Downstream consumers (KServe InferenceService, batch jobs, fine-tuning jobs) reference models by alias, written here as registry://name@production, so an alias swap is how rollouts happen.
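The relationship between immutable versions, movable aliases, and lineage can be sketched with a toy in-memory registry. This is an illustration only, not MLflow's API: the ModelRegistry class, its method names, the bucket URI, and the lineage values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Toy in-memory registry: append-only versions plus movable aliases."""

    versions: dict = field(default_factory=dict)  # name -> list of version records
    aliases: dict = field(default_factory=dict)   # (name, alias) -> version number

    def register(self, name: str, artifact_uri: str, lineage: dict) -> int:
        records = self.versions.setdefault(name, [])
        version = len(records) + 1  # versions are never rewritten, only appended
        records.append(
            {"version": version, "artifact_uri": artifact_uri, "lineage": dict(lineage)}
        )
        return version

    def set_alias(self, name: str, alias: str, version: int) -> None:
        if not 1 <= version <= len(self.versions.get(name, [])):
            raise KeyError(f"{name} has no version {version}")
        self.aliases[(name, alias)] = version

    def resolve(self, ref: str) -> dict:
        """Resolve 'name@alias' to the version record it currently points at."""
        name, alias = ref.split("@", 1)
        return self.versions[name][self.aliases[(name, alias)] - 1]


# Hypothetical model, URIs, and lineage values.
registry = ModelRegistry()
v1 = registry.register(
    "payment-fraud",
    "s3://example-bucket/payment-fraud/1",
    {"git_sha": "a1b2c3d", "dataset_snapshot": "snap-001"},
)
v2 = registry.register(
    "payment-fraud",
    "s3://example-bucket/payment-fraud/2",
    {"git_sha": "e4f5a6b", "dataset_snapshot": "snap-002"},
)
registry.set_alias("payment-fraud", "champion", v1)
registry.set_alias("payment-fraud", "challenger", v2)

before = registry.resolve("payment-fraud@champion")["version"]
registry.set_alias("payment-fraud", "champion", v2)  # rollout is one alias move
after = registry.resolve("payment-fraud@champion")["version"]
```

Consumers only ever hold the name@alias string, so nothing downstream changes when the alias moves. In MLflow itself the corresponding reference form is models:/<name>@<alias>.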

The pattern composes with experiment-tracking (the registry consumes tracked runs), with Iceberg time-travel (for dataset lineage), and with CI (promotion gates enforce eval thresholds before alias swap). Without a registry, “what’s deployed?” becomes a debugging mystery, and “reproduce v1.4.2” becomes impossible.

The pattern is not new — traditional software has package registries (npm, PyPI, Maven), image registries (Docker Hub, GHCR), and artifact repositories (Artifactory, Nexus). Model registries adapt the same idea to ML-specific needs: model artifacts are larger (hundreds of MB to hundreds of GB), have richer metadata (training data hash, eval scores, framework versions), and need lineage tracking that ordinary package registries don’t provide. The core discipline — version everything that ships to production — is the same.

The registry is what makes ML deployment operationally similar to service deployment. Deploy = alias swap. Rollback = swap back. Canary = split traffic between aliases. Audit = query the registry. Without it, ML deployment is bespoke per-model with no shared discipline, and every model owner reinvents deployment infrastructure. With it, the platform provides one deployment abstraction that works across models, and MLOps looks more like DevOps.
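A canary split between two aliases can be sketched as a deterministic hash of the request ID. This is one possible approach under stated assumptions, not something the registry itself provides: the function names, the champion/challenger defaults, and the percentage-bucket scheme are illustrative, and in practice the split usually lives in the serving layer.

```python
import hashlib


def pick_alias(
    request_id: str,
    canary_percent: int,
    stable: str = "champion",
    canary: str = "challenger",
) -> str:
    """Route a fixed share of requests to the canary alias, stably per request ID."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 100)
    return canary if bucket < canary_percent else stable


def model_ref(name: str, request_id: str, canary_percent: int) -> str:
    """Build the name@alias reference a consumer would resolve for this request."""
    return f"{name}@{pick_alias(request_id, canary_percent)}"
```

Because the bucket depends only on the request ID, the same caller keeps hitting the same alias while the percentage is unchanged, and raising canary_percent only moves additional callers onto the canary.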

Concrete instances in the wild

  • MLflow Model Registry. OSS, self-hostable on K8s, the reference implementation. basecamp default.
  • AWS SageMaker Model Registry. AWS-managed, integrated with SageMaker training + inference.
  • GCP Vertex AI Model Registry. GCP-managed equivalent.
  • Weights & Biases Artifacts. Commercial registry with rich lineage tracking.
  • Neptune Model Registry. Commercial alternative focused on team collaboration.
  • BentoML. OSS model serving with registry semantics baked in.
  • DVC (Data Version Control). Git-based versioning for models and data. Simpler than a registry but effective for small teams.
  • Hugging Face Hub. The de facto public model registry for open models. Also usable as a private registry.
  • Databricks Model Registry. Databricks’ MLflow integration with additional governance features.
  • basecamp registry (Y5 Phase 39). MLflow deployed via K8s operator, backed by Postgres + object storage.

Why this pattern matters

Without a registry, “what model is running in production?” is a debugging mystery. The model is a binary blob deployed via ad-hoc scripts. Its lineage lives in someone’s Slack messages, or on a whiteboard, or nowhere. When it produces a bad prediction, tracing back to the training run that produced it is impossible. When it needs rollback, finding the previous version requires archaeology.

With a registry, each of these is trivial. The running model is payment-fraud@production, which points at version 1.4.2, which was produced by run_id abc123, which used dataset snapshot xyz789, with hyperparameters visible in the registry UI. Rollback is one alias reassignment that points production back at version 1.4.1; in MLflow that is a single MlflowClient.set_registered_model_alias call (MLflow numbers versions with integers, so a label like 1.4.2 would be carried as a tag). Audit is a registry query.
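Rollback and audit both fall out of keeping alias moves as an append-only log. The sketch below is a toy model of that idea, not a real registry client: the AliasLog class, the actor names, and the reasons are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AliasLog:
    """Append-only record of alias moves; current state is derived from it."""

    events: list = field(default_factory=list)

    def set_alias(self, name, alias, version, actor, reason):
        self.events.append(
            {"name": name, "alias": alias, "version": version,
             "actor": actor, "reason": reason}
        )

    def history(self, name, alias):
        return [e for e in self.events if e["name"] == name and e["alias"] == alias]

    def current(self, name, alias):
        return self.history(name, alias)[-1]["version"]

    def rollback(self, name, alias, actor, reason):
        """Point the alias back at the most recent version that differs from the current one."""
        versions = [e["version"] for e in self.history(name, alias)]
        previous = next((v for v in reversed(versions) if v != versions[-1]), None)
        if previous is None:
            raise ValueError(f"{name}@{alias} has no earlier version to roll back to")
        self.set_alias(name, alias, previous, actor, reason)
        return previous


# Hypothetical actors and reasons; version labels follow the example above.
log = AliasLog()
log.set_alias("payment-fraud", "production", "1.4.1", "ci", "initial promotion")
log.set_alias("payment-fraud", "production", "1.4.2", "ci", "eval gate passed")
restored = log.rollback("payment-fraud", "production", "oncall", "precision regression")
audit = [(e["version"], e["actor"]) for e in log.history("payment-fraud", "production")]
```

The rollback is itself recorded as an event, so the audit query shows who moved the alias back and why, rather than erasing the bad promotion.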

The registry also enables ML-specific governance. Promotion gates enforce eval thresholds: a model can’t be aliased to production if its eval score is below the previous version’s. Approval workflows require human review before promotion. Automatic rollback triggers when production evals show quality below a threshold. Each of these is standard DevOps discipline applied to ML.
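The three governance checks can be sketched as plain functions that a CI job or monitor might call before touching an alias. The function names, the single-score comparison, and the three-evaluation window are assumptions made for illustration; a real gate would likely compare several metrics.

```python
from typing import NamedTuple, Optional


class GateResult(NamedTuple):
    allowed: bool
    reasons: list


def promotion_gate(
    candidate_score: float,
    incumbent_score: Optional[float],
    min_score: float,
    approved_by: Optional[str],
) -> GateResult:
    """Collect every reason a promotion should be blocked; allow only if none."""
    reasons = []
    if candidate_score < min_score:
        reasons.append(f"score {candidate_score} is below the floor {min_score}")
    if incumbent_score is not None and candidate_score < incumbent_score:
        reasons.append(
            f"score {candidate_score} is below the current version's {incumbent_score}"
        )
    if not approved_by:
        reasons.append("no human approval recorded")
    return GateResult(allowed=not reasons, reasons=reasons)


def should_auto_rollback(production_scores, threshold: float, window: int = 3) -> bool:
    """True when the last `window` production evals all fall below the threshold."""
    recent = list(production_scores)[-window:]
    return len(recent) == window and all(score < threshold for score in recent)
```

Returning the full list of reasons, rather than failing on the first, gives the model owner one complete report per blocked promotion.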

For LLM-era workflows, the registry also tracks prompt templates, RAG configurations, and system prompts as versioned artifacts alongside model weights. The “prompt engineering” that produces a specific model behavior needs the same versioning discipline as the weights themselves. A registry that stores arbitrary artifacts can version these with the same mechanism it uses for weights.
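One way to give prompts and RAG settings the same versioning discipline is to fingerprint the whole configuration bundle and register that fingerprint with the model version. This is a minimal sketch under assumptions: the bundle fields, the model name, and the choice of SHA-256 over canonical JSON are illustrative, not a registry feature.

```python
import hashlib
import json


def bundle_fingerprint(bundle: dict) -> str:
    """Hash a canonical JSON form so key order and whitespace do not matter."""
    canonical = json.dumps(
        bundle, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical bundle: every field name and value here is illustrative.
bundle_v1 = {
    "model": "support-assistant@champion",
    "system_prompt": "You are a concise support assistant.",
    "prompt_template": "Context:\n{context}\n\nQuestion: {question}",
    "rag": {"top_k": 5, "chunk_size": 512},
}
bundle_reordered = {key: bundle_v1[key] for key in reversed(list(bundle_v1))}
bundle_v2 = {**bundle_v1, "rag": {"top_k": 8, "chunk_size": 512}}

same = bundle_fingerprint(bundle_v1) == bundle_fingerprint(bundle_reordered)
changed = bundle_fingerprint(bundle_v1) != bundle_fingerprint(bundle_v2)
```

Any edit to the system prompt, template, or retrieval settings yields a new fingerprint, so a behavior change cannot ship under an old version label.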

Modern platforms make model registries operationally accessible. MLflow deploys as a K8s app with a Postgres backend and object storage. SageMaker and Vertex are managed services with SDK integration. Hugging Face Hub provides both public and private registries with git-based collaboration. What used to be an internal-tooling investment is now a checkbox.

The failure modes to know: registries become stale if promotion isn’t enforced (models get deployed without going through the registry); registries become bloated without expiration policies (thousands of unused versions accumulate); registries become critical infrastructure whose downtime blocks deploys (needs its own SRE attention). Each has a known prevention pattern: let serving resolve models only through registry aliases, apply a retention policy to unaliased versions, and run the registry with the availability expected of any deploy-path dependency.
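An expiration policy for the bloat case can be sketched as a pure function that selects versions safe to delete. The rule used here (never expire an aliased version, always keep the newest few, expire the rest once past an age limit) and its 90-day and keep-3 defaults are assumptions for illustration, not a recommendation from any specific registry.

```python
from datetime import datetime, timedelta, timezone


def versions_to_expire(versions, aliased, now, max_age_days=90, keep_latest=3):
    """Select version numbers that are old, unaliased, and not among the newest.

    versions: list of (version_number, created_at) pairs
    aliased:  set of version numbers that some alias currently points at
    """
    by_newest = sorted(versions, key=lambda item: item[0], reverse=True)
    newest = {number for number, _ in by_newest[:keep_latest]}
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        number
        for number, created_at in versions
        if number not in aliased and number not in newest and created_at < cutoff
    )


# Hypothetical version ages in days, measured from a fixed reference time.
now = datetime(2026, 6, 1, tzinfo=timezone.utc)
ages = [(1, 400), (2, 300), (3, 200), (4, 120), (5, 30), (6, 2)]
versions = [(number, now - timedelta(days=age)) for number, age in ages]
expired = versions_to_expire(versions, aliased={2, 6}, now=now)
```

Version 2 survives despite its age because an alias still points at it, which keeps a rollback target from being garbage-collected.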

Depth progression

STUB     ← you are here.
OUTLINE  Promoted when Y5 Phase 39 deploys MLflow registry on basecamp.
DEEP     Promoted after Y5 Phase 39 — 5+ models registered, promotion CI working,
         at least one rollback rehearsed.

Preview: what OUTLINE will answer

When Y5 Phase 39 promotes this entry to OUTLINE, it will name:

  • PROBLEM. How do you version, promote, and roll back ML models operationally like any other production artifact?
  • PRINCIPLES. Every model has a registry entry. Promotion is deliberate and audited. Aliases enable safe rollout patterns. Lineage is queryable. Consumers reference by alias, not version. Rollback is an alias swap.
  • TRADE-OFFS. OSS self-hosted (MLflow, BentoML — flexible, ops burden) vs managed (SageMaker, Vertex — easy, lock-in). Legacy stages (Staging/Production) vs modern aliases (@champion/@challenger). Simple registry (DVC, Git-based) vs full-featured (MLflow with signatures, model schemas).
  • TOOLS (time-stamped as of 2026-06): MLflow (basecamp default), AWS SageMaker Model Registry, GCP Vertex AI Model Registry, Weights & Biases, Neptune, BentoML, DVC, Hugging Face Hub, Databricks Model Registry.

The DEEP promotion, after Y5 Phase 39 with 5+ models registered and CI promotion working, will add MASTERY (operating MLflow on basecamp with real models), COMPARE (MLflow vs SageMaker vs BentoML), OPERATE (a specific rollback event), and CONTRIBUTE (an MLflow documentation improvement or plugin).

Canonical references

  • MLflow documentation. Free at mlflow.org.
  • Databricks blog posts on the aliases-vs-stages transition. Free.
  • Chip Huyen, Designing Machine Learning Systems (O’Reilly, 2022) — chapter on model deployment.
  • SageMaker Model Registry documentation. Free at aws.amazon.com/sagemaker.
  • Vertex AI Model Registry documentation. Free at cloud.google.com/vertex-ai.

Cross-references