STUB

Model Lifecycle

The pattern: train → evaluate → register → deploy → monitor → retrain. Each hand-off needs a contract: reproducibility (data version + code version + params + seeds), gated promotion (a candidate must clear an eval threshold), versioned serving (canary rollout), drift monitoring (compare prod vs. train distributions), and an automatic retrain trigger when drift crosses a threshold.
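The first two contracts can be sketched in a few lines of plain Python. This is a minimal illustration, not how MLflow or any particular registry does it: `RunManifest`, `fingerprint`, `should_promote`, and the `min_score` / `min_gain` parameters are all hypothetical names chosen for this example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunManifest:
    """Inputs needed to reproduce one training run (hypothetical schema)."""
    data_version: str
    code_version: str
    params: dict
    seed: int

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so identical inputs always hash the same.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def should_promote(candidate_score: float, baseline_score: float,
                   min_score: float, min_gain: float = 0.0) -> bool:
    """Promotion gate: clear an absolute floor AND not regress vs. the baseline."""
    clears_floor = candidate_score >= min_score
    beats_baseline = candidate_score - baseline_score >= min_gain
    return clears_floor and beats_baseline


if __name__ == "__main__":
    manifest = RunManifest("data-v3", "abc123", {"lr": 0.1, "depth": 6}, seed=42)
    print("run fingerprint:", manifest.fingerprint())
    print("promote?", should_promote(0.91, 0.90, min_score=0.85))
```

Storing the fingerprint next to the registered model is one way to make "which data, code, params, and seed produced this?" answerable later. The gate checks both an absolute floor and the current baseline, so a candidate that is merely "good enough" cannot replace a better production model.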

The trade-off: MLOps overhead vs. silent model rot. ML systems without lifecycle discipline degrade quietly: features drift, the world changes, accuracy drops, and no one notices until a quarterly review. The discipline is real overhead (MLflow, KServe, drift monitoring, a retrain pipeline), but the alternative is “we deployed this model 18 months ago and forgot about it.” That’s career-limiting at Staff/Principal level.
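To make "degrades quietly" concrete, here is a rough sketch of a drift check for a single numeric feature using the Population Stability Index (PSI), one common drift statistic among several. The function names, the 10-bin default, and the 0.2 threshold (a frequently quoted rule of thumb, used here only as a placeholder) are assumptions for illustration, not something this page prescribes.

```python
import math
from collections import Counter


def psi(train: list[float], prod: list[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(train), max(train)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def shares(values: list[float]) -> list[float]:
        # Bin edges come from the training sample; out-of-range production
        # values are clamped into the first or last bin.
        counts = Counter(
            min(bins - 1, max(0, int((v - lo) / width))) for v in values
        )
        # A small floor keeps log() defined when a bin is empty.
        return [max(counts.get(b, 0) / len(values), 1e-6) for b in range(bins)]

    p, q = shares(train), shares(prod)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def needs_retrain(train: list[float], prod: list[float],
                  threshold: float = 0.2) -> bool:
    """Hypothetical retrain trigger: fire when drift crosses the threshold."""
    return psi(train, prod) > threshold


if __name__ == "__main__":
    train = [i / 100 for i in range(100)]
    shifted = [v + 0.5 for v in train]  # simulated production shift
    print("same distribution:", needs_retrain(train, train))
    print("shifted distribution:", needs_retrain(train, shifted))
```

Run on a schedule against a recent production window, a check like this turns the quarterly-review surprise into an alert or a pipeline trigger. In practice the threshold would be tuned per feature rather than fixed.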

Deepens in Year 4 Phase 20: MLOps Foundations (MLflow as the registry frame everything hangs off) and reaches DEEP in Phase 25: GPU Infrastructure when drift detection auto-rolls back a regressed model. Phase 21: ML Serving + mlship v0 is where canary serving first lands.

  • train-serve-skew — the failure that makes “monitor” a non-optional step.
  • feature-store — provides the reproducible feature view at every promotion.
  • inference-shapes — each shape has its own canary + rollback story.
  • rag-as-pattern — applies the same lifecycle to embeddings, indexes, and prompts.
  • prompt-as-program — same versioning shape, applied to prompts.
  • snapshot-plus-delta — versioned data is half of reproducibility.
  • mlship — capstone CLI that turns this lifecycle into one command.
  • basecamp — MLflow + KServe + Kubeflow live in Tier 5/6.