5-YEAR PROGRAM · YEAR 2 · PHASE 13
UPCOMING

Multi-cloud basecamp (Year 2 Capstone)

Two months. The synthesis phase. Take everything from Year 2 — distributed systems, IaC, AWS, GCP, Backstage, mesh, security, SLOs — and turn it into a coherent multi-cloud platform you’d be comfortable handing to another engineer. ~8 weeks, ~90 hrs.


Phase 13 is the pattern-first scaffold at year scale. Phase 8 gave you theory; Phase 9 gave you IaC; Phase 10 and Phase 11 gave you two clouds; Phase 12 gave you a developer-facing platform with a security baseline. Phase 13 asks one question: do the pieces actually compose into something a second engineer could pick up and operate?

The temptation is to reach for one more tool. Don’t. Phase 13 is integration, not addition. The deliverable is not a new component; it’s the proof that the components you already have add up to a platform — with multi-cloud DR drills, cost dashboards, onboarding runbooks, postmortems, and platform SLOs that hold during the drills. Every checklist item in section 5.1 is something the components already enable; the work is making them coherent.

By phase end, every Year 2 pattern reaches DEEP on the pattern-depth ladder, basecamp is public-ready (sanitized for the Year 3 public launch), platform-ctl keeps growing privately, and terralabs covers all three providers with TF + Crossplane parity. That’s the substrate Years 3-5 build on. The Year 2 Final Exam is the readiness check.


Prerequisites

  • Phase 12 complete — Backstage, mesh, security, platform-ctl all working
  • basecamp managing K3s + EKS + GKE via ArgoCD
  • terralabs covering all 3 providers
  • You accept: this is integration, not addition. No new toys. You’re proving the components add up to a platform.

Why this phase exists

Year 2’s exit ramp is DevOps / Cloud Engineer / Platform Engineer. By Phase 12 you have all the components. Phase 13 is where you turn the components into a platform you can demo + document + hand off.

The Year 2 Final Exam tests this synthesis.


1. PROBLEM

Components ≠ platform. A platform requires:

  • Coherent UX across clouds (basecamp + Backstage + platform-ctl hide cloud differences).
  • Disaster recovery (what if one cloud goes down?).
  • Multi-tenant isolation (namespaces + RBAC + NetworkPolicy + ResourceQuota).
  • Cost observability (per-team, per-service, per-cloud).
  • Operational maturity (alerting, runbooks, SLOs already in place from Phase 12).

This phase tests all of them with real drills. The point of a drill isn’t to pass; the point is to find the gap before a real incident does.
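
As one illustration of the cost-observability requirement, the per-team, per-service, and per-cloud views can all be derived from the same tagged spend records. The sketch below is a minimal example under stated assumptions: the `CostRecord` type, its field names, and the sample figures are hypothetical and do not come from any billing API or from basecamp itself.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical dimensions; they mirror the three views named in the list above.
DIMENSIONS = ("cloud", "team", "service")


@dataclass(frozen=True)
class CostRecord:
    """One line of spend. Every field name here is an assumption."""
    cloud: str
    team: str
    service: str
    usd: float


def roll_up(records, dimension):
    """Sum spend along one dimension: "cloud", "team", or "service"."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension!r}")
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, dimension)] += record.usd
    return dict(totals)


# Made-up sample data, for illustration only.
records = [
    CostRecord("aws", "payments", "checkout", 120.0),
    CostRecord("gcp", "payments", "checkout", 80.0),
    CostRecord("aws", "search", "indexer", 50.0),
]

by_cloud = roll_up(records, "cloud")
by_team = roll_up(records, "team")
by_service = roll_up(records, "service")
```

The useful property is that one consistent tagging scheme feeds all three views. If a record cannot be attributed to a team or service, that is a gap in tagging, not in the dashboard.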


2. PRINCIPLES (no new: exercise the ones you have)

The phase is about exercising Year 2’s pattern set, not adding to it. The full set is listed under Validation criteria.

If any pattern is still at STUB or OUTLINE, deepen it to DEEP this phase. By the end of Phase 13, every Year 2 pattern is DEEP.


3. THE BIG TRADE-OFF: how much multi-cloud is right?

| Option | When it’s right | Cost |
| --- | --- | --- |
| Single cloud (all AWS) | Most companies; optimize for cost + simplicity | Vendor lock-in; no DR for cloud-wide outages |
| Active-active multi-cloud | Real regulatory or DR requirement | High operational + data-sync cost |
| Multi-cloud for skills/portability | Learning; hedge against lock-in | Real cost; consider passive-only |
| Multi-cloud per-service | “Best cloud per workload” | Highest ops complexity, often the wrong answer |

basecamp is multi-cloud for learning + portability. Document why, and accept that you are paying the cost willingly. The Year 2 capstone deliberately pays this multi-cloud tax so the patterns transfer; in a production org you’d usually pick the single-cloud option and be honest about it.


4. TOOLS

No new tools. Phase 13 is integration of what you have.


5. MASTERY: build the platform

5.1 Operational depth checklist (the synthesis)

[ ] basecamp ArgoCD manages: K3s (homelab) + EKS (AWS) + GKE (GCP) simultaneously
[ ] One service deployed to all 3 clusters via a single ApplicationSet (which generates one Application per cluster)
[ ] Backstage catalog shows services across all 3 clusters with health
[ ] terralabs provisions identical VPC + cluster + DB shape on AWS + GCP from same module shape
[ ] Cost dashboard: aggregated per cluster / service / team — visible in Backstage
[ ] DR drill: simulate EKS cluster failure; Application reroutes to GKE
[ ] Cost emergency drill: AWS bill spiked; identify cause + remediate in <30 min
[ ] Onboarding drill: clone basecamp, follow README, get a working dev env in <2 hours
[ ] Platform SLO holds across drills: "basecamp Applications reach Synced within 5 min of git commit, 99% of the time"
[ ] Postmortem written for one self-inflicted incident this phase
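
The platform SLO in the checklist is measurable once you have, for each git commit, the time it landed and the time the Application reached Synced. The sketch below shows one way to compute compliance and the remaining error budget. It assumes you have already collected those timestamp pairs somehow; the function names and the sample data are hypothetical, and only the 5-minute and 99% figures come from the SLO text.

```python
from datetime import datetime, timedelta

SYNC_TARGET = timedelta(minutes=5)  # "within 5 min of git commit"
SLO_TARGET = 0.99                   # "99% of the time"


def slo_compliance(events, target=SYNC_TARGET):
    """Fraction of commits whose Application reached Synced within target.

    events: iterable of (commit_time, synced_time) datetimes. synced_time
    is None when the Application never reached Synced.
    """
    events = list(events)
    if not events:
        raise ValueError("no sync events to evaluate")
    good = sum(
        1
        for committed, synced in events
        if synced is not None and synced - committed <= target
    )
    return good / len(events)


def error_budget_remaining(compliance, slo=SLO_TARGET):
    """Share of the error budget left; a negative value means the SLO is missed."""
    allowed = 1.0 - slo
    burned = 1.0 - compliance
    return 1.0 - burned / allowed


# Made-up drill data: 199 fast syncs and one that took 9 minutes.
base = datetime(2024, 1, 1, 12, 0)
events = [(base, base + timedelta(minutes=3))] * 199
events.append((base, base + timedelta(minutes=9)))

compliance = slo_compliance(events)
budget_left = error_budget_remaining(compliance)
meets_slo = compliance >= SLO_TARGET
```

Running this over the window that contains the drills is what "the SLO holds across drills" means in practice: the drills are allowed to burn budget, but not all of it.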

5.2 Documentation overhaul

Phase 13 doubles as a documentation phase. By end:

  • basecamp/README.md — what is basecamp, how to bootstrap, how to add a service.
  • terralabs/README.md — module index, examples per provider.
  • ops-handbook/runbooks/platform/ — at least 10 runbooks covering common platform ops.
  • projects/basecamp/PLAN.md updated with current state + roadmap.
  • One blog post on abukix.dev/blog: “basecamp at end of Year 2 — what I learned.”

The hand-off-ability check: a friend who knows K8s but not your platform can clone the repo, follow the README, and get a working dev env in <2 hours. If they can’t, the docs are wrong, not the friend.


6. COMPARE: this homelab platform vs the Backstage-style portal at scale

Re-read your program/platform-patterns.md (added later this year — even if just stubs) with fresh eyes. For each pattern in the mapping table, score: “have I now implemented this in basecamp?”

This is your readiness check for Year 3 (when the data layer enters).


7. OPERATE

  • basecamp + platform-ctl are now substantial; basecamp goes public next year (Year 3) — keep private through Year 2.
  • 5+ new runbooks (multi-cloud DR, cost emergency, basecamp bootstrap, ArgoCD recovery, cross-cluster traffic).
  • 2+ postmortems.
  • Weekly log.
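
For the cost emergency runbook, the first step is usually finding which line item moved. A minimal sketch of that triage step follows, assuming you can export spend per service for two comparable periods. The function name, the service names, and the figures are hypothetical and only illustrate the ranking logic.

```python
def top_cost_movers(baseline, current, limit=3):
    """Rank services by spend increase between two periods.

    baseline, current: dicts mapping service name -> spend in USD.
    Returns (service, delta) pairs, biggest increase first. Services
    that got cheaper or stayed flat are left out.
    """
    services = set(baseline) | set(current)
    deltas = {
        name: current.get(name, 0.0) - baseline.get(name, 0.0)
        for name in services
    }
    increases = [(name, delta) for name, delta in deltas.items() if delta > 0]
    # Sort by largest increase, then by name so ties are deterministic.
    increases.sort(key=lambda pair: (-pair[1], pair[0]))
    return increases[:limit]


# Made-up numbers for illustration.
last_week = {"checkout": 40.0, "indexer": 25.0, "nat-gateway": 10.0}
this_week = {
    "checkout": 42.0,
    "indexer": 20.0,
    "nat-gateway": 95.0,
    "gpu-batch": 30.0,
}

movers = top_cost_movers(last_week, this_week)
```

Including services that appear only in the current period matters: a newly launched workload has no baseline, and it is often the cause of the spike.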

8. CONTRIBUTE

Year 2 PR deadline: must be shipped by end of Phase 13. Update ops-handbook/contributions/contribution-plan.md.


Validation criteria (= Year 2 Final Exam readiness)

[ ] All 10 operational depth checks
[ ] basecamp public-ready (will go public Year 3 with sanitized secrets)
[ ] platform-ctl growing (still private)
[ ] Multi-cloud DR drill passed (one cluster down, traffic shifts)
[ ] Cost dashboard live; budget alerts working
[ ] All Year 2 patterns DEEP:
- replication, consensus, partitioning, eventual-consistency, cap-and-pacelc, idempotency
- delivery-semantics, two-phase-commit-vs-sagas, crdts, distributed-time
- declarative-vs-imperative-infrastructure, gitops, immutable-infrastructure, multi-tenancy, platform-as-product
- service-mesh, secrets-lifecycle, defense-in-depth, least-privilege
- zero-trust-networking, zero-trust-security, sli-slo-error-budget
[ ] Year 2 Final Exam passed
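
The DR criterion above ("one cluster down, traffic shifts") can be stated as a small, checkable rule before you run the real drill. The sketch below models only the decision logic. The cluster names follow the checklist, but the preference order, the function names, and the idea of a boolean health map are assumptions; in a real drill the health signal would come from your own probes.

```python
def pick_serving_cluster(health, preference):
    """Return the first healthy cluster in preference order, or None."""
    for cluster in preference:
        if health.get(cluster, False):
            return cluster
    return None


def drill_passed(preference, failed_cluster, expected_target):
    """Simulate losing one cluster and check where traffic would land.

    The drill only counts if the failed cluster was the one serving
    traffic beforehand; failing an idle cluster proves nothing.
    """
    all_up = {cluster: True for cluster in preference}
    degraded = {**all_up, failed_cluster: False}
    return (
        pick_serving_cluster(all_up, preference) == failed_cluster
        and pick_serving_cluster(degraded, preference) == expected_target
    )


# Hypothetical preference order: EKS first, then GKE, then the homelab K3s.
PREFERENCE = ["eks", "gke", "k3s"]

eks_to_gke = drill_passed(PREFERENCE, "eks", "gke")
eks_to_k3s = drill_passed(PREFERENCE, "eks", "k3s")
```

Writing the expected target down first turns the drill into a test with a pass condition, which also gives the postmortem something concrete to compare against.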

Anti-patterns

| Anti-pattern | Why |
| --- | --- |
| “Just one more tool” | Phase 13 is integration, not addition |
| Documentation deferred | Whole point is hand-off-ability; doc as you build |
| Skipping the DR drill | First real outage is a terrible time to learn DR |
| Multi-cloud without articulated justification | If you can’t explain why, you’re paying ops tax for nothing |

Reading list

| Required | Why |
| --- | --- |
| Re-read DDIA Ch. 5-9 | Year 2’s theoretical foundation should now click harder |
| Google SRE Book Ch. on incident response | DR drill rigor |

Year 2 graduation

You can:
- Design and operate multi-cloud K8s platforms
- Reason about distributed-systems trade-offs from theory + practice
- Build internal developer platforms (Backstage, mesh, security)
- Manage cost across clouds, defend security at depth, recover from DR
- Ship OSS that other engineers find useful (terralabs)
- Define + measure platform SLOs
Exit ramp: DevOps Engineer / Senior DevOps / Cloud Engineer / Platform Engineer
Confidence: real, demonstrable, has shipped artifacts

Year 2 Final Exam, then Year 3: Platform Engineering & Data.