Multi-cloud basecamp (Year 2 Capstone)
Two months. The synthesis phase. Take everything from Year 2 — distributed systems, IaC, AWS, GCP, Backstage, mesh, security, SLOs — and turn it into a coherent multi-cloud platform you’d be comfortable handing to another engineer. ~8 weeks, ~90 hrs.
Phase 13 is the pattern-first scaffold at year scale. Phase 8 gave you theory; Phase 9 gave you IaC; Phase 10 and Phase 11 gave you two clouds; Phase 12 gave you a developer-facing platform with a security baseline. Phase 13 asks one question: do the pieces actually compose into something a second engineer could pick up and operate?
The temptation is to reach for one more tool. Don’t. Phase 13 is integration, not addition. The deliverable is not a new component; it’s the proof that the components you already have add up to a platform — with multi-cloud DR drills, cost dashboards, onboarding runbooks, postmortems, and platform SLOs that hold during the drills. Every checklist item in section 5.1 is something the components already enable; the work is making them coherent.
By phase end, every Year 2 pattern reaches DEEP on the pattern-depth ladder, basecamp is public-ready (sanitized for the Year 3 public launch), platform-ctl keeps growing privately, and terralabs covers all three providers with TF + Crossplane parity. That’s the substrate Years 3-5 build on. The Year 2 Final Exam is the readiness check.
Prerequisites
Why this phase exists
Year 2’s exit ramp is DevOps / Cloud Engineer / Platform Engineer. By Phase 12 you have all the components. Phase 13 is where you turn the components into a platform you can demo + document + hand off.
The Year 2 Final Exam tests this synthesis.
1. PROBLEM
Components ≠ platform. A platform requires:
- Coherent UX across clouds (basecamp + Backstage + platform-ctl hide cloud differences).
- Disaster recovery (what if one cloud goes down?).
- Multi-tenant isolation (namespaces + RBAC + NetworkPolicy + ResourceQuota).
- Cost observability (per-team, per-service, per-cloud).
- Operational maturity (alerting, runbooks, SLOs already in place from Phase 12).
This phase tests all of them with real drills. The point of a drill isn’t to pass; the point is to find the gap before a real incident does.
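Cost observability per team, per service, and per cloud comes down to grouping billing line items by one dimension at a time. A minimal sketch, assuming cost data has already been exported into flat records; the `CostRecord` fields, the `aggregate` helper, and the sample figures are all hypothetical, not part of basecamp:

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class CostRecord(NamedTuple):
    """One billing line item. Field names are hypothetical."""
    cloud: str    # e.g. "aws", "gcp", "homelab"
    team: str
    service: str
    usd: float


def aggregate(records: Iterable[CostRecord], dimension: str) -> dict[str, float]:
    """Sum cost by a single dimension: "cloud", "team", or "service"."""
    if dimension not in ("cloud", "team", "service"):
        raise ValueError(f"unknown dimension: {dimension}")
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, dimension)] += record.usd
    return dict(totals)


# Made-up sample data, for illustration only.
records = [
    CostRecord("aws", "payments", "checkout", 120.0),
    CostRecord("gcp", "payments", "checkout", 80.0),
    CostRecord("aws", "search", "indexer", 50.0),
]
print(aggregate(records, "cloud"))
print(aggregate(records, "team"))
```

Keeping every record tagged with all three dimensions means the same data answers "which cloud spiked?" and "which team owns the spike?", which is the question the cost emergency drill asks.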
2. PRINCIPLES (no new: exercise the ones you have)
This phase exercises Year 2’s existing pattern set rather than adding new patterns:
- control-loops — ArgoCD + Crossplane reconciling at multi-cloud scale.
- declarative-vs-imperative-infrastructure — all 3 clouds via terralabs.
- multi-tenancy — namespaces + RBAC + quotas at depth.
- gitops — basecamp is the source of truth.
- defense-in-depth — image signing + Pod Security + NetPol + mTLS + RBAC.
- zero-trust-networking — mesh + mTLS + ACLs across clusters.
- sli-slo-error-budget — platform SLOs hold during drills.
If any are still STUB/OUTLINE: deepen to DEEP this phase. By Phase 13 end, every Year 2 pattern is DEEP.
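For the sli-slo-error-budget pattern, "the SLO holds during drills" can be checked numerically. The sketch below evaluates the platform SLO from section 5.1 (Applications reach Synced within 5 min of git commit, 99% of the time) against a list of commit-to-Synced latencies. The 300-second threshold and 99% target come from that SLO; the function name, the report fields, and the assumption that latencies have already been collected into a list are illustrative only:

```python
from dataclasses import dataclass

SLO_TARGET = 0.99            # from the SLO: 99% of the time
SLO_THRESHOLD_SECONDS = 300  # from the SLO: Synced within 5 min


@dataclass
class SloReport:
    total: int
    good: int
    compliance: float        # fraction of syncs within the threshold
    budget_remaining: float  # fraction of error budget left; negative if overspent
    met: bool


def evaluate_sync_slo(latencies_seconds: list[float]) -> SloReport:
    """Evaluate commit-to-Synced latencies against the platform SLO."""
    if not latencies_seconds:
        raise ValueError("no sync events to evaluate")
    total = len(latencies_seconds)
    good = sum(1 for s in latencies_seconds if s <= SLO_THRESHOLD_SECONDS)
    compliance = good / total
    allowed_bad = (1 - SLO_TARGET) * total  # error budget, in events
    actual_bad = total - good
    budget_remaining = 1 - actual_bad / allowed_bad
    return SloReport(total, good, compliance, budget_remaining,
                     compliance >= SLO_TARGET)


# Made-up drill window: 199 fast syncs and one 15-minute sync.
report = evaluate_sync_slo([60.0] * 199 + [900.0])
print(report)
```

Tracking the remaining error budget, not just pass/fail, shows how much room a drill consumed before the SLO would have been breached.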
3. THE BIG TRADE-OFF: how much multi-cloud is right?
| Option | When it’s right | Cost |
|---|---|---|
| Single cloud (all AWS) | Most companies; optimize for cost + simplicity | Vendor lock-in; no DR for cloud-wide outages |
| Active-active multi-cloud | Real regulatory or DR requirement | High operational + data-sync cost |
| Multi-cloud for skills/portability | Learning; hedge against lock-in | Real cost; consider passive-only |
| Multi-cloud per-service | “Best cloud per workload” | Highest ops complexity, often the wrong answer |
basecamp is multi-cloud for learning + portability. Document why, and accept that you are paying the cost willingly. The Year 2 capstone deliberately pays this multi-cloud tax so the patterns transfer; in a production org you’d usually pick the first option (single cloud) and be honest about it.
4. TOOLS
No new tools. Phase 13 is integration of what you have.
5. MASTERY: build the platform
5.1 Operational depth checklist (the synthesis)
- [ ] basecamp ArgoCD manages: K3s (homelab) + EKS (AWS) + GKE (GCP) simultaneously
- [ ] One service deployed to all 3 clusters via single Application + ApplicationSet
- [ ] Backstage catalog shows services across all 3 clusters with health
- [ ] terralabs provisions identical VPC + cluster + DB shape on AWS + GCP from same module shape
- [ ] Cost dashboard: aggregated per cluster / service / team, visible in Backstage
- [ ] DR drill: simulate EKS cluster failure; Application reroutes to GKE
- [ ] Cost emergency drill: AWS bill spiked; identify cause + remediate in <30 min
- [ ] Onboarding drill: clone basecamp, follow README, get a working dev env in <2 hours
- [ ] Platform SLO holds across drills: "basecamp Applications reach Synced within 5 min of git commit, 99% of the time"
- [ ] Postmortem written for one self-inflicted incident this phase
5.2 Documentation overhaul
Phase 13 doubles as a documentation phase. By end:
- basecamp/README.md: what is basecamp, how to bootstrap, how to add a service.
- terralabs/README.md: module index, examples per provider.
- ops-handbook/runbooks/platform/: at least 10 runbooks covering common platform ops.
- projects/basecamp/PLAN.md updated with current state + roadmap.
- One blog post on abukix.dev/blog: “basecamp at end of Year 2: what I learned.”
The hand-off-ability check: a friend who knows K8s but not your platform can clone the repo, follow the README, and get a working dev env in <2 hours. If they can’t, the docs are wrong, not the friend.
6. COMPARE: this homelab platform vs the Backstage-style portal at scale
Re-read your program/platform-patterns.md (added later this year — even if just stubs) with fresh eyes. For each pattern in the mapping table, score: “have I now implemented this in basecamp?”
This is your readiness check for Year 3 (when the data layer enters).
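The scoring pass can be as simple as a yes/no per pattern plus a list of gaps to close. A minimal sketch, assuming a hand-maintained self-assessment; the `readiness` helper and the sample answers are hypothetical, and the pattern names are borrowed from this phase's principles list, not from the actual mapping table:

```python
def readiness(scores: dict[str, bool]) -> tuple[float, list[str]]:
    """Return (fraction implemented, sorted names of patterns still missing)."""
    if not scores:
        raise ValueError("no patterns to score")
    gaps = sorted(name for name, done in scores.items() if not done)
    return (len(scores) - len(gaps)) / len(scores), gaps


# Hypothetical self-assessment: True means "implemented in basecamp".
scores = {
    "control-loops": True,
    "gitops": True,
    "multi-tenancy": True,
    "zero-trust-networking": False,
}
fraction, gaps = readiness(scores)
print(f"{fraction:.0%} implemented; gaps: {gaps}")
```

The gap list matters more than the percentage: each entry is a concrete item to close before Year 3.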
7. OPERATE
- basecamp + platform-ctl are now substantial; basecamp goes public next year (Year 3) — keep private through Year 2.
- 5+ new runbooks (multi-cloud DR, cost emergency, basecamp bootstrap, ArgoCD recovery, cross-cluster traffic).
- 2+ postmortems.
- Weekly log.
8. CONTRIBUTE
Year 2 PR deadline: must be shipped by end of Phase 13. Update ops-handbook/contributions/contribution-plan.md.
Validation criteria (= Year 2 Final Exam readiness)
- [ ] All 10 operational depth checks
- [ ] basecamp public-ready (will go public Year 3 with sanitized secrets)
- [ ] platform-ctl growing (still private)
- [ ] Multi-cloud DR drill passed (one cluster down, traffic shifts)
- [ ] Cost dashboard live; budget alerts working
- [ ] All Year 2 patterns DEEP:
  - replication, consensus, partitioning, eventual-consistency, cap-and-pacelc, idempotency
  - delivery-semantics, two-phase-commit-vs-sagas, crdts, distributed-time
  - declarative-vs-imperative-infrastructure, gitops, immutable-infrastructure, multi-tenancy, platform-as-product
  - service-mesh, secrets-lifecycle, defense-in-depth, least-privilege
  - zero-trust-networking, zero-trust-security, sli-slo-error-budget
- [ ] Year 2 Final Exam passed
Anti-patterns
| Anti-pattern | Why |
|---|---|
| “Just one more tool” | Phase 13 is integration, not addition |
| Documentation deferred | Whole point is hand-off-ability; doc as you build |
| Skipping the DR drill | First real outage is a terrible time to learn DR |
| Multi-cloud without articulated justification | If you can’t explain why, you’re paying ops tax for nothing |
Reading list
| Required | Why |
|---|---|
| Re-read DDIA Ch. 5-9 | Year 2’s theoretical foundation should now click harder |
| Google SRE Book, chapters on incident response | DR drill rigor |
Year 2 graduation
You can:
- Design and operate multi-cloud K8s platforms
- Reason about distributed-systems trade-offs from theory + practice
- Build internal developer platforms (Backstage, mesh, security)
- Manage cost across clouds, defend security at depth, recover from DR
- Ship OSS that other engineers find useful (terralabs)
- Define + measure platform SLOs
Exit ramp: DevOps Engineer / Senior DevOps / Cloud Engineer / Platform Engineer
Confidence: real, demonstrable, has shipped artifacts
→ Year 2 Final Exam, then Year 3: Platform Engineering & Data.