Platform Engineering & Data
Months 25-36. Add observability at depth (kernel-level via eBPF) + the data engineering layer (lakehouse, stream + batch, serving, governance) on top of Year 2’s platform. By year-end, basecamp goes public, and your “Homelab life API” runs end-to-end on it.

Exit ramp: Senior DevOps / Data Platform Engineer
The role of Year 3
Year 3 is the inflection point where the platform stops being something you run and starts being something you offer. Year 1 gave you a single-machine intuition; Year 2 made that intuition distributed and multi-cloud. Year 3 is what turns the resulting infrastructure into a product surface — observable from the kernel up, durable at the storage tier, queryable from a notebook, and credible enough to put github.com/abukix/basecamp in someone else’s hands.
The Master Plan calls out three transitions where the role identity changes. Year 3 sits squarely on the second one: platform-as-tool → platform-as-product. Tier 3 (Lakehouse) and Tier 4 (Processing) of basecamp come online, Tier 8 (Data Serving) gets its first occupant, and the JupyterHub-as-a-service entry point makes the platform feel like something a user logs into rather than something you SSH into.
It’s also the year observability graduates from “Prometheus + Grafana, mostly working” to a discipline. eBPF gives you kernel-level events without sidecar instrumentation. OpenTelemetry unifies traces/logs/metrics under a single emission contract. And cardinality discipline — the unglamorous one — is what keeps the bill from tripling every time you add a label.
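To make the cardinality point concrete, here is a minimal sketch of the arithmetic, assuming the worst case in which every label combination actually occurs. The label names and counts are hypothetical and are not taken from basecamp.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_cardinalities.values())


# Hypothetical labels for a single HTTP request counter.
labels = {"service": 20, "endpoint": 15, "status_code": 5}
baseline = series_count(labels)  # 20 * 15 * 5 = 1500

# Adding one label with 10 distinct values multiplies the bound by 10.
with_pod = series_count({**labels, "pod": 10})

print(baseline, with_pod)  # 1500 15000
```

The multiplication is the whole lesson: a label is cheap to add in code, and its cost only shows up later as storage and query load.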
What you’ll know at the end of Year 3
- Observability at depth — three pillars (metrics, logs, traces) unified under OpenTelemetry; kernel-level events via eBPF without per-service instrumentation; cardinality-as-cost discipline so labels don’t silently 10x storage; SLI/SLO from Y2 reinforced with telemetry that actually maps to user pain.
- Lakehouse architecture — MinIO + Iceberg + Nessie as the open-source 2025 stack; storage/compute separation as a first-class design choice; schema-on-read vs schema-on-write chosen per dataset, not per-team religion; snapshot-plus-delta (Iceberg snapshots) for time travel and rollback.
- Stream + batch processing — Redpanda + Flink for stream, Spark + Airflow + dbt for batch, Lambda vs Kappa chosen as an architectural decision rather than a default. Delivery-semantics reinforced from Y2.
- Data serving — Trino as the federated query layer over heterogeneous sources; Superset as the analytics frontend; query caching reinforced from Y1 fundamentals.
- Data governance + lineage — DataHub or OpenMetadata as the catalog; OpenLineage as the cross-tool lineage contract; column-level masking; audit logs; dbt tests gating production deploys.
- JupyterHub-as-a-service — notebooks as a first-class platform surface, not a one-off install. The entry point for every Studio composition recipe Y4/Y5 will land.
- Operating a public platform: basecamp goes public at the P19 capstone, at the end of the year. You’ll know what it takes to invite strangers to clone your infrastructure.
You’ll be deployable as a Senior DevOps / Data Platform Engineer or SRE with a data specialty. By end of Year 3 your homelab platform serves the same data engineering shape as production-scale platforms at Netflix, Spotify, Uber — same patterns, smaller scale.
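Of the ideas listed above, snapshot-plus-delta is the easiest to misread, so a toy model may help. The sketch below is not the Iceberg API; `ToyTable` and its methods are hypothetical names used only to illustrate how immutable snapshots make time travel and rollback cheap.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy table: every commit appends an immutable snapshot, which is the
    set of data files visible at that point. Old snapshots are never edited."""
    snapshots: list[frozenset[str]] = field(default_factory=lambda: [frozenset()])
    current: int = 0

    def commit(self, added=(), removed=()) -> int:
        # Apply a delta (files added/removed) to the current snapshot.
        base = self.snapshots[self.current]
        self.snapshots.append((base | frozenset(added)) - frozenset(removed))
        self.current = len(self.snapshots) - 1
        return self.current

    def files_at(self, snapshot_id=None) -> frozenset[str]:
        # Time travel: read any retained snapshot by id.
        sid = self.current if snapshot_id is None else snapshot_id
        return self.snapshots[sid]

    def rollback(self, snapshot_id: int) -> None:
        # Rollback only moves the pointer; no data files are rewritten.
        self.current = snapshot_id


table = ToyTable()
s1 = table.commit(added=["a.parquet"])
s2 = table.commit(added=["b.parquet"])
s3 = table.commit(added=["c.parquet"], removed=["a.parquet"])
table.rollback(s2)
```

After the rollback, `table.files_at()` returns the `s2` file set, while `table.files_at(s3)` still returns the later one, because nothing was deleted.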
Phase map
| Phase | Title | Approx. weeks | Approx. hours | Pattern depth focus |
|---|---|---|---|---|
| 14 | Observability + eBPF | 8 | 100 | three-pillars-and-unified-telemetry, cardinality-as-cost, runbook-as-code, blameless-postmortem |
| 15 | Lakehouse: MinIO + Iceberg + JupyterHub | 8 | 100 | oltp-vs-olap (DEEP), schema-on-read-vs-write, append-only-log, snapshot-plus-delta |
| 16 | Stream Processing: Redpanda + Flink | 8 | 90 | stream-processing, lambda-and-kappa, delivery-semantics (reinforced) |
| 17 | Batch Processing: Spark + Airflow + dbt | 8 | 90 | batch-processing, materialized-views, idempotency (reinforced) |
| 18 | Data Serving: Trino + Superset | 7 | 80 | caching (reinforced from Y1) |
| 19 | Data Governance (capstone) | 8 | 100 | all Y3 patterns reach DEEP; blameless-postmortem (DEEP) |
| — | Year 3 Final Exam | 2 | 24 | — |
| | Total | ~49 weeks | ~584 hrs | ~12 patterns deepened |
12 hrs/week × 52 weeks = 624 hrs. Year 3 fits with ~40 hrs slack — by far the most slack of any year, because the data layer has the most “wait, actually” moments and you’ll need it.
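The totals and the slack figure can be checked directly from the phase map. The short keys (`P14` and so on) are shorthand for the table rows; the numbers are copied from the table.

```python
# (weeks, hours) for each row of the phase map.
phases = {
    "P14": (8, 100),
    "P15": (8, 100),
    "P16": (8, 90),
    "P17": (8, 90),
    "P18": (7, 80),
    "P19": (8, 100),
    "Final exam": (2, 24),
}

total_weeks = sum(weeks for weeks, _ in phases.values())
total_hours = sum(hours for _, hours in phases.values())

budget_hours = 12 * 52  # 12 hrs/week across 52 weeks
slack_hours = budget_hours - total_hours

print(total_weeks, total_hours, budget_hours, slack_hours)  # 49 584 624 40
```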
What ships during Year 3 (the data tier of basecamp goes operational)
Year 3 is the year the data tier of basecamp comes online. Tier 3 (Lakehouse), Tier 4 (Processing), and Tier 8 (Serving) from the 9-tier stack all land here. By the end of P19, the platform is a real data engineering surface — and it goes public.
| Project | Phase | Launch energy |
|---|---|---|
| basecamp | P19 (capstone) | GOES PUBLIC: sanitized via Sealed Secrets, README, blog post, “basecamp at end of Year 3” announcement. The biggest launch of the year. |
| personal-api | P17 + P18 | First personal service running on the platform. GitHub commits → Airflow → Iceberg → Trino → REST API. Demonstrates the platform works for you. |
| terralabs | continuous | Adds data-infra modules (Redpanda K8s, MinIO operator, Iceberg catalog) |
| platform-ctl | continuous | Adds cluster bootstrap + data-pipeline subcommands; still private |
| ops-handbook | continuous | Runbooks for every new tier; first eBPF-driven postmortem; cardinality-budget ADR |
personal-api is the year’s narrative payoff. It proves the platform isn’t a portfolio piece — it actually runs your stuff. Cinematic content writes itself: “I logged 5 years of GitHub activity into my own homelab lakehouse and queried it with Trino.”
The basecamp public release is the credibility moment. Up to now, basecamp has been a private repo. At P19 it gets sanitized (Sealed Secrets, no real domains, generic READMEs), tagged, and pushed to github.com/abukix/basecamp. From this point on, anyone can clone the platform and run an equivalent of it. That’s the moat described in the Master Plan — and Year 3 is when it becomes real.
Patterns deepened in Year 3
Roughly 12 patterns reach DEEP this year, weighted toward storage-and-data, observability-and-ops, and stream-vs-batch:
- observability-and-ops/three-pillars-and-unified-telemetry (P14)
- observability-and-ops/cardinality-as-cost (P14)
- observability-and-ops/runbook-as-code (P14: formalize the discipline that’s been building since Y1)
- observability-and-ops/blameless-postmortem (P19 capstone)
- storage-and-data/oltp-vs-olap (DEEP from Y1’s first OUTLINE)
- storage-and-data/schema-on-read-vs-write (P15)
- storage-and-data/append-only-log (P15: Parquet as concrete example)
- storage-and-data/snapshot-plus-delta (P15: Iceberg snapshots)
- storage-and-data/materialized-views (P17 dbt)
- storage-and-data/lsm-vs-btree (Y1 OUTLINE → DEEP via real workload comparison)
- storage-and-data/write-ahead-logging (Y1 OUTLINE → DEEP via Postgres + Iceberg WAL parallels)
- stream-vs-batch/stream-processing (P16)
- stream-vs-batch/batch-processing (P17)
- stream-vs-batch/lambda-and-kappa (P16 → P17 architectural choice)
Each is promoted via the depth ladder described in the Master Plan: STUB → OUTLINE on first phase touch → DEEP after 3+ months of operating something that depends on the pattern. By Y3 end the cumulative DEEP count crosses ~35.
The Studio composition recipe that lands this year
“Homelab life API” — the Y3 composition recipe:
GitHub events → Airflow (P17) → Iceberg lakehouse (P15) → Trino (P18) → REST API → portal command palette (Y5)

Every component lands across Y3 phases. By Y3 end, you can ask “how many commits did I make to mlship in March 2027?” and get an answer that ran through your own platform end-to-end.
This is also the first explicit composition recipe documented as a runnable example in basecamp/examples/. The other 4 recipes (RAG, AI-incident-triage, train+register+deploy, AI on-call) land in Y4/Y5 — but they all build on the lakehouse + serving layer this year stands up.
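As a rough illustration of the question this recipe answers, the sketch below runs the “commits to mlship in March 2027” query against an in-memory SQLite table. SQLite is only a stand-in for Trino over Iceberg; the `commits` table, its columns, and the sample rows are all made up for the example and do not describe the real personal-api schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commits (repo TEXT, sha TEXT, committed_at TEXT)")

# Made-up sample rows; in the real recipe Airflow would land these in Iceberg.
conn.executemany(
    "INSERT INTO commits VALUES (?, ?, ?)",
    [
        ("mlship", "a1", "2027-02-28T23:10:00"),
        ("mlship", "b2", "2027-03-01T09:00:00"),
        ("mlship", "c3", "2027-03-17T14:30:00"),
        ("basecamp", "d4", "2027-03-20T08:15:00"),
        ("mlship", "e5", "2027-04-01T00:00:00"),
    ],
)

# Half-open range [March 1, April 1) so no month-length logic is needed.
(count,) = conn.execute(
    """
    SELECT COUNT(*) FROM commits
    WHERE repo = ?
      AND committed_at >= '2027-03-01'
      AND committed_at <  '2027-04-01'
    """,
    ("mlship",),
).fetchone()

print(count)  # 2
```

The half-open date range is a deliberate choice: it works on ISO-8601 strings as well as on real timestamp types, which keeps the same predicate usable if the column type changes.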
Cloud requirements
Year 3 cloud spend: $0

All work runs on the homelab + cloud accounts you already have from Y2. Year 3 is a “deepen what you have” year, not a “spread to new clouds” year.

The exception: if your homelab can’t fit MinIO + Iceberg + Spark + Airflow + Trino simultaneously (it probably can on 64GB at the Month-25 upgrade), you’ll need to either upgrade or temporarily lean on cloud resources for one workload. Plan ahead.
Hardware milestone: Month 25 upgrade
- RAM: 32GB → 64GB DDR5 (sell 32GB kit for ~$40)
- Cost: ~$150-200 net
- Why: Year 3 stack is RAM-hungry. MinIO + Postgres + Spark + Airflow + Trino + Loki + Prometheus + dbt + JupyterHub does not fit in 32GB.

Schedule the upgrade for week 1 of Phase 14; observability tooling alone wants ~12GB. Full SKU breakdown and sourcing notes: homelab/hardware.
Reading order
- This index: orient on the year’s arc and what crosses the public-release line.
- phase-14: observability is the foundation everything else stands on. eBPF + OTel + cardinality discipline before anything else gets added to the platform.
- phase-15: the lakehouse is the data-layer foundation. JupyterHub lands here too, since notebooks are how you’ll actually exercise the lakehouse.
- phase-16, phase-17, phase-18 in order: stream, then batch, then serving. They each consume the lakehouse from P15.
- phase-19: capstone + basecamp public release. Governance is what makes the public release responsible rather than reckless.
- final-exam: ~2 weeks before end of P19.
DDIA Ch. 10 (Batch Processing) and Ch. 11 (Stream Processing) are Year 3’s reading spine. Pace 1 chapter / 4 weeks alongside the phase work.
Year 3 graduation
You can:
- Design + operate observability for a multi-cluster platform (incl. eBPF kernel-level events)
- Architect + run a lakehouse (MinIO + Iceberg)
- Build streaming pipelines (Redpanda + Flink) with exactly-once-ish semantics
- Build batch pipelines (Spark + Airflow + dbt) with backfill discipline
- Federate queries across data sources (Trino) + serve to analysts (Superset)
- Govern data (catalog, lineage, access control, quality tests)
- Run notebooks-as-a-service (JupyterHub on the platform)
- Operate a public OSS platform (basecamp now public)
Exit ramp: Senior DevOps / Data Platform Engineer / Site Reliability Engineer

Confidence: ~35 patterns DEEP, multi-cloud platform operational, basecamp public, personal-api running on your own platform

→ program/year-4/index.md: ML & AI Infrastructure builds Tiers 5/6/7 on top of the data tier you just stood up.