triage Plan
An on-call app running on K3s. It lists open incidents, who’s paged, and the next escalation time. Uses Postgres (Phase 3 schema) + Redis (active-paging state) + Prometheus (metrics). First real service-on-K3s. Its Helm chart lives in basecamp/charts/triage/.
triage is the Group B service that closes Year 1. It’s the first real workload deployed onto basecamp’s K3s cluster at the end of Phase 7 (Kubernetes + GitOps), and it’s deliberately scoped to exercise every Year 1 phase: the Phase 3 incidents schema, Phase 4 Python migration scripts, Phase 5 Go backend, Phase 6 container build, and Phase 7 K3s deployment + GitOps reconciliation.
Beyond being a phase deliverable, triage is real software with a real role. It’s better than the post-it-note incident tracking most homelabs run with, and it earns a permanent place in the platform: by Year 5 Phase 28, services/aiops/ queries triage’s open-incidents API as a tool via basecamp-mcp, making triage the data source for the auto-incident triage composition recipe (Recipe 2).
The architecture is intentionally boring: Go + chi + sqlx + slog backend, htmx server-rendered HTML frontend (no SPA — Year 1 isn’t where you learn frontend), Postgres for incidents, Redis for active-paging state, Prometheus for metrics. The interesting part isn’t the stack; it’s that this is the first service that proves the integration — that all those Year 1 phases compose into a thing that runs in production.
What it is
A small but real web service:
- Backend: Go (chi router, sqlx Postgres, slog)
- Frontend: server-rendered HTML + htmx (no SPA; not learning frontend yet)
- Persistence: Postgres (incidents schema from Phase 3) + Redis (active paging)
- Deploy: Helm chart in basecamp; ArgoCD-managed
- Observability: Prometheus metrics; structured logs to Loki (from Y3)
Endpoints:
- GET /: open-incidents dashboard
- GET /incidents/{id}: incident detail + timeline
- POST /incidents: create new incident (idempotent via key)
- POST /incidents/{id}/escalate: escalate
- GET /healthz, GET /metrics, GET /readyz
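The "idempotent via key" behaviour of POST /incidents could look roughly like the sketch below. It is only an illustration under stated assumptions: an in-memory map stands in for the Postgres table, and the names Store, Create, and the key format are hypothetical, not taken from the triage codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// Incident is a hypothetical, trimmed-down incident record.
type Incident struct {
	ID    int
	Title string
}

// Store is an in-memory stand-in for the Postgres-backed incident store
// (assumption: the real service would enforce this with a unique key column).
type Store struct {
	mu     sync.Mutex
	nextID int
	byKey  map[string]Incident
}

// NewStore returns an empty store whose first incident gets ID 1.
func NewStore() *Store {
	return &Store{nextID: 1, byKey: make(map[string]Incident)}
}

// Create returns the incident registered under key, creating it only the
// first time the key is seen. The boolean reports whether a new incident
// was created by this call.
func (s *Store) Create(key, title string) (Incident, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if inc, ok := s.byKey[key]; ok {
		return inc, false // retry of an earlier request: no duplicate
	}
	inc := Incident{ID: s.nextID, Title: title}
	s.nextID++
	s.byKey[key] = inc
	return inc, true
}

func main() {
	s := NewStore()
	first, created := s.Create("key-123", "disk full on node-1")
	fmt.Println(first.ID, created) // 1 true
	retry, created := s.Create("key-123", "disk full on node-1")
	fmt.Println(retry.ID, created) // 1 false
}
```

A client that retries after a timeout sends the same key and gets the original incident back, so a flaky network never produces two incidents for one event.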
Why it exists
- Phase deliverable: Year 1 Phase 7 K8s; first real service-on-K3s
- Ties together Year 1. Uses: Phase 3 schema + Phase 4 (Python migration scripts) + Phase 5 (Go backend) + Phase 6 (container) + Phase 7 (K3s deployment + GitOps)
- Real value. Better than the post-it-note incident tracking most homelabs have.
- Year-5 integration: services/aiops/ queries triage’s open-incidents API as a tool via basecamp-mcp.
Pattern it teaches
The integration pattern. triage is the first real service that exercises:
- state-vs-computation (stateless container + Postgres/Redis backend)
- gitops (ArgoCD-deployed)
- defense-in-depth (NetworkPolicy + RBAC + Pod Security)
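One place the state-vs-computation split becomes visible is the difference between GET /healthz and GET /readyz: a stateless pod can be alive while its backing stores are unreachable. The sketch below is an assumption about how that could be wired, not the actual triage handlers. It uses plain net/http rather than chi to stay self-contained, and Check, FailedChecks, and errUnreachable are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sort"
)

// Check probes one backing service; a nil error means it is reachable.
// In a real service this might wrap a Postgres or Redis ping (assumption).
type Check func() error

// errUnreachable is a placeholder error used by the demo in main.
var errUnreachable = errors.New("unreachable")

// FailedChecks runs every check and returns the sorted names of those
// that returned an error.
func FailedChecks(checks map[string]Check) []string {
	var failed []string
	for name, check := range checks {
		if err := check(); err != nil {
			failed = append(failed, name)
		}
	}
	sort.Strings(failed)
	return failed
}

// Healthz is liveness only: the process is up and serving HTTP.
func Healthz(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// Readyz returns a handler that answers 503 until every check passes,
// so the pod receives no traffic while its state backends are down.
func Readyz(checks map[string]Check) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if failed := FailedChecks(checks); len(failed) > 0 {
			msg := fmt.Sprintf("not ready: %v", failed)
			http.Error(w, msg, http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

func main() {
	checks := map[string]Check{
		"postgres": func() error { return nil },
		"redis":    func() error { return errUnreachable },
	}
	fmt.Println(FailedChecks(checks)) // [redis]
}
```

Keeping liveness free of dependency checks means a Redis outage takes the pod out of rotation without making Kubernetes restart a process that is itself healthy.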
Scope
v1 (Year 1 Phase 7)
- [ ] Go backend with chi + sqlx + slog
- [ ] htmx frontend, server-rendered HTML
- [ ] Postgres incidents schema (from Phase 3)
- [ ] Redis for active-paging state
- [ ] Helm chart in basecamp/charts/triage/
- [ ] ArgoCD Application in basecamp/applications/tier-1-foundation/
- [ ] Prometheus metrics + Grafana dashboard
- [ ] Structured logs (slog) shipped to Loki (Phase 14, deferred slightly from initial release)
- [ ] >70% test coverage
- [ ] CI: GitHub Actions builds + pushes image; ArgoCD syncs on tag
- [ ] README + architecture diagram
Y3+ (incremental enhancements)
- Loki log shipping (Y3 P14)
- OTel traces (Y3 P14)
- SLO definition + burn-rate alerts (applying the discipline already learned in Y2 P12)
Y5 (integration with AIOps)
- Expose /v1/incidents/open API for services/aiops/ to consume via basecamp-mcp
- Surface in Studio Portal
- AI-executable runbook: “auto-escalate if no acknowledgment in N minutes”
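The "auto-escalate if no acknowledgment in N minutes" rule reduces to a small time comparison, which also yields the "next escalation time" shown on the dashboard. The following is a minimal sketch under assumptions: the Page type, the function names, and the 10-minute defaultWindow are all hypothetical, since the plan leaves N unspecified.

```go
package main

import (
	"fmt"
	"time"
)

// defaultWindow is an assumed value for N; the plan does not fix it.
const defaultWindow = 10 * time.Minute

// Page is a hypothetical view of one active page.
type Page struct {
	SentAt         time.Time
	AcknowledgedAt *time.Time // nil until someone acknowledges
}

// NextEscalation returns the moment an unacknowledged page would escalate.
func NextEscalation(p Page, window time.Duration) time.Time {
	return p.SentAt.Add(window)
}

// ShouldEscalate reports whether the page is still unacknowledged at now
// and the escalation window has fully elapsed.
func ShouldEscalate(p Page, now time.Time, window time.Duration) bool {
	if p.AcknowledgedAt != nil {
		return false // an acknowledged page never auto-escalates
	}
	return !now.Before(NextEscalation(p, window))
}

func main() {
	sent := time.Date(2025, time.January, 1, 12, 0, 0, 0, time.UTC)
	p := Page{SentAt: sent}

	fmt.Println(ShouldEscalate(p, sent.Add(5*time.Minute), defaultWindow))  // false
	fmt.Println(ShouldEscalate(p, sent.Add(10*time.Minute), defaultWindow)) // true

	ack := sent.Add(2 * time.Minute)
	p.AcknowledgedAt = &ack
	fmt.Println(ShouldEscalate(p, sent.Add(30*time.Minute), defaultWindow)) // false
}
```

Passing now as a parameter instead of calling time.Now inside the function keeps the rule deterministic, which matters if an automated runbook is going to evaluate it.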
When built
Year 1 Phase 7, Month 10-12. Ships to github.com/abukix/triage.
Dependencies
triage requires basecamp Tier 1 to be live (Postgres, Redis, ArgoCD, Prometheus, Grafana — all from Phase 7). It also leans on the Phase 3 incidents schema (database design), Phase 4 Python (migration scripts), and Phase 5 Go (the backend itself). Y5 integration depends on services/aiops/ and basecamp-mcp shipping in Year 5 Phases 27-28.
Deliverables checklist
- [ ] github.com/abukix/triage public (quiet ship)
- [ ] Helm chart in basecamp/charts/triage/
- [ ] ArgoCD Application in basecamp/applications/triage.yaml
- [ ] Deployed on homelab K3s
- [ ] Real incident logged via triage during Phase 7 exit test
- [ ] README + architecture diagram
Public vs private
Public from the Y1 P7 ship, as a quiet ship: push to GitHub + tag a release. No blog post, no LinkedIn announcement. Year 1 ships are about discipline (publish + PR review + cut release), not launch energy. Loud launches are reserved for terralabs (Y2), basecamp (Y3), and Abukix Studio + mlship v2 (Y5).
Cross-references
- Phase: Year 1 Phase 7
- Pattern: state-vs-computation
- Schema: from Phase 3 — Databases
- Master plan context: Master Plan — Group B services
- Year 1 context: Year 1 — what ships publicly
- Brand context: Abukix Studio
- Related: basecamp (where it’s deployed)
- Y5 integration: services/aiops/ consumes triage’s API; surfaces in composition Recipe 2