Repository Pattern

Abstract data access behind a domain-shaped interface. Source: Martin Fowler's Patterns of Enterprise Application Architecture (PoEAA). The pattern lets the domain treat collections of entities as if they lived in memory.

The domain asks for entities; the repository fetches them. The domain doesn’t know whether they came from Postgres, Redis, an HTTP API, or a fake.

Status: STUB (promoted to OUTLINE in Y1 Phase 5).

What this pattern is

The Repository pattern, catalogued in Martin Fowler's Patterns of Enterprise Application Architecture (the entry itself is credited to Edward Hieatt and Rob Mee), abstracts data access behind a domain-shaped interface. Instead of the domain calling db.query("SELECT * FROM users WHERE ..."), it calls userRepository.findByEmail(email). The repository’s interface is expressed in domain terms (entities, aggregates, queries that make sense to the business); its implementation handles the SQL, the ORM, the joins, the caching. The pattern composes naturally with hexagonal architecture: a repository is a driven port.
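A minimal Python sketch of that split, assuming a hypothetical users table and a hypothetical greeting_for function standing in for domain code. The UserRepository protocol is the driven port; SqliteUserRepository is one possible adapter, not a reference implementation.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class User:
    id: int
    email: str
    name: str


class UserRepository(Protocol):
    """Driven port: the domain depends on this, not on SQL."""

    def find_by_email(self, email: str) -> Optional[User]: ...


class SqliteUserRepository:
    """One possible adapter. The table layout is an assumption of this sketch."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find_by_email(self, email: str) -> Optional[User]:
        row = self._conn.execute(
            "SELECT id, email, name FROM users WHERE email = ?", (email,)
        ).fetchone()
        return User(*row) if row else None


def greeting_for(repo: UserRepository, email: str) -> str:
    """Domain-side code: no SQL and no connection, only the repository port."""
    user = repo.find_by_email(email)
    return f"Hello, {user.name}" if user else "Hello, stranger"


# Wiring happens at the edge of the application.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)")
conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", ("ada@example.com", "Ada"))
repo = SqliteUserRepository(conn)
print(greeting_for(repo, "ada@example.com"))
```

Only the wiring at the bottom knows that SQLite is involved; greeting_for would work unchanged against any object with a matching find_by_email.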

The domain-shaped interface is what distinguishes a repository from a DAO (Data Access Object). A DAO exposes methods that mirror the persistence layer: getUser(id), insertUser(user), updateUser(user), deleteUser(id). A repository exposes methods that mirror the domain: activeCustomers(), overdueInvoices(), authorsWithMoreThanTenBooks(). The distinction is semantic, not syntactic. A repository whose methods all look like getById and save is a DAO with an ambitious name.
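The contrast can be sketched in Python. InvoiceDao and InvoiceRepository are hypothetical names, and the in-memory dict stands in for whatever storage a real DAO would wrap. The point is the vocabulary of the method names, not the storage.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Invoice:
    id: int
    due: date
    paid: bool


class InvoiceDao:
    """DAO shape: methods mirror storage operations (row in, row out)."""

    def __init__(self) -> None:
        self._rows: dict[int, Invoice] = {}

    def insert(self, invoice: Invoice) -> None:
        self._rows[invoice.id] = invoice

    def get(self, invoice_id: int) -> Optional[Invoice]:
        return self._rows.get(invoice_id)

    def all(self) -> list[Invoice]:
        return list(self._rows.values())


class InvoiceRepository:
    """Repository shape: methods answer questions the business asks."""

    def __init__(self, dao: InvoiceDao) -> None:
        self._dao = dao  # storage detail hidden behind domain vocabulary

    def add(self, invoice: Invoice) -> None:
        self._dao.insert(invoice)

    def overdue_invoices(self, today: date) -> list[Invoice]:
        # "Unpaid and past its due date" is expressed once, behind a domain name.
        return [i for i in self._dao.all() if not i.paid and i.due < today]


invoices = InvoiceRepository(InvoiceDao())
invoices.add(Invoice(1, date(2024, 1, 1), paid=False))
invoices.add(Invoice(2, date(2024, 1, 1), paid=True))
invoices.add(Invoice(3, date(2024, 12, 31), paid=False))
overdue_ids = [i.id for i in invoices.overdue_invoices(date(2024, 6, 1))]
```

A caller of InvoiceDao has to know how "overdue" maps onto rows; a caller of InvoiceRepository only has to know the word.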

The pattern lets the domain treat entities as if they lived in an infinite in-memory collection. orderRepository.findByCustomer(customer) returns Orders as if they were always available. The repository handles the database round-trip, the pagination, the caching, the query optimization. The domain code stays clean of persistence concerns; the persistence code stays clean of business logic. When the storage layer changes (Postgres → CockroachDB → DynamoDB), ideally only the repository implementation changes.
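One way to picture the collection-like semantics is a repository that hides pagination behind a plain iterator. This is a sketch under assumptions: PagedOrderStore is a hypothetical stand-in for a backend that only returns one page per round-trip, and lifetime_spend_cents is a hypothetical piece of domain code.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Order:
    id: int
    customer_id: int
    total_cents: int


class PagedOrderStore:
    """Stand-in for a backend that returns results one page at a time."""

    def __init__(self, orders: list[Order], page_size: int = 2) -> None:
        self._orders = orders
        self._page_size = page_size
        self.round_trips = 0  # instrumentation for the sketch only

    def fetch_page(self, customer_id: int, offset: int) -> list[Order]:
        self.round_trips += 1
        matching = [o for o in self._orders if o.customer_id == customer_id]
        return matching[offset : offset + self._page_size]


class OrderRepository:
    def __init__(self, store: PagedOrderStore) -> None:
        self._store = store

    def find_by_customer(self, customer_id: int) -> Iterator[Order]:
        """Yield every order for the customer, fetching pages lazily."""
        offset = 0
        while True:
            page = self._store.fetch_page(customer_id, offset)
            if not page:
                return
            yield from page
            offset += len(page)


def lifetime_spend_cents(repo: OrderRepository, customer_id: int) -> int:
    """Domain code iterates as if the orders were an in-memory collection."""
    return sum(order.total_cents for order in repo.find_by_customer(customer_id))


store = PagedOrderStore(
    [Order(1, 7, 1000), Order(2, 8, 999), Order(3, 7, 250), Order(4, 7, 4000)]
)
spend = lifetime_spend_cents(OrderRepository(store), customer_id=7)
```

The domain function never sees an offset or a page size; those stay inside find_by_customer.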

Concrete instances in the wild

Spring Data repositories. interface UserRepository extends JpaRepository<User, Long> gives you findAll, findById, and save out of the box. Derived query methods such as findByEmail need only be declared on the interface; Spring generates the implementation from the method name, so a domain-shaped interface costs little more than naming its methods well.
Django ORM QuerySet as repository. User.objects (the model's Manager) is a repository-like interface: User.objects.filter(is_active=True) returns a lazy QuerySet. Django doesn’t call it that, but the shape is similar.
Go with repository interfaces. type UserRepository interface { FindByEmail(email string) (*User, error); ... } with a Postgres implementation, an in-memory test implementation, and a Redis-cached decorator.
In-memory test doubles. NewInMemoryUserRepository() for tests. Same interface, backed by a map[string]*User. Tests run without a database, typically in milliseconds.
CQRS read models. Query-side repositories that speak to denormalized projections rather than the write model. Same repository interface, different physical storage.
API-backed repositories. A CustomerRepository whose implementation calls an external HTTP API. The domain doesn’t know it’s an API call; the repository handles the retries, timeouts, and translation.
Multi-source repositories. ProductRepository that reads from Postgres for structured data, Elasticsearch for full-text search, and Redis for hot cache. The domain calls findByCategory(cat, query); the repository composes across three storage systems.
Event-sourced repositories. In event-sourced systems, the repository reconstructs an aggregate by replaying events. Same interface (load(aggregateId)); very different implementation.
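Two of the instances above, the in-memory test double and the caching decorator, can be sketched together in Python. All class names here are hypothetical, and the cache is a plain dict rather than Redis; the sketch only illustrates that both objects satisfy the same interface and can be stacked.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class User:
    email: str
    name: str


class UserRepository(Protocol):
    def find_by_email(self, email: str) -> Optional[User]: ...
    def save(self, user: User) -> None: ...


class InMemoryUserRepository:
    """Test double backed by a dict: no database, no network."""

    def __init__(self) -> None:
        self._users: dict[str, User] = {}
        self.lookups = 0  # instrumentation for the sketch only

    def find_by_email(self, email: str) -> Optional[User]:
        self.lookups += 1
        return self._users.get(email)

    def save(self, user: User) -> None:
        self._users[user.email] = user


class CachedUserRepository:
    """Decorator: same interface, read-through cache in front of `inner`."""

    def __init__(self, inner: UserRepository) -> None:
        self._inner = inner
        self._cache: dict[str, User] = {}

    def find_by_email(self, email: str) -> Optional[User]:
        if email in self._cache:
            return self._cache[email]
        user = self._inner.find_by_email(email)
        if user is not None:
            self._cache[email] = user
        return user

    def save(self, user: User) -> None:
        self._inner.save(user)
        self._cache.pop(user.email, None)  # invalidate so later reads see the update


backend = InMemoryUserRepository()
users = CachedUserRepository(backend)
users.save(User("ada@example.com", "Ada"))
first = users.find_by_email("ada@example.com")
second = users.find_by_email("ada@example.com")  # served from the cache
```

Because the decorator takes any UserRepository, the same wrapper could sit in front of a database-backed implementation without the domain noticing.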

Why this pattern matters

Without the repository pattern, data access leaks into domain code. Business methods make SQL calls; controllers embed query filters; caching decisions are scattered across service classes. Every business change requires touching persistence code, and every persistence change requires touching business code. The two concerns never separate.

With the repository pattern, they separate cleanly. Business logic asks for what it needs in domain terms. Data access handles how to get it. Changes on either side don’t cross the boundary. Tests are fast because in-memory repositories don’t touch I/O. Refactors are safe because the domain doesn’t know about the storage layout.

The pattern also enables progressive optimization. A repository can start naive (one query per method, no caching) and grow more sophisticated (batched queries, layered caching, materialized views) without changing its interface or affecting domain code. That flexibility helps a service grow from a hundred users to a million without a domain-code rewrite.
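A small Python sketch of that evolution, assuming a hypothetical products table in SQLite. NaiveProductRepository issues one query per id; BatchedProductRepository answers the same call with a single IN query. The caller, cart_summary, is identical for both.

```python
import sqlite3
from typing import Protocol


class ProductRepository(Protocol):
    def names_for(self, product_ids: list[int]) -> dict[int, str]: ...


class NaiveProductRepository:
    """First version: one query per id."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self.queries = 0  # instrumentation for the sketch only

    def names_for(self, product_ids: list[int]) -> dict[int, str]:
        result: dict[int, str] = {}
        for pid in product_ids:
            self.queries += 1
            row = self._conn.execute(
                "SELECT name FROM products WHERE id = ?", (pid,)
            ).fetchone()
            if row:
                result[pid] = row[0]
        return result


class BatchedProductRepository:
    """Later version: a single IN query behind the same method signature."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self.queries = 0

    def names_for(self, product_ids: list[int]) -> dict[int, str]:
        if not product_ids:
            return {}
        self.queries += 1
        placeholders = ",".join("?" for _ in product_ids)
        rows = self._conn.execute(
            f"SELECT id, name FROM products WHERE id IN ({placeholders})",
            product_ids,
        ).fetchall()
        return {pid: name for pid, name in rows}


def cart_summary(repo: ProductRepository, product_ids: list[int]) -> str:
    """Domain-side caller: unchanged when the implementation is swapped."""
    names = repo.names_for(product_ids)
    return ", ".join(names[pid] for pid in product_ids if pid in names)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "pen"), (2, "ink"), (3, "pad")])
naive = NaiveProductRepository(conn)
batched = BatchedProductRepository(conn)
naive_summary = cart_summary(naive, [3, 1, 9])
batched_summary = cart_summary(batched, [3, 1, 9])
```

Both calls return the same summary; only the query counters differ, which is the kind of change that stays invisible to domain code.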

Over-application is the main failure mode. Repositories designed as thin wrappers over ORM calls add ceremony without decoupling. If your repository has one method per SQL query and nothing else, you have built a DAO with a repository nameplate. The fix is to move query-shaping into the repository (findActiveCustomersWithOutstandingInvoices instead of findAll plus application-level filtering) and to move domain-level orchestration out of it.
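The fix can be illustrated with a Python sketch over a hypothetical two-table SQLite schema (customers and invoices are assumed names and layouts). ThinRepository shows the find-all-then-filter style; CustomerRepository pushes the same question into one intention-revealing query.

```python
import sqlite3


class ThinRepository:
    """Thin wrapper: hands back raw rows and leaves the filtering to callers."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find_all_customers(self) -> list[tuple]:
        return self._conn.execute("SELECT id, name, active FROM customers").fetchall()

    def find_all_invoices(self) -> list[tuple]:
        return self._conn.execute("SELECT customer_id, paid FROM invoices").fetchall()


def active_with_outstanding_app_side(repo: ThinRepository) -> list[str]:
    """Application-level filtering: loads both tables in full for one question."""
    owing = {cid for cid, paid in repo.find_all_invoices() if not paid}
    return sorted(
        name for cid, name, active in repo.find_all_customers() if active and cid in owing
    )


class CustomerRepository:
    """Query shaping lives in the repository; the method name states the intent."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find_active_customers_with_outstanding_invoices(self) -> list[str]:
        rows = self._conn.execute(
            """
            SELECT DISTINCT c.name
            FROM customers AS c
            JOIN invoices AS i ON i.customer_id = c.id
            WHERE c.active = 1 AND i.paid = 0
            ORDER BY c.name
            """
        ).fetchall()
        return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, paid INTEGER);
    INSERT INTO customers VALUES (1, 'Ada', 1), (2, 'Bob', 0), (3, 'Cy', 1);
    INSERT INTO invoices VALUES (1, 1, 0), (2, 1, 0), (3, 2, 0), (4, 3, 1);
    """
)
app_side = active_with_outstanding_app_side(ThinRepository(conn))
repo_side = CustomerRepository(conn).find_active_customers_with_outstanding_invoices()
```

Both paths give the same answer on this data, but the second lets the database do the join and keeps the filtering rule in one named place.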

Depth progression

STUB     ← you are here.
OUTLINE  Promoted when Y1 Phase 5 introduces it; Y2 Phase 9 (SQL) deepens with real DB.
DEEP     Promoted when a Y2 service uses real repositories AND you've designed at
         least one repository interface that the domain genuinely depends on (not
         just a thin SQL wrapper).

Preview: what OUTLINE will answer

When Y1 Phase 5 promotes this entry to OUTLINE, it will name:

PROBLEM. How do you keep the domain unaware of persistence details while still making domain-shaped queries efficient?
PRINCIPLES. Interface expressed in domain terms. Collection-like semantics (as if entities lived in memory). Implementation-neutral. Testable with in-memory fakes. Composable with caching, transactions, and multiple backends.
TRADE-OFFS. Rich domain interface (many methods, one per query) vs generic interface (fewer methods, more parameters). Aggregate-scoped repositories (one per aggregate root) vs entity-scoped (one per entity type). Read-side vs write-side repositories in CQRS. ORM-generated repositories vs hand-written.
TOOLS (time-stamped as of 2026-06): Spring Data (Java/Kotlin), Django ORM’s Manager API (Python), Prisma (TypeScript), Ecto (Elixir), Diesel (Rust), and hand-rolled interfaces in Go and other languages without ORM traditions.
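As a preview of the first trade-off, here is a hedged Python sketch showing both interface styles on one hypothetical BookRepository: named methods per business question, and a single search method driven by a BookQuery criteria object. Neither style is presented as the recommended one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    year: int


@dataclass(frozen=True)
class BookQuery:
    """Generic style: one method, the caller fills in optional criteria."""

    author: Optional[str] = None
    published_after: Optional[int] = None


class BookRepository:
    def __init__(self, books: list[Book]) -> None:
        self._books = list(books)

    # Rich style: one named method per business question.
    def by_author(self, author: str) -> list[Book]:
        return self.search(BookQuery(author=author))

    def published_after(self, year: int) -> list[Book]:
        return self.search(BookQuery(published_after=year))

    # Generic style: fewer methods, more parameters.
    def search(self, query: BookQuery) -> list[Book]:
        return [
            b
            for b in self._books
            if (query.author is None or b.author == query.author)
            and (query.published_after is None or b.year > query.published_after)
        ]


shelf = BookRepository(
    [
        Book("Refactoring", "Martin Fowler", 1999),
        Book("Patterns of Enterprise Application Architecture", "Martin Fowler", 2002),
        Book("Domain-Driven Design", "Eric Evans", 2003),
        Book("Implementing Domain-Driven Design", "Vaughn Vernon", 2013),
    ]
)
fowler_titles = [b.title for b in shelf.by_author("Martin Fowler")]
combined = shelf.search(BookQuery(author="Martin Fowler", published_after=2000))
```

The rich methods read better at call sites; the generic method handles combinations such as author plus year without adding a new method for each.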

The DEEP promotion, after Y2 with real persistence experience, will add MASTERY (operating repositories against Postgres for months), COMPARE (Spring Data repositories vs hand-rolled Go interfaces vs Django’s Manager), OPERATE (repository-related performance incidents like N+1 queries or missing indexes), and CONTRIBUTE (a repository interface refactor documented as an ADR).

Canonical references

Martin Fowler, Patterns of Enterprise Application Architecture (2002): the canonical text where Repository was catalogued (entry credited to Edward Hieatt and Rob Mee). Still relevant.
Eric Evans, Domain-Driven Design (2003): the Repositories section of Chapter 6. Complements Fowler with the DDD angle.
Vaughn Vernon, Implementing Domain-Driven Design (2013): extensive coverage of aggregate-scoped repositories.
Spring Data reference documentation: modern reference implementation of the pattern.
Fowler’s blog posts at martinfowler.com on Repository vs DAO: clarifies the semantic distinction.

Cross-references

Y1 Phase 5: Software Architecture Patterns
Y2 Phase 9: SQL & Relational Databases — real persistence
Related: hexagonal-and-ports-and-adapters, domain-driven-design, clean-architecture
Canonical text: “Patterns of Enterprise Application Architecture” — Martin Fowler