The test pyramid -- unit tests at the base, integration in the middle, end-to-end at the top -- served us well for a decade. But modern architectures (microservices, event-driven systems, serverless) have exposed its limitations. New models like the testing diamond, testing trophy, and contract testing have emerged to address the gaps. This post maps the landscape, compares strategies, and provides practical guidance for building a testing portfolio that matches your architecture.
Test Type Taxonomy
Test Type Taxonomy
│
├── Unit Tests
│   ├── Pure function tests (no dependencies)
│   ├── Component tests (single module with mocks)
│   └── Snapshot tests (UI component output)
│
├── Integration Tests
│   ├── Service integration (API + database)
│   ├── Contract tests (consumer-driven or provider)
│   └── Component integration (module + real dependencies)
│
├── End-to-End Tests
│   ├── UI-driven (Playwright, Cypress)
│   ├── API-driven (full request flow)
│   └── Synthetic monitoring (production probes)
│
├── Specialized Tests
│   ├── Property-based (generative inputs)
│   ├── Mutation testing (code modification)
│   ├── Chaos testing (failure injection)
│   ├── Performance / load testing
│   ├── Security testing (DAST, SAST)
│   └── Visual regression testing
│
└── Production Testing
    ├── Canary deployments
    ├── Feature flags + A/B tests
    ├── Synthetic monitoring
    └── Observability-driven testing
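The top two layers of the taxonomy differ mainly in what crosses the test boundary. A minimal sketch (the discount/checkout names are hypothetical) contrasting a pure-function unit test with a component test that mocks its single dependency:

```python
import unittest.mock as mock

# Hypothetical pure function: a unit test needs no test doubles at all.
def apply_discount(price: float, pct: float) -> float:
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    return round(price * (1 - pct / 100), 2)

# Hypothetical component: depends on a payment gateway, so its test
# mocks only that boundary while keeping the module itself real.
class CheckoutService:
    def __init__(self, gateway):
        self.gateway = gateway

    def charge(self, price: float, pct: float) -> str:
        return self.gateway.charge(apply_discount(price, pct))

# Unit test: pure function, no dependencies.
assert apply_discount(100.0, 25) == 75.0

# Component test: single module with a mocked dependency.
gateway = mock.Mock()
gateway.charge.return_value = "txn-1"
service = CheckoutService(gateway)
assert service.charge(100.0, 25) == "txn-1"
gateway.charge.assert_called_once_with(75.0)
```

The unit test exercises logic with no doubles; the component test keeps one real module and replaces only the boundary it owns.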
Strategy Comparison: Pyramid vs Diamond vs Trophy
Test Pyramid: a broad base of unit tests (most tests), a middle layer of integration tests, and a narrow top of E2E tests. Best for: monoliths, libraries, well-isolated code.

Testing Diamond: integration tests form the widest layer, with fewer unit tests than integration tests and a thin E2E top. Best for: microservices, API-heavy systems.

Testing Trophy: integration tests are the largest layer, sitting above a smaller unit layer and a static-analysis foundation, with a thin E2E top. Best for: frontend apps, JavaScript/TS ecosystems.
| Aspect | Pyramid | Diamond | Trophy |
|---|---|---|---|
| Emphasis | Unit tests dominate | Integration tests dominate | Integration + static analysis |
| Unit test ratio | 70% | 20-30% | 20-30% |
| Integration ratio | 20% | 50-60% | 40-50% |
| E2E ratio | 10% | 10-20% | 10% |
| Static analysis | Not prioritized | Not prioritized | Foundation layer |
| Speed | Fastest overall | Moderate | Moderate |
| Confidence | High for isolated logic | High for system behavior | High for user-facing behavior |
| Maintenance cost | Low (units), high (E2E) | Moderate | Moderate |
| Best architecture | Monolith, libraries | Microservices, APIs | Frontend, full-stack JS |
Tool Landscape by Test Type
| Test Type | JavaScript/TS | Python | Go | Java | Cross-Platform |
|---|---|---|---|---|---|
| Unit | Vitest, Jest | pytest | go test | JUnit 5 | -- |
| Integration | Supertest, Testcontainers | pytest + httpx, Testcontainers | testcontainers-go | Testcontainers, Spring Test | Testcontainers |
| E2E (UI) | Playwright, Cypress | Playwright | -- | Selenium | Playwright |
| E2E (API) | Supertest, REST Client | httpx, requests | net/http | REST Assured | Postman, k6 |
| Contract | Pact JS | Pact Python | Pact Go | Pact JVM, Spring Cloud Contract | Pact |
| Property-based | fast-check | Hypothesis | rapid | jqwik | -- |
| Mutation | Stryker | mutmut, cosmic-ray | -- | PITest | -- |
| Performance | k6, Artillery | Locust | vegeta, hey | Gatling, JMeter | k6 |
| Visual regression | Chromatic, Percy | -- | -- | -- | Chromatic, Percy |
| Static analysis | ESLint, TypeScript | mypy, ruff, pylint | go vet, staticcheck | SpotBugs, ErrorProne | SonarQube |
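Mutation testing from the table deserves a concrete picture. Tools like Stryker and mutmut generate mutants automatically; this dependency-free sketch (names hypothetical) hand-writes one mutant to show what "killing" it means:

```python
# A tiny illustration of mutation testing: mutate the code under test,
# re-run the suite, and check that at least one test fails ("kills" the
# mutant). A surviving mutant reveals a gap in the suite.

def is_adult(age: int) -> bool:
    return age >= 18

def mutant_is_adult(age: int) -> bool:
    return age > 18  # mutation: '>=' became '>'

def suite(fn) -> bool:
    """Return True if every test case passes for the given implementation."""
    cases = [(17, False), (18, True), (30, True)]
    return all(fn(age) == expected for age, expected in cases)

assert suite(is_adult)             # original implementation passes
assert not suite(mutant_is_adult)  # boundary case at age 18 kills the mutant
```

Without the `(18, True)` boundary case, the mutant would survive, which is exactly the kind of coverage gap mutation tools report.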
CI/CD Integration Architecture
┌─────────────────────────────────────────────────────────────┐
│                       DEVELOPER PUSH                        │
└──────────────────────────────┬──────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  STAGE 1: PRE-COMMIT (< 30 seconds)                         │
│  ├── Linting (ESLint, ruff)                                 │
│  ├── Formatting (Prettier, black)                           │
│  ├── Type checking (TypeScript, mypy)                       │
│  └── Affected unit tests only                               │
└──────────────────────────────┬──────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  STAGE 2: CI PIPELINE (< 10 minutes)                        │
│  ├── Full unit test suite                                   │
│  ├── Integration tests (Testcontainers)                     │
│  ├── Contract tests (Pact broker)                           │
│  ├── Security scanning (SAST)                               │
│  └── Build artifacts                                        │
└──────────────────────────────┬──────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  STAGE 3: STAGING DEPLOY (< 30 minutes)                     │
│  ├── Deploy to staging environment                          │
│  ├── E2E tests (Playwright against staging)                 │
│  ├── Performance baseline tests                             │
│  ├── Visual regression tests                                │
│  └── Security scanning (DAST)                               │
└──────────────────────────────┬──────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  STAGE 4: PRODUCTION (continuous)                           │
│  ├── Canary deployment (5% → 25% → 100%)                    │
│  ├── Synthetic monitoring                                   │
│  ├── Error rate monitoring                                  │
│  └── Automatic rollback on SLO violation                    │
└─────────────────────────────────────────────────────────────┘
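Stages 1 and 2 map naturally onto a hosted CI workflow. A minimal GitHub Actions sketch, assuming a Python project whose suites live under tests/unit and tests/integration (all paths, job names, and tool choices are illustrative assumptions):

```yaml
name: ci
on: [push]
jobs:
  ci-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: ruff check .              # Stage 1 lint, repeated in CI
      - run: mypy .                    # type checking
      - run: pytest tests/unit         # full unit suite
      - run: pytest tests/integration  # Testcontainers needs Docker,
                                       # which ubuntu-latest provides
```

Stage 3 onward typically lives in a separate deploy workflow gated on this one, so E2E and performance runs never block the fast feedback loop.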
Cost of Defect by Stage
| Stage Where Defect Found | Relative Cost | Example | Detection Method |
|---|---|---|---|
| Design / planning | 1x | Misunderstood requirement | Design review, spec review |
| Coding (IDE) | 1.5x | Type error, null reference | Static analysis, type system |
| Pre-commit | 2x | Logic error in function | Unit test, linting |
| CI pipeline | 5x | Integration contract broken | Integration test, contract test |
| Staging / QA | 15x | User flow broken | E2E test, manual QA |
| Production (detected fast) | 50x | Data corruption, outage | Monitoring, synthetic tests |
| Production (detected late) | 100-1000x | Silent data corruption, compliance breach | Audit, customer report |
Contract Testing Deep Dive
| Aspect | Consumer-Driven (Pact) | Provider-Driven | Schema-Based (OpenAPI) |
|---|---|---|---|
| Who defines contract? | Consumer | Provider | Shared specification |
| Direction of verification | Consumer generates, provider verifies | Provider publishes, consumers adapt | Both validate against spec |
| Best for | Microservices with known consumers | Public APIs | API-first development |
| Drift detection | Excellent | Good | Good (if enforced) |
| Tooling | Pact, PactFlow | Spring Cloud Contract | Spectral, Optic, Prism |
| CI integration | Pact Broker in pipeline | Provider pipeline | Spec validation in PR |
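Pact and a broker automate the full consumer-driven workflow; the core mechanic can be shown without them. A dependency-free sketch (endpoint and field names hypothetical) in which the consumer records the interaction it depends on and the provider is verified against it:

```python
# 1. Consumer side: the expectation a consumer test would record.
#    A type (like str) acts as a loose matcher; a value must match exactly.
contract = {
    "request": {"method": "GET", "path": "/users/42"},
    "response": {"status": 200, "body": {"id": 42, "name": str}},
}

# 2. Provider side: a stand-in for the real handler under verification.
def provider_handler(method: str, path: str):
    user_id = int(path.rsplit("/", 1)[1])
    return 200, {"id": user_id, "name": "Ada"}

def verify(contract, handler) -> bool:
    """Replay the recorded request and check the response against it."""
    req, expected = contract["request"], contract["response"]
    status, body = handler(req["method"], req["path"])
    if status != expected["status"]:
        return False
    for key, want in expected["body"].items():
        got = body.get(key)
        ok = isinstance(got, want) if isinstance(want, type) else got == want
        if not ok:
            return False
    return True

assert verify(contract, provider_handler)
```

In the real Pact workflow, step 1 runs in the consumer's pipeline and publishes to a broker; step 2 runs in the provider's pipeline, which is how drift is caught before deploy.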
Property-Based Testing: When and Why
| Scenario | Example Property | Traditional Test Weakness |
|---|---|---|
| Serialization roundtrip | decode(encode(x)) == x | Only tests chosen examples |
| Sorting | output is sorted AND same length as input | Misses edge cases (empty, duplicates) |
| API input validation | No input crashes the server | Only tests happy path + few bad inputs |
| State machines | All transitions maintain invariants | Combinatorial explosion of states |
| Parsers | parse(print(ast)) == ast | Limited grammar coverage |
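Hypothesis and fast-check generate, shrink, and replay failing inputs automatically, but the sorting row above can be demonstrated with nothing beyond the standard library. A minimal sketch:

```python
import random
from collections import Counter

def check_sort_properties(sort, trials: int = 200) -> bool:
    """Assert the sorting properties from the table over random inputs."""
    rng = random.Random(0)  # fixed seed: failures are reproducible
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        out = sort(xs)
        assert len(out) == len(xs), f"length changed for {xs}"
        assert all(a <= b for a, b in zip(out, out[1:])), f"not sorted: {out}"
        assert Counter(out) == Counter(xs), f"elements changed for {xs}"
    return True

assert check_sort_properties(sorted)

# A buggy sort that deduplicates passes typical example-based tests
# but violates the length property as soon as a duplicate is generated.
def buggy_sort(xs):
    return sorted(set(xs))

caught = False
try:
    check_sort_properties(buggy_sort)
except AssertionError:
    caught = True
assert caught, "generative inputs should expose the dropped duplicates"
```

Real property-based frameworks add shrinking on top of generation: once a failure is found, they search for a minimal failing input, which here might be a two-element list with one duplicate.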
Resources