Event-Driven Architecture: Patterns and Trade-offs
#architecture #event-driven #kafka #messaging
Event-driven architecture (EDA) decouples producers from consumers, which can make systems easier to scale, more resilient to partial failure, and easier to extend. It also introduces its own complexity: eventual consistency, ordering challenges, and harder debugging.
Core Patterns
Pub/Sub (Publish-Subscribe)
Producers publish events to a topic; multiple consumers subscribe independently.
- Loose coupling between services
- Easy to add new consumers without modifying producers
- Trade-off: consumers progress independently, so there is no guarantee of processing order across consumers
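A minimal in-memory sketch of the pattern may help. The `Broker` class and the `order.placed` topic are hypothetical names invented for illustration, not a real library's API; a real broker adds persistence, networking, and delivery guarantees.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class Broker:
    """Hypothetical in-memory broker, for illustration only."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict[str, Any]) -> int:
        """Deliver the event to every subscriber; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


broker = Broker()
emails: list[str] = []
audit_log: list[dict[str, Any]] = []

# Two independent consumers; the producer knows about neither of them.
broker.subscribe("order.placed", lambda e: emails.append(e["customer"]))
broker.subscribe("order.placed", audit_log.append)

delivered = broker.publish(
    "order.placed", {"order_id": 1, "customer": "ada@example.com"}
)
```

Adding a third consumer is one more `subscribe` call; the `publish` call does not change.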
Event Sourcing
Instead of storing current state, store the sequence of events that led to it.
- Complete audit trail by design
- Ability to rebuild state at any point in time
- Enables temporal queries ("what was the state on March 1st?")
- Trade-off: read models must be projected from the event log, and storage grows over time
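As a rough sketch, rebuilding state is a fold over the event log, and a temporal query is the same fold stopped early. The `Event` type, the event kinds, and the amounts below are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Event:
    occurred_on: date
    kind: str    # "deposited" or "withdrawn" (hypothetical event types)
    amount: int  # amount in cents


def balance(events: list[Event], as_of: Optional[date] = None) -> int:
    """Rebuild the balance by replaying events, optionally only up to a date."""
    total = 0
    for event in sorted(events, key=lambda e: e.occurred_on):
        if as_of is not None and event.occurred_on > as_of:
            break
        if event.kind == "deposited":
            total += event.amount
        elif event.kind == "withdrawn":
            total -= event.amount
    return total


log = [
    Event(date(2024, 2, 10), "deposited", 10_000),
    Event(date(2024, 2, 20), "withdrawn", 2_500),
    Event(date(2024, 3, 5), "deposited", 4_000),
]

current = balance(log)                             # replay everything
on_march_1 = balance(log, as_of=date(2024, 3, 1))  # temporal query
```

The current balance is never stored here; it is always derived, which is why read models have to be projected.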
CQRS (Command Query Responsibility Segregation)
Separate the write model (commands) from the read model (queries).
- Write model optimized for consistency and validation
- Read model optimized for query performance
- Often combined with event sourcing
- Trade-off: two models to maintain, eventual consistency between them
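The split can be sketched as follows, assuming a toy inventory domain. `Inventory`, `AvailabilityView`, and the event shapes are hypothetical; a real system would typically move events between the two sides through a broker rather than a shared list.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    """Write model: validates commands and records the resulting events."""

    stock: dict[str, int] = field(default_factory=dict)
    events: list[dict] = field(default_factory=list)

    def add_stock(self, sku: str, quantity: int) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.stock[sku] = self.stock.get(sku, 0) + quantity
        self.events.append({"type": "stock_added", "sku": sku, "quantity": quantity})

    def reserve(self, sku: str, quantity: int) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.stock.get(sku, 0) < quantity:
            raise ValueError("insufficient stock")
        self.stock[sku] -= quantity
        self.events.append({"type": "stock_reserved", "sku": sku, "quantity": quantity})


class AvailabilityView:
    """Read model: a projection shaped for one query, 'is this SKU available?'."""

    def __init__(self) -> None:
        self._levels: dict[str, int] = {}
        self._position = 0  # index of the next event to apply

    def catch_up(self, events: list[dict]) -> None:
        """Apply any events this projection has not seen yet."""
        for event in events[self._position:]:
            sign = 1 if event["type"] == "stock_added" else -1
            sku = event["sku"]
            self._levels[sku] = self._levels.get(sku, 0) + sign * event["quantity"]
        self._position = len(events)

    def is_available(self, sku: str) -> bool:
        return self._levels.get(sku, 0) > 0


inventory = Inventory()
view = AvailabilityView()

inventory.add_stock("widget", 3)
stale = view.is_available("widget")  # read model has not caught up yet
view.catch_up(inventory.events)
fresh = view.is_available("widget")
```

The gap between `stale` and `fresh` is the eventual consistency named in the trade-off: the read side is correct only after it has processed the write side's events.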
Message Broker Comparison
| Broker | Model | Throughput | Ordering | Retention | Best For |
|---|---|---|---|---|---|
| Apache Kafka | Distributed log | Very high | Per partition | Days to forever | Stream processing, event sourcing |
| RabbitMQ | Message queue | High | Per queue | Until consumed and acknowledged | Task queues, RPC patterns |
| AWS SNS/SQS | Pub/Sub + Queue | High | Best-effort (standard); per message group (FIFO) | Up to 14 days (SQS, default 4 days) | AWS-native event routing |
| GCP Pub/Sub | Pub/Sub | High | Per ordering key (opt-in) | Up to 31 days (default 7 days) | GCP-native event pipelines |
| Redis Streams | Append-only log | Very high | Per stream | Configurable | Low-latency, simpler use cases |
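"Per partition" ordering in the table usually means that events sharing a key are routed to the same partition, so order is preserved per key but not globally. The sketch below illustrates the idea with a CRC32 hash; that choice is an assumption made for the example, and real clients use their own partitioners.

```python
import zlib

NUM_PARTITIONS = 4


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions: dict[int, list[tuple[str, str]]] = {i: [] for i in range(NUM_PARTITIONS)}

events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "cancelled"),
    ("order-1", "shipped"),
]

# Each partition is an append-only list, so per-key order is preserved.
for key, kind in events:
    partitions[partition_for(key)].append((key, kind))


def history(key: str) -> list[str]:
    """Read one key's events back from its partition, in append order."""
    return [kind for k, kind in partitions[partition_for(key)] if k == key]
```

Nothing here orders `order-1` events relative to `order-2` events when they land in different partitions, which is why consumers should not assume a global order.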
Delivery Guarantees
| Guarantee | Description | Complexity | Use Case |
|---|---|---|---|
| At-most-once | Fire and forget, may lose messages | Low | Metrics, non-critical logs |
| At-least-once | Retry until acknowledged, may duplicate | Medium | Most business events |
| Exactly-once | Each message processed exactly once | High | Financial transactions |
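In practice, exactly-once processing is usually approximated by combining at-least-once delivery with idempotent consumers, rather than by true exactly-once delivery. Design consumers to handle duplicates safely, for example by deduplicating on a unique message ID.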
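One common way to make a consumer idempotent is to remember which message IDs it has already applied. The sketch below assumes every message carries a unique `id` field; `PaymentConsumer` and the message shape are hypothetical. In a real system the processed-ID record and the state change would need to be committed atomically, for example in the same database transaction.

```python
class PaymentConsumer:
    """Hypothetical consumer that applies each message at most once."""

    def __init__(self) -> None:
        self._processed_ids: set[str] = set()
        self.balance = 0

    def handle(self, message: dict) -> bool:
        """Apply the message; return False if it was a duplicate."""
        message_id = message["id"]
        if message_id in self._processed_ids:
            return False  # already applied, safe to acknowledge again
        self.balance += message["amount"]
        self._processed_ids.add(message_id)
        return True


consumer = PaymentConsumer()

# At-least-once delivery: the broker redelivers "m1" after a lost ack.
deliveries = [
    {"id": "m1", "amount": 500},
    {"id": "m2", "amount": 250},
    {"id": "m1", "amount": 500},
]
results = [consumer.handle(message) for message in deliveries]
```

The duplicate is detected and skipped, so the balance reflects two payments even though three messages arrived.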
Event Schema Evolution
As systems evolve, event schemas change. Strategies to manage this:
- Schema registry (Confluent, AWS Glue) -- centralized schema management with compatibility checks
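- Backward compatibility -- consumers using the new schema can still read events written with the old schema
- Forward compatibility -- consumers using the old schema can still read events written with the new schema
- Versioned events -- include a version field so consumers can handle multiple versions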
- Upcasting -- transform old events to new format at read time
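Versioned events and upcasting can be combined by chaining one small transform per version step. This is a sketch under assumed names: the `user_registered` fields, the version numbers, and the default `locale` are all invented for illustration.

```python
from typing import Any, Callable

Event = dict[str, Any]

LATEST_VERSION = 3


def _v1_to_v2(event: Event) -> Event:
    """v2 split the single 'name' field into first and last name."""
    first, _, last = event["name"].partition(" ")
    return {
        "version": 2,
        "user_id": event["user_id"],
        "first_name": first,
        "last_name": last,
    }


def _v2_to_v3(event: Event) -> Event:
    """v3 added 'locale'; old events get an assumed default."""
    return {**event, "version": 3, "locale": "en"}


# One upcaster per source version, applied in sequence.
UPCASTERS: dict[int, Callable[[Event], Event]] = {1: _v1_to_v2, 2: _v2_to_v3}


def upcast(event: Event) -> Event:
    """Return a copy of the event transformed to the latest version."""
    current = dict(event)
    while current["version"] < LATEST_VERSION:
        current = UPCASTERS[current["version"]](current)
    return current


stored = {"version": 1, "user_id": 7, "name": "Grace Hopper"}
upgraded = upcast(stored)
```

The stored event is left untouched; only the in-memory copy handed to the consumer is transformed, so consumer code needs to understand the latest version only.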
Key Design Decisions
| Decision | Option A | Option B | Guidance |
|---|---|---|---|
| Thin vs fat events | ID + reference only | Full payload included | Fat events spare consumers a callback to the producer but increase payload size and expose more of the producer's schema |
| Event vs command | Notification of what happened | Instruction to do something | Events for decoupling, commands for explicit orchestration |
| Shared vs dedicated topics | All events on one topic | One topic per event type | Dedicated topics for high-volume or sensitive events |
| Sync vs async | Request-response | Fire-and-forget | Async by default; sync only when an immediate response is required |
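To make the thin-versus-fat row concrete, the sketch below shows the same fact published both ways. The `ORDERS` store, `fetch_order`, and the field names are hypothetical; `fetch_order` stands in for a synchronous call back to the producing service.

```python
import json

# Producer-owned data (hypothetical); consumers cannot read it directly.
ORDERS = {42: {"order_id": 42, "total_cents": 1999, "status": "paid"}}

thin_event = {"type": "order_paid", "order_id": 42}
fat_event = {"type": "order_paid", "order_id": 42, "total_cents": 1999, "status": "paid"}


def fetch_order(order_id: int) -> dict:
    """Stand-in for a network call back to the producer's API."""
    return ORDERS[order_id]


def total_from_thin(event: dict) -> int:
    # The consumer depends on the producer being reachable right now.
    return fetch_order(event["order_id"])["total_cents"]


def total_from_fat(event: dict) -> int:
    # The consumer needs nothing beyond the event itself.
    return event["total_cents"]


thin_size = len(json.dumps(thin_event))
fat_size = len(json.dumps(fat_event))
```

Both consumers arrive at the same total; the thin one pays with a runtime dependency on the producer, the fat one with a larger message.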
Common Pitfalls
- Event storms -- cascading events overload the system; use circuit breakers and backpressure
- Ordering assumptions -- distributed systems do not guarantee global order; design for out-of-order delivery
- Ghost events -- events referencing data that no longer exists; include sufficient context in events
- Debug hell -- tracing a request across 10 services requires correlation IDs and centralized logging
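For the tracing pitfall, one workable convention is for every event to copy the correlation ID of the event that caused it. The field names (`correlation_id`, `causation_id`) and the `new_event` helper are assumptions made for this sketch, not a standard.

```python
import uuid
from typing import Optional


def new_event(kind: str, payload: dict, cause: Optional[dict] = None) -> dict:
    """Create an event that inherits the correlation ID of its cause, if any."""
    return {
        "event_id": str(uuid.uuid4()),
        "correlation_id": cause["correlation_id"] if cause else str(uuid.uuid4()),
        "causation_id": cause["event_id"] if cause else None,
        "kind": kind,
        "payload": payload,
    }


def trace(events: list[dict], correlation_id: str) -> list[str]:
    """Return the kinds of all events belonging to one originating request."""
    return [e["kind"] for e in events if e["correlation_id"] == correlation_id]


# One request fans out across three services.
order_placed = new_event("order_placed", {"order_id": 42})
payment_taken = new_event("payment_taken", {"order_id": 42}, cause=order_placed)
shipment_booked = new_event("shipment_booked", {"order_id": 42}, cause=payment_taken)

# An unrelated request interleaved in the same centralized log.
unrelated = new_event("order_placed", {"order_id": 43})

central_log = [order_placed, unrelated, payment_taken, shipment_booked]
request_trace = trace(central_log, order_placed["correlation_id"])
```

Filtering the centralized log by one correlation ID recovers the whole chain, and `causation_id` records which event triggered which.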