004 - Async Communication
Status: ACCEPTED
Date: 2024-01-01
Deciders: Architecture Team
Context
The MRI system must be resilient to load spikes and third-party failures. Synchronous HTTP calls between internal services create temporal coupling: if one service is slow or unavailable, the caller blocks while waiting for it. Blocked callers then exhaust their own capacity, which causes cascading failures and a poor user experience.
Decision
We will use asynchronous communication for all internal service-to-service interactions.
- Buffers: Use Amazon SQS for work queues (Inbound Queue, Worker Queues).
- Events: Use Amazon EventBridge for domain events (e.g., WorkflowCompleted, StepFailed) to trigger side effects such as notifications.
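The sketch below is a minimal illustration of the event side of this decision. It shows how a domain event such as WorkflowCompleted could be shaped as an EventBridge PutEvents entry, and how a notification rule could select it. The source name `mri.workflow`, the bus name `mri-domain-events`, and the `workflowId` field are hypothetical. The matcher is deliberately simplified: it handles only exact matches on top-level keys, while real EventBridge patterns support much more.

```python
import json

# Hypothetical names chosen for illustration; they are not defined by this ADR.
EVENT_SOURCE = "mri.workflow"
EVENT_BUS = "mri-domain-events"


def build_entry(detail_type: str, detail: dict) -> dict:
    """Build one entry in the shape accepted by EventBridge PutEvents."""
    return {
        "Source": EVENT_SOURCE,
        "DetailType": detail_type,
        "Detail": json.dumps(detail),  # PutEvents expects Detail as a JSON string
        "EventBusName": EVENT_BUS,
    }


def to_delivered_event(entry: dict) -> dict:
    """Approximate the envelope fields a rule target sees (subset only)."""
    return {
        "source": entry["Source"],
        "detail-type": entry["DetailType"],
        "detail": json.loads(entry["Detail"]),
    }


def matches(pattern: dict, event: dict) -> bool:
    """Simplified rule matching: each top-level pattern key must list the event's value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


# A notification rule interested only in terminal workflow outcomes.
notification_rule = {
    "source": [EVENT_SOURCE],
    "detail-type": ["WorkflowCompleted", "StepFailed"],
}

entry = build_entry("WorkflowCompleted", {"workflowId": "wf-123"})
event = to_delivered_event(entry)
print(matches(notification_rule, event))  # True
```

In a real producer, the entry list would be passed to `put_events(Entries=[entry])` on a boto3 `events` client. The producer does not know which rules or targets consume the event, and this is what keeps side effects like notifications decoupled from the workflow service.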
Consequences
✓ Positive Consequences
- Decoupling: Services operate independently at their own pace.
- Load leveling: Queues act as buffers that absorb load spikes, so consumers can process work at a sustainable rate.
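- Reliability: SQS provides built-in retries: a message that is not deleted before its visibility timeout expires is delivered again. Dead-letter queues (DLQs) capture messages that repeatedly fail.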
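To make the retry and DLQ behaviour concrete, here is a small in-memory model of SQS-style delivery. It is an illustrative simulation under simplifying assumptions, not the real service or its API. A received message stays hidden for `visibility_timeout` seconds and reappears if the consumer never deletes it. Once it has been received `max_receive_count` times, the next receive attempt moves it to the dead-letter queue, mirroring a redrive policy's `maxReceiveCount`. The clock is simulated, and the message body `process-scan-42` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare messages by identity, not by field values
class Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0  # simulated time when the message becomes visible again


@dataclass
class SimulatedQueue:
    """In-memory model of at-least-once delivery with a DLQ (not real SQS)."""
    visibility_timeout: float
    max_receive_count: int
    messages: list = field(default_factory=list)
    dlq: list = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float):
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # still in flight with another consumer
            if msg.receive_count >= self.max_receive_count:
                self.messages.remove(msg)
                self.dlq.append(msg)  # redrive: too many unsuccessful receives
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg: Message) -> None:
        self.messages.remove(msg)  # acknowledges successful processing


queue = SimulatedQueue(visibility_timeout=30, max_receive_count=3)
queue.send("process-scan-42")

clock = 0.0
attempts = 0
while not queue.dlq:
    msg = queue.receive(clock)
    if msg is not None:
        attempts += 1  # the worker "fails" and never calls delete()
    clock += 31  # advance past the visibility timeout
print(attempts, len(queue.dlq))  # 3 1
```

The failing message is retried three times without any retry logic in the worker, then parked in the DLQ for inspection instead of blocking the queue. A worker that succeeds simply calls `delete()` after processing.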
✗ Negative Consequences
- Complexity: Request flows that span queues and event buses are harder to debug and require distributed tracing (e.g., AWS X-Ray).
- Eventual Consistency: Data is not immediately available everywhere, so services must be designed to tolerate stale or not-yet-updated reads.
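One common way to design for eventual consistency is a read model that consumers update from events. The model treats "not yet known" as a normal state and ignores redelivered events, because standard SQS queues and EventBridge deliver at least once. The sketch below assumes a hypothetical `WorkflowStatusView`, a `PENDING` default, and event IDs used as deduplication keys. None of these are specified by this ADR.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowStatusView:
    """Hypothetical read model updated asynchronously from domain events."""
    status: dict = field(default_factory=dict)
    seen_event_ids: set = field(default_factory=set)

    def apply(self, event: dict) -> bool:
        """Apply an event once; return False if it is a redelivered duplicate."""
        if event["id"] in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event["id"])
        self.status[event["detail"]["workflowId"]] = event["detail-type"]
        return True

    def get(self, workflow_id: str) -> str:
        # Callers must handle "not yet known" instead of assuming immediate consistency.
        return self.status.get(workflow_id, "PENDING")


view = WorkflowStatusView()
print(view.get("wf-123"))  # PENDING: the event has not been processed yet

evt = {"id": "evt-1", "detail-type": "WorkflowCompleted", "detail": {"workflowId": "wf-123"}}
view.apply(evt)
view.apply(evt)  # duplicate delivery is ignored
print(view.get("wf-123"))  # WorkflowCompleted
```

This sketch does not handle out-of-order delivery, and standard SQS queues provide only best-effort ordering. A production read model would also need a bounded deduplication store rather than an ever-growing set.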