004 - Async Communication

ADR Metadata
Status: ACCEPTED
Date: 2024-01-01
Deciders: Architecture Team

The MRI system must be resilient to load spikes and third-party failures. Synchronous HTTP calls between internal services create temporal coupling: if one service is slow or down, its callers block. This leads to cascading failures and a poor user experience.

We will use Asynchronous Communication for all internal service-to-service interactions.

  • Buffers: Use Amazon SQS for work queues (Inbound Queue, Worker Queues).
  • Events: Use Amazon EventBridge for domain events (e.g., WorkflowCompleted, StepFailed) to trigger side effects like notifications.
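
The split between work queues and domain events could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the helper names, the `mri.workflow` source string, and the payload fields are hypothetical. The clients are passed in so the sketch runs without AWS access; in practice they would likely be `boto3.client("sqs")` and `boto3.client("events")`, whose `send_message` and `put_events` calls accept the keyword arguments used here.

```python
import json
from typing import Any


def enqueue_work(sqs_client: Any, queue_url: str, payload: dict) -> None:
    """Buffer one unit of work on an SQS queue for a worker to pull later."""
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=json.dumps(payload))


def build_event_entry(detail_type: str, detail: dict, bus_name: str = "default") -> dict:
    """Build one EventBridge PutEvents entry for a domain event."""
    return {
        "Source": "mri.workflow",  # hypothetical source name
        "DetailType": detail_type,  # e.g. "WorkflowCompleted" or "StepFailed"
        "Detail": json.dumps(detail),  # EventBridge expects a JSON string here
        "EventBusName": bus_name,
    }


def publish_event(events_client: Any, entry: dict) -> None:
    """Publish one domain event and fail loudly if EventBridge rejects it."""
    response = events_client.put_events(Entries=[entry])
    # put_events can return successfully while reporting rejected entries,
    # so the failure count has to be checked explicitly.
    if response.get("FailedEntryCount", 0) > 0:
        raise RuntimeError(f"EventBridge rejected the event: {response.get('Entries')}")
```

The producer returns as soon as the message or event is accepted, so it never waits on the consumer. That is the temporal decoupling this decision is after.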
Positive Consequences
  • Decoupling: Services operate independently at their own pace.
  • Throttling: Queues act as buffers, flattening load spikes.
  • Reliability: Built-in retry mechanisms and DLQs in SQS.
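
As an illustration of the retry and DLQ behaviour mentioned above, a queue's redrive policy might be built like this. It is a sketch only: the helper name, the default of five receives, and the ARN in the usage note are assumptions, not values from this ADR. `RedrivePolicy`, `deadLetterTargetArn`, and `maxReceiveCount` are the attribute names SQS itself uses.

```python
import json


def redrive_policy(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Build SQS queue attributes that move repeatedly failing messages to a DLQ.

    A message that has been received max_receive_count times without being
    deleted is moved to the dead-letter queue instead of being retried again.
    """
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # SQS expects the policy as a JSON string inside the attributes map.
    return {"RedrivePolicy": json.dumps(policy)}
```

With boto3, the result could be passed as `Attributes=redrive_policy("arn:aws:sqs:us-east-1:123456789012:worker-dlq")` to `create_queue` or `set_queue_attributes`; the ARN shown is a placeholder.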
Negative Consequences
  • Complexity: Request flows are harder to debug and require distributed tracing (e.g., AWS X-Ray).
  • Eventual Consistency: Data is not immediately available everywhere; the system must be designed for this.
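
One common way to reduce the debugging cost listed above is to carry a correlation ID in every message, so log lines from different services can be joined for a single workflow. The sketch below assumes a hypothetical JSON envelope with `correlation_id` and `payload` fields. It is not something this ADR prescribes, and it is separate from the trace header X-Ray propagates on its own.

```python
import json
import uuid
from typing import Optional, Tuple


def wrap(payload: dict, correlation_id: Optional[str] = None) -> str:
    """Serialize a payload in an envelope, minting a correlation ID if none is given."""
    envelope = {
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "payload": payload,
    }
    return json.dumps(envelope)


def unwrap(body: str) -> Tuple[str, dict]:
    """Return (correlation_id, payload) from a message body produced by wrap()."""
    envelope = json.loads(body)
    return envelope["correlation_id"], envelope["payload"]
```

A consumer that emits follow-up messages would pass the ID it received back into `wrap`, so the whole chain shares one identifier.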