Stitch Expired Payment Webhooks Delayed Due to Degraded Microsoft Azure Redis Infrastructure
Incident Report for Stitch
Postmortem

Stitch Expired Payment Webhooks Delay

Incident Description

Stitch’s payment expiry workers experienced degraded performance due to an outage of Microsoft Azure’s hosted Redis service. More details are available in Microsoft’s incident report.

Involved Parties

  • Platform reliability engineers and the service deployment team.

Actions Taken

  • An immediate assessment of affected services and products was conducted.
  • Immediate mitigation measures were explored.

Root Cause

Microsoft Azure’s hosted Redis infrastructure experienced severely degraded performance, as described in Microsoft’s incident report.

Resolution

The underlying issue was resolved by Microsoft. Details of the resolution are available in Microsoft’s incident report.

Preventative Measures

  • We are actively exploring redundancy measures for third-party services.

Conclusion

Microsoft Azure’s Redis infrastructure experienced severe downtime, which delayed the processing of payment status changes. The delays particularly affected payments that relied on the expiry workers (i.e. PENDING → EXPIRED). System analysis confirmed that a Microsoft outage caused the delayed dispatch of these webhooks.

Posted Mar 16, 2023 - 17:35 SAST

Resolved
This incident is now resolved, and we apologise for any inconvenience this may have caused. All systems are back online and all features are fully operational.
Posted Mar 16, 2023 - 12:00 SAST