Webhook Delivery
Incident Report for Stitch
Postmortem

Delayed Delivery of Webhooks

Incident Description

On March 9, 2023, at 8:15 am, an incident occurred that delayed the delivery of webhooks across multiple products. All clients were affected for the entire duration of the incident.

Involved Parties

Platform reliability engineers and support members were involved in the investigation and resolution of the incident.

Actions Taken

An investigation was immediately launched by the above-named parties to determine the cause of the incident.

The webhook dispatchers were restarted to clear any blockages.

Root Cause

The dispatchers experienced a service outage that did not trigger any warnings in our system monitors.

Resolution

Restarting the failed dispatchers restored webhook delivery to an acceptable throughput.

Preventative Measures

  • Introduced additional system monitors that track webhook throughput directly.
  • Improve monitoring and alerting of the webhook dispatchers to ensure they are all processing webhooks evenly.
  • Expedite our internal investigation into dispatcher migration.
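
As an illustration of the first measure, a throughput monitor can flag a dispatcher that stops delivering even when it reports no errors. The following is a minimal sketch only, not our production implementation: the class name, method names, window size, and threshold are all hypothetical, and it assumes delivery timestamps are recorded in non-decreasing order.

```python
from collections import deque


class ThroughputMonitor:
    """Hypothetical sliding-window check for low webhook throughput."""

    def __init__(self, window_seconds: float, min_per_window: int):
        self.window_seconds = window_seconds  # length of the sliding window
        self.min_per_window = min_per_window  # deliveries expected per window
        self._events = deque()                # delivery timestamps, oldest first

    def record_delivery(self, timestamp: float) -> None:
        # Called once for every webhook that is successfully dispatched.
        self._events.append(timestamp)

    def count_in_window(self, now: float) -> int:
        # Discard deliveries that are older than the window, then count the rest.
        cutoff = now - self.window_seconds
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events)

    def is_stalled(self, now: float) -> bool:
        # True when fewer deliveries than expected occurred within the window.
        return self.count_in_window(now) < self.min_per_window


# Illustrative usage with made-up numbers: expect at least 3 deliveries per 60 s.
monitor = ThroughputMonitor(window_seconds=60, min_per_window=3)
for t in (0, 10, 20):
    monitor.record_delivery(t)
healthy_check = monitor.is_stalled(now=30)   # three deliveries still in window
stalled_check = monitor.is_stalled(now=100)  # no deliveries in the last 60 s
```

Because the check measures completed deliveries rather than waiting for an error signal, it would raise an alert in the case described under Root Cause, where the dispatchers went down without any warning reaching the monitors.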

Conclusion

This incident delayed the delivery of webhooks across multiple products and clients. The cause was determined to be failed dispatchers, and the issue was resolved by manually restarting them. Preventative measures have been put in place to reduce the risk of similar incidents in the future.

Posted Mar 09, 2023 - 11:14 SAST

Resolved
Webhook service outage resulting in delayed webhook delivery.
Posted Mar 09, 2023 - 08:00 SAST