The overall impact of this incident was as follows:
On 2021-04-15 at 09:00 UTC, we made a routine change to our message queuing system that had an unforeseen effect on the delivery of campaign web push notifications.
Delivery was significantly slowed until 09:40 UTC. During this period we were still diagnosing the problem.
At 09:40 UTC, we decided to restart another component that plays a role in the delivery of campaign push notifications, because the symptoms resembled a problem we had seen in the past.
Unfortunately, as soon as this component was restarted, it failed due to an obsolete configuration. This meant that no campaign push notifications were sent on any platform during this time.
Due to an obsolete deployment process for this component, we had trouble bringing it back up, but it was eventually up and running again around 10:10 UTC.
While we were troubleshooting this component, a quick fix was developed for the campaign web push delivery.
After both components were redeployed, all notifications were sent correctly and in a timely manner.
This incident highlighted two things:
In the future, we will continue working on our process and tooling to improve on these two points.