Delivery issues
Incident Report for Batch
Postmortem

On 21/11/2020 at 06:35 UTC, an alert fired pointing to network issues on ZooKeeper, the cluster that maintains our messaging services' data coherence and integrity.

What at first seemed to be network hiccups with a small impact snowballed into a globally incoherent state, leading to severe data corruption. We were unable to send any push notifications, which impacted the "Push Campaigns", "Transactional Push", and "Trigger Campaigns" features.

We started working on restoring the messaging cluster, trying to return it to a stable state by fixing corrupted data manually. After 3 hours, with still no clear ETA, we decided to split our efforts and started deploying a new messaging cluster while continuing the recovery work on the original one.

At around 12:45 UTC, we switched the "Transactional Push" feature to the new messaging cluster, making it available again.

Around 14:10 UTC, the original messaging cluster was fixed and available again, which allowed us to restart the "Push Campaigns" feature's services and resume delivering campaigns.

15 minutes later, with some additional effort, we restored the "Trigger Campaigns" feature.

Posted Nov 23, 2020 - 14:36 UTC

Resolved
This incident has been resolved.
Posted Nov 21, 2020 - 14:41 UTC
Monitoring
Push campaigns are back online; we're currently monitoring the situation.
Trigger campaigns are still impacted. We will keep you posted about this.
Posted Nov 21, 2020 - 14:09 UTC
Update
Transactional push notifications have been fully back up since 13:00 UTC.
Our team is now focusing on fixing the delivery of campaigns sent from the dashboard or the Push Campaigns API. We will keep you posted here.
Posted Nov 21, 2020 - 13:53 UTC
Update
Transactional push is available. We're still working on restoring push campaigns and will post here as soon as we have an update.
Posted Nov 21, 2020 - 12:42 UTC
Identified
We've identified the issue. The messaging service that enables communication between our applications is down.
We're still trying to bring it back up.

Don't try to restart your campaigns or re-submit your transactional messages: once the system is back online, they will certainly be sent twice.
Posted Nov 21, 2020 - 09:03 UTC
Update
We are continuing to investigate this issue.
Posted Nov 21, 2020 - 08:29 UTC
Investigating
We have been experiencing delivery issues since 06:19 UTC. Campaign and transactional pushes may not be delivered.
Posted Nov 21, 2020 - 08:29 UTC
This incident affected: Delivery (Transactional push, Push campaigns).