Database issues impacting Push & In-app delivery
Incident Report for Batch
Postmortem

Here are some more details about the incident:

Timeline

All times are UTC, in 24-hour format.

At 09:31, we started a maintenance operation on our database to add more capacity.

Soon after, we got multiple reports of misbehaving features: test pushes not working, missing data in the "Debug" view, lower campaign volumes, etc.

At 11:58, we found out that queries were being served by our new, empty database servers.

At 12:05, we disabled those servers. At this point, the system was back to normal.

We then started making application changes to make sure that this could not happen again and finished at 13:45.
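
The report does not describe the application changes themselves. Purely as an illustration, one common safeguard against this class of problem is to let the application read only from servers that are explicitly marked as holding their data. The sketch below assumes a hypothetical `Server` type with a hypothetical `data_loaded` flag; neither name comes from our actual codebase.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    name: str
    data_loaded: bool  # hypothetical flag: True once the server holds its data


def readable_servers(servers: List[Server]) -> List[Server]:
    """Return only the servers that are marked as safe to read from."""
    ready = [s for s in servers if s.data_loaded]
    if not ready:
        # Failing loudly is preferable to silently reading from an empty server.
        raise RuntimeError("no server is ready to serve reads")
    return ready


cluster = [Server("db-old-1", True), Server("db-new-1", False)]
print([s.name for s in readable_servers(cluster)])
```

With a check of this kind, a newly added server is ignored for reads until it is flagged as ready, rather than answering queries with missing data.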

Impact

Between 09:31 and 12:05, the system was intermittently unable to read custom data (from both the SDK and the API) and push tokens on installations. These failures occurred at random.

This affected:

  • Test pushes
  • Debug
  • Message personalization
  • Estimated reach
  • Transactional notifications, if recipients were Installation IDs or Custom IDs
  • Push Campaign, Automation and In-App Message targeting:

    • Some users were not targeted even though they should have been
    • Some users might have been targeted even though they should not have been, if the targeting included any "has any value = no" condition

As the database was in an inconsistent state, we are unable to tell which queries worked and which did not. The same user could be targeted as expected in one campaign but missed in another.
This resulted in fewer pushes being sent per campaign. Those messages will NOT be retried.
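
To illustrate why both kinds of mistargeting could occur, here is a minimal sketch of how a "has any value" condition behaves when a read is answered by an empty server. The data, the attribute name `plan` and the function names are hypothetical and chosen only for the example; this is not our actual targeting code.

```python
from typing import Optional

# Hypothetical data: a populated server and a new, empty one.
POPULATED = {"install-1": {"plan": "premium"}}
EMPTY: dict = {}


def read_attribute(server: dict, installation_id: str, key: str) -> Optional[str]:
    """Return the attribute value, or None when the server has no such data."""
    return server.get(installation_id, {}).get(key)


def has_any_value(server: dict, installation_id: str, key: str, expected: bool) -> bool:
    """Evaluate a 'has any value = yes/no' condition for one installation."""
    present = read_attribute(server, installation_id, key) is not None
    return present == expected


# "has any value = yes": the user is missed when the empty server answers.
print(has_any_value(POPULATED, "install-1", "plan", True))   # True
print(has_any_value(EMPTY, "install-1", "plan", True))       # False

# "has any value = no": the user is wrongly matched when the empty server answers.
print(has_any_value(POPULATED, "install-1", "plan", False))  # False
print(has_any_value(EMPTY, "install-1", "plan", False))      # True
```

Because an empty server cannot distinguish "this attribute was never set" from "this data is not here", the outcome for a given user depended on which server happened to answer the query.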

Profiles were not affected: email automations worked as expected. Cappings worked as expected too.

Posted Apr 19, 2023 - 16:25 UTC

Resolved
This incident has been resolved.
Posted Apr 19, 2023 - 13:03 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Apr 19, 2023 - 12:25 UTC
Investigating
We're currently experiencing some issues with one of our database clusters, impacting campaign delivery and integrity.
We will update you as soon as we have more information and the next steps.
Posted Apr 19, 2023 - 12:25 UTC
This incident affected: API (Transactional API, Campaigns API) and Delivery (Transactional push, Push campaigns, In-app messaging).