Here are some more details about the incident:
All times are UTC in 24h time
At 09:31, we started a maintenance operation on our database to add more capacity.
Soon after, we received multiple reports of misbehaving features: test pushes failing, missing data in the "Debug" view, lower campaign volumes, etc.
At 11:58, we found that queries were being served by our new, still-empty database servers.
At 12:05, we disabled those servers. At this point, the system was back to normal.
We then made application changes to ensure this could not happen again, finishing at 13:45.
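The class of safeguard described above can be sketched as follows. This is a hypothetical illustration, not our actual code: all names (`Replica`, `routable_replicas`, the health fields) are assumptions. The idea is that the read-routing layer refuses to send queries to replicas that have not finished their initial data sync.

```python
# Illustrative sketch only: exclude empty or unsynced replicas from the read
# pool before routing queries, so a freshly added server never serves reads.

from dataclasses import dataclass
from typing import List

@dataclass
class Replica:
    host: str
    row_count: int  # as reported by a periodic health probe (assumed)
    ready: bool     # set only once the initial data sync has completed (assumed)

def routable_replicas(replicas: List[Replica]) -> List[Replica]:
    """Return only replicas that are marked ready and actually hold data."""
    return [r for r in replicas if r.ready and r.row_count > 0]

pool = [
    Replica("db-old-1", row_count=1_000_000, ready=True),
    Replica("db-new-1", row_count=0, ready=True),  # freshly added, still empty
]
print([r.host for r in routable_replicas(pool)])  # prints ['db-old-1']
```

With a guard like this, an empty server added during a capacity change would be skipped by the router until it holds data, rather than silently returning empty results.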
Between 09:31 and 12:05, the system intermittently failed to read custom data (set via both the SDK and the API) and push tokens on installations. These failures occurred randomly.
This affected:
Push Campaign, Automation and In-App Message targeting:
As the database was in an inconsistent state, we are unable to tell which queries worked and which did not. The same user could be targeted as expected in one campaign but fail to match in another.
This resulted in fewer pushes sent per campaign. Those messages will NOT be retried.
Profiles were not affected: email automations and cappings both worked as expected.