Here are some more details about the incident:
All times are UTC in 24h time
At 09:31, we started a maintenance operation on our database to add more capacity.
Soon after, we received multiple reports of misbehaving features: test pushes failing, missing data in the "Debug" view, lower campaign volumes, etc.
At 11:58, we found that queries were being served by our new, still-empty database servers.
At 12:05, we disabled those servers. At this point, the system was back to normal.
We then made application changes to ensure this could not happen again, finishing at 13:45.
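The class of safeguard described above can be sketched as follows. This is a hypothetical illustration, not our actual code: all names (`Replica`, `routable_replicas`, the health fields) are assumptions. The idea is that the read-routing layer refuses to send queries to replicas that have not finished their initial data sync.

```python
# Illustrative sketch only: exclude empty or unsynced replicas from the read
# pool before routing queries, so a freshly added server never serves reads.

from dataclasses import dataclass
from typing import List

@dataclass
class Replica:
    host: str
    row_count: int  # as reported by a periodic health probe (assumed)
    ready: bool     # set only once the initial data sync has completed (assumed)

def routable_replicas(replicas: List[Replica]) -> List[Replica]:
    """Return only replicas that are marked ready and actually hold data."""
    return [r for r in replicas if r.ready and r.row_count > 0]

pool = [
    Replica("db-old-1", row_count=1_000_000, ready=True),
    Replica("db-new-1", row_count=0, ready=True),  # freshly added, still empty
]
print([r.host for r in routable_replicas(pool)])  # prints ['db-old-1']
```

With a guard like this, an empty server added during a capacity change would be skipped by the router until it holds data, rather than silently returning empty results.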
Between 09:31 and 12:05, the system intermittently failed to read custom data (set via both the SDK and the API) and push tokens on installations. These failures occurred randomly.
This affected:
Push Campaign, Automation and In-App Message targeting:
As the database was in an inconsistent state, we are unable to tell which queries worked and which did not. The same user could be targeted as expected in one campaign but fail to match in another.
This resulted in fewer pushes sent per campaign. Those messages will NOT be retried.
Profiles were not affected: email automations and cappings both worked as expected.