Postmortem -
Nov 12, 16:29 UTC
Resolved -
The system has been functioning properly since our last communication, and we now consider this incident resolved.
Summary of Impact:
• Push Campaigns: Delayed by up to one hour.
• Transactional Push Notifications: Delayed by up to two hours.
• APIs: Experienced a 16% error rate across all services, except the Custom Audience API, which continued to encounter errors until Nov. 8th, 09:31 GMT+1.
• Successful API calls (returning a success status code) were enqueued but not processed during the incident.
• Processing of enqueued requests began around 23:00 GMT+1 and concluded by Nov. 8th, 00:40 GMT+1.
• Action Required: Retry any important failed API calls, as they were not enqueued.
• SDK Web Services:
• In-App Automations with “Re-evaluate targeting just before display” did not function as expected.
• Events, attribute updates, and push opens from the mobile SDK and plugins will be retried when users reopen their apps.
• Events, attribute updates, and push opens from the Web SDK have been partially lost.
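For the "Action Required" item above, the following is a minimal sketch of how a client might pick out and retry failed API calls. It assumes you keep your own log of outgoing calls with the HTTP status each one received; the record layout (`id`, `status`) and the `send` callable are hypothetical and not part of our API. Calls that returned a success status code are skipped, since they were enqueued and have since been processed.

```python
import time
from typing import Callable, Iterable


def select_calls_to_retry(call_log: Iterable[dict]) -> list[dict]:
    """Keep only calls that did not return a 2xx status.

    Calls that returned a success status were enqueued and processed later,
    so retrying them could create duplicates.
    """
    return [call for call in call_log if not 200 <= call["status"] < 300]


def retry_with_backoff(
    send: Callable[[dict], int],
    call: dict,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Resend one call, doubling the wait after each failed attempt.

    `send` is a hypothetical function that performs the request and
    returns the HTTP status code. Returns True once a 2xx is received.
    """
    for attempt in range(max_attempts):
        status = send(call)
        if 200 <= status < 300:
            return True
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False
```

Only retry calls that still matter to you; an event that is no longer relevant can simply be dropped.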
Analytics and Tracking Limitations:
To restore campaign functionality as quickly as possible, we temporarily disabled internal tracking of push, email, and SMS deliveries between 18:40 GMT+1 and 23:20 GMT+1. As a result:
• Analytics for messages sent during this period are unavailable and cannot be recovered.
• Open rate percentages for this timeframe are unreliable.
• Marketing pressure features (Global Frequency, Label Frequency, and Recurring Automation Cappings) do not account for push, email, or SMS deliveries during this interval.
We are exploring ways to partially regenerate missing analytics data.
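If you post-process your own reports, the sketch below shows one way to flag messages sent while delivery tracking was disabled, so their open rates can be excluded or annotated. It assumes the window stated above (18:40 to 23:20 GMT+1 on Nov 7) and that your send timestamps are timezone-aware; the function names are illustrative, and the year is left as a parameter for the caller to supply.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset used throughout this incident report.
GMT_PLUS_1 = timezone(timedelta(hours=1))


def tracking_gap(year: int) -> tuple[datetime, datetime]:
    """Return the start and end of the window with disabled delivery tracking."""
    start = datetime(year, 11, 7, 18, 40, tzinfo=GMT_PLUS_1)
    end = datetime(year, 11, 7, 23, 20, tzinfo=GMT_PLUS_1)
    return start, end


def sent_during_gap(sent_at: datetime, year: int) -> bool:
    """True if a timezone-aware send time falls inside the tracking gap."""
    if sent_at.tzinfo is None:
        raise ValueError("sent_at must be timezone-aware")
    start, end = tracking_gap(year)
    return start <= sent_at <= end
```

Timestamps in UTC compare correctly against the GMT+1 bounds, because aware datetimes are compared by absolute time.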
Next Steps:
Our team is preparing a comprehensive postmortem, which we plan to publish next week.
We apologize for the inconvenience caused and appreciate your understanding.
Nov 8, 13:26 UTC
Update -
The Custom Audience API encountered errors until November 8th, 9:31 GMT+1. It is now working as expected.
We will send an update later today with more information about the impacted components. A full post-mortem is planned for next week.
Nov 8, 09:23 UTC
Monitoring -
The previous operation was successfully completed. SDKs and APIs are functioning correctly. Data ingestion has been back online since 22:45 GMT+1.
From 18:47 GMT+1 to 23:20 GMT+1, no analytics data was collected, and unfortunately, we will not be able to recover this data. As a result, you may notice abnormal open rates, as messages were sent during this period but acknowledgment information was not collected. Push notifications won't show up in the Inbox feature either.
Our teams are continuing to monitor the situation.
We will publish a post-mortem next week.
Nov 7, 22:33 UTC
Update -
To prepare our platform for the upcoming operation, we will temporarily suspend all API and SDK web services. During this time, data ingestion will not be possible (you will receive HTTP 500 errors).
We will inform you as soon as data ingestion is restored.
Analytics are still unavailable.
We will post another update in an hour.
Nov 7, 21:01 UTC
Update -
Our teams are still working on a complete solution.
We will post another update in an hour.
Nov 7, 19:48 UTC
Update -
The workaround is now also implemented to resume Transactional Push. Transactional Push notifications will now be sent again, and any queued notifications that were delayed are being delivered progressively.
Our teams are still working on a complete solution.
Nov 7, 18:48 UTC
Update -
The workaround is now also implemented to resume Email & SMS campaigns. Email & SMS will now be sent again, and any queued Email & SMS that were delayed are being delivered progressively.
Due to this workaround, success and error analytics will not be available on the dashboard, APIs, or exports.
Our teams are still working on a complete solution.
Nov 7, 18:19 UTC
Update -
We are continuing to work on a fix for this issue.
Nov 7, 18:18 UTC
Identified -
We have located the root cause but are still determining exactly which components are affected.
We have implemented a workaround to resume APNS, FCM, and Web Push notifications for campaigns. Notifications will now be sent again, and any queued notifications that were delayed are being delivered progressively.
Due to this workaround, success and error analytics will not be available on the dashboard, APIs, or exports.
Our teams are still actively working to fully restore the remaining affected services.
Nov 7, 17:47 UTC
Investigating -
We have been experiencing technical issues since 17:40 GMT+1. Notifications, Email, SMS, our API, and SDK web services are all down.
Our team is actively investigating the situation to restore service as quickly as possible. We will update you as soon as we have more information.
Nov 7, 17:09 UTC