On July 1, starting at 15:54 CET, a network malfunction disrupted connectivity for a portion of our bare‑metal infrastructure. As a result, message delivery (email, push, SMS) and several APIs suffered a partial outage or degraded performance. The network disruption persisted for roughly two hours, with connectivity restored by approximately 18:00 CET.
The incident was caused by two redundant network switches that failed to recover correctly after a power issue. Both switches carried a firmware defect that prevented them from resynchronizing, leading to a network partition across the affected racks. The manufacturer has since identified and fixed the defect, and the faulty firmware version is no longer in operation.
In coordination with our hosting provider, we restored network connectivity and progressively brought all impacted systems back online. By approximately 20:54 CET, service levels had fully normalized. No customer data was lost during the incident; however, data submitted through our public APIs and SDKs during the outage window may not have been processed unless the requests were retried.
To improve resilience, we are enhancing our infrastructure monitoring to detect hardware-level network issues faster and with greater precision, and we are refining our recovery tooling and incident response processes to shorten restoration times in similar events.
We sincerely apologize for the disruption and appreciate the trust you place in us. If you have any questions or would like further clarification, please feel free to reach out.