On December 22, 2025, between 10:17 and 13:40 UTC, part of our Customer Engagement Platform (CEP) experienced service disruptions affecting message delivery, data ingestion, and dashboards for some customers.
The incident was caused by a hardware-related issue on a high-end server hosting part of our private virtualization infrastructure. The issue has since been fully resolved, and corrective actions have been implemented.
During the incident window, affected customers experienced disruptions to message delivery, data ingestion, and dashboards.
No impact was observed on SMS delivery.
The incident affected a limited subset of customers, depending on their usage at the time.
The root cause was a fault in the cooling system of a high-end server used in our private virtualization infrastructure.
This cooling issue led to overheating, triggering a protective shutdown of a disk group on the affected server. As a result, the hypervisor abruptly lost access to multiple disks, causing all virtualized services hosted on that node to stop simultaneously.
Although this class of hardware is designed to provide strong reliability guarantees, the cooling failure took down a single hypervisor, and because several services were colocated on that node, one hardware fault affected all of them at once.
Following this incident, we took corrective actions that aim to reduce the impact of similar infrastructure-level incidents in the future.
We apologize for the disruption this incident caused.
While hardware failures of this nature are rare, this event highlighted areas where we could further improve infrastructure resilience and observability.
We remain committed to transparency and continuous improvement.
—
The Batch Engineering Team