The root cause of the Zendesk incident on July 2, 2024, was a set of performance problems that arose during an upgrade to a new storage system. The storage system struggled to process queries for the connection and subscription lifecycles, which delayed the delivery of timely updates. This led to blocking in the storage system and stalled transactions, which in turn impaired the system component responsible for managing data and delivering real-time user interface updates.
On July 2, 2024, Zendesk experienced a service incident affecting Pods 17 and 18, where the "Accept Chat" button became unresponsive. This issue later spread, causing a "Couldn't connect to server" error for customers in multiple other Pods when…
Zendesk resolved the July 2, 2024 service incident through several measures. They increased the size of the database clusters across all Pods and identified database locks and blocked transactions as the root cause. A fast fix was…
After the July 2, 2024 incident, Zendesk took several remediation steps. They removed database locks and cleaned up orphaned subscriptions. Additionally, they added Service Level Objectives (SLOs) for connection and subscription creation endpoints…
The Zendesk service incident on July 2, 2024, lasted about 8 hours and 20 minutes, from 08:10 UTC to 16:30 UTC. During this time, customers on Pods 17 and 18 faced issues with the "Accept Chat" button being unresponsive, and later, a "Couldn't connect to server" error…
During the Zendesk incident on July 2, 2024, several updates were provided. Initially, Zendesk acknowledged the issue with the "Accept Chat" button on Pods 17 and 18 and continued to investigate. They explored fixes and tested options to resolve…