To prevent future service incidents, Zendesk is implementing several remediation items. These include improving recovery time by updating runbooks to perform power cycles earlier, updating tool access for on-call engineers, introducing additional alerts to detect instance failures, and raising the priority of Pod account migrations to reduce the impact radius.
On December 18, 2023, Zendesk experienced a service incident affecting Chat and Support (Messaging) across all Pods. The issues included chat and messaging errors, disconnections, login problems, and the inability to change agent status. The…
Zendesk resolved the chat and messaging outage by restarting an unhealthy Chat server, after which the affected services recovered. The team continued to monitor performance and worked to restore any recoverable historical chats that…
The root cause of the Zendesk service disruption on December 18, 2023, was a failure of a single live chat host in the hosting provider's infrastructure. This failure disrupted the chat and messaging services for customers served by this backend…
For current Zendesk system status, check the Zendesk system status page. This page provides updates during incidents, and summaries of post-mortem investigations are posted a few days after incidents have ended. If you have additional questions, you can log…