On July 23, 2024, Zendesk experienced a service incident affecting customers on Pod 29. From 10:58 UTC to 14:57 UTC, users faced issues accessing Zendesk products, including the Admin Center, through the Product Tray. Approximately 1% of customer requests returned 503 errors when accessing authenticated features within Guide, Talk, Chat, Explore, and Support. Users were unable to switch between products due to errors in the Product Tray and main web browser page.
The incident was caused by a new feature rollout that increased requests to the internal permissions service, leading to capacity saturation and network failures. The issue was resolved by increasing database capacity and rolling back the feature change.
Zendesk resolved the service incident on July 23, 2024, by initially increasing the capacity of the permissions service’s database instance. This provided a short-term recovery while the root cause was being identified. Once the root cause was…
The root cause of the Zendesk incident on July 23, 2024, was a new feature rollout related to managing team members' permissions. This feature allowed agents in custom roles to manage other team members and their role assignments. The rollout led…
To prevent future incidents similar to the one on July 23, 2024, Zendesk planned several remediation items. These include reducing network traffic from permissions checks, which is currently in progress, and scheduling additional monitors and…
During the Zendesk incident on July 23, 2024, customers on Pod 29 experienced several errors. These included the inability to access Zendesk products through the Product Tray and receiving 503 errors when accessing authenticated features within…