Degraded Performance - Email Processing Delays
Incident Report for Area 1 Security
Postmortem

Event Details:

On May 19, 2020, between 14:15 UTC and 15:35 UTC, certain subsystems of Area 1 experienced sporadic mail delays.

Root Cause & Actions

The delay was caused by our image analysis system failing to terminate properly after processing. The issue affected roughly 15% of our email processing infrastructure; all other nodes were unaffected. The team was alerted to increased queue sizes on the affected processing nodes and took immediate action to resolve the issue. About 6.6% of all email messages delivered to the Area 1 system were delayed by more than 1 minute, and no message took longer than 14 minutes to process during the incident. All other messages were processed within an acceptable time range. All messages were successfully queued, so no messages were lost during the event.

The Area 1 engineering team has developed and deployed a solution to ensure our image analysis system does not cause message processing delays in the future. We have also adjusted our processes so that customers are alerted while an event is occurring rather than after it has ended.
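
The deployed fix is not described in this report. Purely as an illustration of the general technique, one common way to keep a non-terminating analysis step from holding up a mail queue is to run it as a child process under a hard deadline and to skip the step when the deadline passes. The sketch below assumes this approach; the function name `run_with_deadline`, the command, and the timeout values are hypothetical and are not taken from the Area 1 system.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_deadline(cmd: Sequence[str], timeout_s: float) -> Optional[str]:
    """Run cmd and return its stdout, or None if it fails or overruns timeout_s.

    Hypothetical helper: returning None lets the caller continue processing
    the message without the analysis result instead of waiting indefinitely.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child when this expires
            check=True,         # a non-zero exit status raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        return None  # the analysis step did not terminate in time
    except subprocess.CalledProcessError:
        return None  # the analysis step exited with an error
    return result.stdout


if __name__ == "__main__":
    # Stand-in for an analysis job that never finishes.
    stuck_job = [sys.executable, "-c", "import time; time.sleep(30)"]
    if run_with_deadline(stuck_job, timeout_s=0.5) is None:
        print("analysis skipped, message continues through the pipeline")
```

With a deadline in place, a stuck analysis job costs at most `timeout_s` per message rather than blocking the node until someone intervenes.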

Posted May 27, 2020 - 16:48 UTC

Resolved
The Area 1 Email service experienced degraded performance on 15% of our infrastructure, which resulted in some delays in email delivery. The event occurred between 10:15 EDT (14:15 UTC) and 11:35 EDT (15:35 UTC). The Area 1 team has resolved the issue. Analysis of the issue is underway, and a postmortem will be provided in the coming days.
Posted May 19, 2020 - 15:48 UTC