DNS access impacted
Incident Report for Area 1 Security
Postmortem

INCIDENT TIMELINE AND ACTIONS

Date / Time (Pacific Time) | Event | Action
09/05/2019 - 11:30 am PDT | Environment variable for production firewall rule assigned to the wrong network | The change occurred while fixing a bug and applying a new firewall rule for a customer configuration.
09/05/2019 - 11:51 am PDT | Area 1 monitoring systems alerted | Area 1 personnel began investigating the reported issues and working on an action plan to address them.
09/05/2019 - 12:16 pm PDT | Customer reports DNS issues | Area 1 personnel were alerted and responded to customer queries.
09/05/2019 - 12:30 pm PDT | Issue resolved | All systems returned to normal state and behavior.

Analysis

Event

At approximately 11:30 am PDT on September 5, 2019, customers of Area 1’s DNS service began seeing sporadic DNS resolution failures. Where appropriate fail-over was configured, DNS requests fell back to the customer-configured fail-over nodes.

Root Cause(s)

While tightening security around firewall rule configurations, Area 1 operational personnel discovered that a specific cloud provider API does not behave as documented. As a result, the bug fix intended to enhance the security of configuration rule changes inadvertently applied production firewall rules to the test environment and removed the same rule from our production systems. With that rule removed, customer ingress IPs were blocked from resolving DNS queries.
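
The failure mode above, a production rule landing on the wrong network, can be illustrated with a pre-apply check. This is a minimal sketch and not Area 1's actual tooling: the `FirewallRule` type, the `ALLOWED_NETWORKS` mapping, and the network names are hypothetical, and no real cloud provider API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallRule:
    name: str
    network: str      # network the rule would be applied to
    environment: str  # environment the change is intended for


# Hypothetical mapping: each environment may only touch its own network.
ALLOWED_NETWORKS = {
    "production": "prod-net",
    "test": "test-net",
}


def validate_target(rule: FirewallRule) -> None:
    """Raise ValueError if the rule targets a network outside its environment."""
    expected = ALLOWED_NETWORKS.get(rule.environment)
    if expected is None:
        raise ValueError(f"unknown environment: {rule.environment!r}")
    if rule.network != expected:
        raise ValueError(
            f"rule {rule.name!r} is for {rule.environment!r} but targets "
            f"{rule.network!r}; expected {expected!r}"
        )
```

Calling `validate_target` before any apply step would reject a production rule whose network resolves to the test network, whatever the source of the wrong value (a misassigned environment variable or an API that behaves differently than documented).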

Corrective Actions

Area 1 took several actions to address the issue immediately and to prevent recurrence.

Completed actions

  • Resolve ingress IP rules - Area 1 service personnel removed restrictions that were preventing customer ingress IPs from communicating with the service.

In process actions

  • Additional monitoring is being added to our firewall rule management to ensure rules are configured correctly at all times.
  • Enhanced service logging of any such changes is being integrated into our overall log management system.
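
One way such firewall rule monitoring could work is a periodic drift check that compares the rules expected on a network with the rules actually present. The sketch below is illustrative only: the function names and the idea of representing rules as sets of rule names are assumptions, and fetching the live rules from a provider is left out.

```python
from typing import Dict, Set


def rule_drift(expected: Set[str], actual: Set[str]) -> Dict[str, Set[str]]:
    """Return rules missing from, and unexpected on, a network."""
    return {
        "missing": expected - actual,      # should exist but do not
        "unexpected": actual - expected,   # exist but should not
    }


def has_drift(drift: Dict[str, Set[str]]) -> bool:
    """True if any rule is missing or unexpected."""
    return bool(drift["missing"] or drift["unexpected"])
```

In this incident, a check of this kind run against both environments would have reported the customer ingress rule as missing on production and unexpected on test.
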
Posted Sep 09, 2019 - 15:32 UTC

Resolved
This incident has been resolved. We will follow up with an RCA.
Posted Sep 05, 2019 - 19:40 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 05, 2019 - 19:31 UTC
Identified
The issue has been identified and a fix is currently propagating.
Posted Sep 05, 2019 - 19:30 UTC
Investigating
Some customers are not able to access the DNS service. We are currently investigating this issue.
Posted Sep 05, 2019 - 19:27 UTC