July Outage Incident Report

Description of Incident

(Click here to download a PDF of the full incident report)

  • At 8:29 PM MST on Friday, July 24th, 2020, the Colorado 811 IT Department received a call from an after-hours agent stating they were having VPN connection issues and were unable to make outbound calls. The Colorado 811 Systems Administrator instructed the agent to use a backup VPN connection, which was functional at that time. Colorado 811 IT staff entered ticket 19244563 with our service provider to address the primary VPN failure. This ticket was picked up by the service provider’s Managed Firewall team.
  • At 6:55 AM MST on Saturday, July 25th, 2020, Colorado 811 IT staff received an update from our service provider stating that the Managed Firewall team had opened a related internal ticket with the IP team to further investigate our IP routes. At this time, Colorado 811’s secondary VPN was still functional, and agents were able to receive phone calls.
  • At 1:00 PM MST on Saturday, July 25th, 2020, the Colorado 811 Systems Administrator began receiving reports of secondary VPN failures from Operations agents. Troubleshooting determined that the primary VPN was functioning again and that the secondary VPN was now failing. Colorado 811 agents were instructed to revert to the primary VPN, and the issue was noted in the trouble ticket.
  • At 5:30 PM MST on Saturday, July 25th, 2020, the Colorado 811 IT Department determined that our Golden site had lost VoIP (Voice over IP) services. This voice traffic outage left Colorado 811 agents unable to receive or make phone calls. An outage message was posted to all staff on our internal collaboration platform, and the Colorado 811 System Outage team was further notified. Website and social media outage alerts were posted by the Colorado 811 Marketing and Communications Administrator. Operational staff instructed agents to monitor Online Services and Ticket Express for locate requests. The Colorado 811 Systems Administrator updated the service provider trouble ticket to reflect the sudden voice outage. A second Colorado 811 internal trouble ticket was opened to track the voice network issue.
  • At 9:55 PM MST on Saturday, July 25th, 2020, service provider support informed Colorado 811 IT Department staff of a service outage on a voice circuit at our Roanoke site. The outage in Roanoke affected Colorado 811’s toll-free number block, which rendered the 811 and 1-800 phone numbers unreachable.
  • At 1:00 AM MST on Sunday, July 26th, 2020, service provider technicians attempted to re-route Colorado 811 voice services to an unaffected circuit. Colorado 811 staff worked with service technicians to offer real-time feedback and testing of our Golden and Roanoke sites.
  • At 8:47 AM MST on Sunday, July 26th, 2020, service provider support stated that they had engaged the Electrical Response team. Service provider support further stated that there was an electrical malfunction alarm and that the issue needed to be addressed immediately. A second trouble ticket was created by the service provider: 19279891.
  • At 1:04 PM MST on Sunday, July 26th, 2020, the Colorado 811 Systems Administrator finished a several-hour-long conference call with the service provider. The meeting notes reflect that the service provider stated the voice circuit had been fully restored; however, Colorado 811 IT staff and service provider technicians were not able to resolve calls to 811 and the corresponding 1-800 number. The service provider re-engaged the Voice and IP teams for further troubleshooting. The Colorado 811 IT Department continued to provide feedback, testing, and independent troubleshooting throughout the afternoon and evening.
  • At 10:47 PM MST on Sunday, July 26th, 2020, service provider support stated that the Managed Firewall team would be working our trouble tickets overnight. The next available Managed Firewall technician was to be dispatched at 5:00 AM MST. An update would be provided to the Colorado 811 Systems Administrator at that time. Operational staff and agents continued to process after-hours locate requests via Online Services and Ticket Express. Excavators were regularly reminded to use these applications on the Colorado 811 website and social media.
  • At 6:00 AM MST on Monday, July 27th, 2020, Colorado 811 IT staff received an update from the service provider. The Service Assurance team had begun working with the Managed Firewall team and was exploring the possibility of migrating our voice traffic to a new circuit. At this time, the service provider had still not provided a definitive explanation of why voice traffic was not reaching our phone system after the primary voice circuit’s restoration.
  • At 9:23 AM MST on Monday, July 27th, 2020, calls to the Colorado 811 phone system were routed to a recording that stated Colorado 811 was experiencing phone issues. The temporary recording replaced out-of-service messages and busy signals and directed excavators to the Colorado 811 website to process locate requests.
  • At 10:24 AM MST on Monday, July 27th, 2020, Colorado 811 IT staff participated in a conference call with our service provider and data center provider. The network hardware, including the managed router and core switch, was noted as being in good working order.
  • At 1:19 PM MST on Monday, July 27th, 2020, the Colorado 811 Systems Administrator finished another conference call with our service provider and data center provider. The joint task force determined that the network routes programmed into Colorado 811 managed routers were functioning and that all network equipment was working as designed. Our phone system server also appeared to be in good working order. The service provider was able to send and receive network traffic from their VoIP servers and on to the Colorado 811 phone system. However, SIP (Session Initiation Protocol) signaling sent from the service provider’s VoIP servers to the Colorado 811 phone system was being lost: voice traffic sent to the Colorado 811 phone system over a particular port was not arriving. Colorado 811 IT staff confirmed that the SIP traffic sent from the service provider was not arriving intact at the Colorado 811 phone system. Given these new developments, service provider support stated that the Managed Firewall team would work with the IP team to resolve the issue. Their work would help determine whether the service provider had restricted outbound SIP traffic on a particular port from reaching Colorado 811’s phone system.
  • At 5:27 PM MST on Monday, July 27th, 2020, Colorado 811 IT staff participated in a conference call with our data center provider and service provider. This call gave the service provider network engineers, phone system engineers, and Colorado 811 staff time to pinpoint the affected area of our network. The Colorado 811 Systems Administrator worked with both parties to review routes and circuit access lists.
  • At 9:23 PM MST on Monday, July 27th, 2020, the service provider was able to properly configure managed services, including the network VPN that is vital to voice services. These actions restored voice traffic to the Colorado 811 phone system. The service provider VPN had been deactivated during initial troubleshooting, which prevented the service provider from passing voice traffic to the Colorado 811 phone system. After the fix, callers were able to reach 811 and the corresponding 1-800 number to process locate requests. All Colorado 811 staff, including the System Outage team, were made aware of the phone system resolution. Colorado 811 IT staff monitored voice circuits and phone calls throughout the night. Website and social media outage posts were amended or removed by early the next morning.
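
The timeline above notes that general network traffic reached the phone system while SIP signaling on a particular port did not. As an illustration only, and not the method Colorado 811 or the service provider actually used, a check of that kind can be sketched as a SIP OPTIONS probe sent over UDP. The function names, the example addresses, and the default port 5060 (the standard SIP port; the report does not name the affected port) are assumptions made for this sketch.

```python
import socket
import uuid


def build_sip_options(target_host, target_port, local_host, local_port):
    """Build a minimal SIP OPTIONS request (RFC 3261) as ASCII bytes."""
    # RFC 3261 requires the Via branch to start with the "z9hG4bK" cookie.
    branch = "z9hG4bK" + uuid.uuid4().hex
    lines = [
        f"OPTIONS sip:{target_host}:{target_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_host}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}:{target_port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",  # blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe_sip(target_host, target_port=5060, timeout=2.0):
    """Send one OPTIONS request; return the reply's status line, or None.

    None means no SIP reply arrived within the timeout, which is what a
    blocked or dropped signaling path would look like from the sender.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect() on UDP only fixes the peer and picks a local address.
            sock.connect((target_host, target_port))
            local_host, local_port = sock.getsockname()
            sock.send(build_sip_options(target_host, target_port,
                                        local_host, local_port))
            reply = sock.recv(4096)
        except OSError:  # covers timeouts and ICMP "port unreachable"
            return None
    return reply.decode("ascii", errors="replace").split("\r\n", 1)[0]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address; substitute a real host.
    print(probe_sip("192.0.2.10"))
```

A result of None alongside a successful ping to the same host would match the symptom described in the timeline: ordinary traffic arrives, but SIP signaling on the tested port does not.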

Operational Impact

Internal Impact – On Sunday, July 26th, 2020, Emergency/Damage ticket volume was 48% of the average number of tickets processed on a Sunday in July 2020. On Monday, July 27th, 2020, total ticket volume was 72% of the average number of tickets processed on a Monday in July 2020.

External Impact – Excavators were able to process locate requests and damage notifications through all online applications.

Incident Resolution Date – 07/27/2020

Root Cause – A managed service was deactivated by the service provider during VPN troubleshooting. This change prevented voice traffic from passing from the service provider to the Colorado 811 phone system, which left Colorado 811 unable to receive and process phone calls to 811 and the corresponding 1-800 number.
