Outage Incident Report – 11/24/2020

Incident Number: 20202411

Incident Type: Internet Circuit Outage

Internal Ticket Number: CO8111TSD-2624

Description of Incident

At 4:54 P.M. MST on Tuesday, 11/24/2020, IT staff reported that our service provider's dedicated internet circuit was down. The outage affected Colorado 811 systems, most notably parts of the phone system, iDig811, and company email. IT staff opened a trouble ticket with our service provider and began further troubleshooting locally.

At 5:15 P.M. MST on Tuesday, 11/24/2020, the IT Department informed staff that the downed circuit had also affected other applications, such as Excavator Expedite and Update Lite. IT staff continued to troubleshoot over the phone with our internet service provider, and a member of IT staff visited the CO811 office to check the building and server room for physical damage.

At 6:20 P.M. MST on Tuesday, 11/24/2020, IT staff rebooted the AT&T Cisco internet routers per AT&T’s instructions. Applications such as iDig811, Excavator Expedite, and Update Lite were not restored after the router resets. Colorado 811 employees were still able to access email and the phone system over VPN.

At 6:32 P.M. MST on Tuesday, 11/24/2020, AT&T began invasive testing on the CO811 circuits in Golden. Around the same time, IT staff redirected excavators to our B server in Virginia, which gave external users not on VPN full access to the application.

At 8:45 P.M. MST on Tuesday, 11/24/2020, our software provider securely tunneled into our A server and addressed the tickets that had been delayed by the downed internet circuit. These tickets were then queued for delivery on the B server. Operational staff were instructed to process tickets on the back-up production server until further notice, and ticket delivery on the main production server was suspended until services were restored and fully functional. Service technicians continued to troubleshoot and perform invasive testing. The internet service provider gave no estimated time of restoration.

At 6:01 A.M. MST on Wednesday, 11/25/2020, IT staff updated employees after working overnight with our service provider and software vendors to troubleshoot the downed internet circuit further. AT&T had begun to focus its troubleshooting on the local exchange carrier (LEC). Operational staff were reminded to continue processing tickets on the B server until services were fully restored. IT staff then began remotely accessing each agent’s workstation to make configuration changes so that agents could work from the back-up B server more efficiently.

At 11:20 A.M. MST on Wednesday, 11/25/2020, CO811 staff were given a workaround for accessing email, which let employees download and respond to emails and attachments while internet services remained down. The IT Department continued to manage the available back-up services with help from support and administrative team members.

At 3:19 P.M. MST on Wednesday, 11/25/2020, the service provider restored the dedicated internet circuit after the issue was escalated to the highest level. The IT Department remained in contact with the service provider to determine why the circuit had been made “Administratively Down”. The service provider offered no explanation other than that it was an administrative outage, which is often the result of maintenance or billing issues, among other causes. Affected applications such as email, Service Desk, and excavator services were restored at this time. IT staff waited to confirm that services remained stable before reverting ticketing and delivery to the main production A server. The IT Department worked with Operational support staff to prepare to return employees to their original workstation configurations.
