Network Issues that Scared Network Engineers!

It’s Halloween week, and what better way to jolt a network engineer than with some network horror stories? While we have had numerous close encounters with network issues, we think these three will really chill you to the bone!


The Configuring

Location: A global banking organization

The network team at a global banking organization scheduled a maintenance window after midnight on a weekend. Among the planned changes were updates to the BGP routing tables and route-maps on border routers at six locations. Sometime during the maintenance window, the bank started receiving complaints that its website and online banking facilities were inaccessible, along with reports of failed debit and credit card transactions around the world. The bank immediately put together a team to investigate the outage, but about 40 minutes later the issue resolved itself!

Routing issues are usually ghostly, most lasting for only a few minutes to a couple of hours. Catching one in the act is extremely difficult. After hours of post-incident analysis, the team realized that changes made to the BGP routing table on a Cisco ASR caused the issue.

The Cisco ASR takes each configuration command into effect as soon as it is entered, which meant the new routes were advertised to the DMZs before the route-map was applied. The resulting route leak directed online transactions into the bank’s internal network, where they were dropped.
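For illustration only, here is a rough sketch of how such a change could be scripted so that the filter exists before any new routes do. This is not the bank’s actual change procedure; the use of the Netmiko library, the device details, AS numbers, prefixes, and names such as DMZ-OUT are all assumptions made for the example.

```python
# Hypothetical sketch: push a BGP change so the route-map filter is in place
# before any new prefixes are advertised (names and addresses are made up).
from netmiko import ConnectHandler

device = {
    "device_type": "cisco_xe",      # assumed platform; adjust for your ASR
    "host": "192.0.2.10",
    "username": "netops",
    "password": "changeme",
}

# Step 1: create the filter and attach it to the neighbor first.
filter_first = [
    "ip prefix-list DMZ-ADVERTISE seq 10 permit 198.51.100.0/24",
    "route-map DMZ-OUT permit 10",
    " match ip address prefix-list DMZ-ADVERTISE",
    "router bgp 65000",
    " neighbor 203.0.113.1 route-map DMZ-OUT out",
]

# Step 2: only then introduce the new prefixes to be advertised.
advertise_after = [
    "router bgp 65000",
    " network 198.51.100.0 mask 255.255.255.0",
]

with ConnectHandler(**device) as conn:
    conn.send_config_set(filter_first)     # each command takes effect immediately
    conn.send_config_set(advertise_after)  # safe: the outbound filter already exists
    conn.save_config()
```

The same ordering matters when typing commands by hand: because each line takes effect the moment it is entered, the prefix-list and route-map have to exist before the routes they are meant to filter.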

To keep route leaks at bay, the bank invested in Packet Design Route Explorer to monitor routes and route advertisements and receive real-time alerts when they change. The network team also uses Packet Design to capture and analyze the history of routing path changes, so it can troubleshoot routing issues that last only a few minutes and resolve themselves, often before a ticket is even opened. There are details we left out so as not to scare you, but if you are still interested in learning more about what happened, we have the entire case documented here:

http://www.packetdesign.com/wp-content/uploads/2017/04/Multinational-Banking-Corporation-Invests-in-Route-Analytics-to-Avoid-Outages.pdf

28 Calls Later

Location: A large network operator

Another close encounter involved a large network operator providing MPLS VPN services to hundreds of customers. Whenever there was a VPN connectivity problem, its customers would call to complain. In many instances, by the time a ticket was raised, the connectivity issue had fixed itself, which made it almost impossible to trace the history of the routing paths and find the root cause.

Even in cases where the curse was still ongoing, the team’s existing troubleshooting method took time to pinpoint the source of the issue: a network engineer had to log into a router, run a traceroute, note the hops along the route, and then repeat the process at each hop. This meant the team either closed tickets without a resolution or received repeat calls for the same undead problem, leading to SLA violations.
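To make that loop concrete, here is a rough sketch of the hop-by-hop process written as if it were scripted with the Netmiko library. It is not the operator’s actual tooling; the router addresses, credentials, and the very simplistic traceroute parsing are hypothetical.

```python
# Hypothetical sketch of the manual hop-by-hop troubleshooting loop described
# above: log into a router, traceroute toward the customer site, then repeat
# from the next hop. Addresses, credentials, and parsing are made up.
import re
from netmiko import ConnectHandler

CREDENTIALS = {"device_type": "cisco_xe", "username": "netops", "password": "changeme"}
DESTINATION = "10.50.1.1"   # hypothetical customer CE address

def next_hop_from(router_ip, destination):
    """Run a traceroute on one router and return the first hop it reports."""
    with ConnectHandler(host=router_ip, **CREDENTIALS) as conn:
        output = conn.send_command(f"traceroute {destination} numeric")
    hops = re.findall(r"^\s*\d+\s+(\d{1,3}(?:\.\d{1,3}){3})", output, re.MULTILINE)
    return hops[0] if hops else None

path = ["192.0.2.1"]        # hypothetical PE router where the complaint came in
while path[-1] != DESTINATION and len(path) < 30:
    nxt = next_hop_from(path[-1], DESTINATION)
    if nxt is None or nxt in path:
        break               # dead end or loop: this is where the real digging starts
    path.append(nxt)

print(" -> ".join(path))
```

Even automated, the loop costs a login and a traceroute per hop; done by hand across a large MPLS backbone, it is easy to see why tickets were often closed before the culprit was found.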

The network service provider needed the ability to quickly trace real-time routing paths as well as look back in time at historic routing paths. This would help them pinpoint the cause and location of an issue and resolve it before repeat calls were logged.

The network team deployed the Packet Design Explorer Suite to record network topology and routing paths in real time and collect network performance metrics from monitored devices. The Explorer Suite also saves the recorded routing paths, which the team can play back on the network topology map and use for historic routing analysis.

Now, whenever a VPN connectivity failure is reported, the team’s ghost-busting “Explorer” proton pack lets them quickly see real-time routing information, the hops along the path, and the performance metrics at each hop. Even when service calls come in after the problem has resolved itself, the team uses the Path Playback feature to go back in time and trace the history of the routing path. With the Explorer Suite, catching MPLS VPN network issues is now child’s play for the NOC team.

Routing Path on Topology Map

The Cisco Line Card Massacre

Location: A U.S. service provider

We caught this potential nightmare before it brought down the network. A service provider had scheduled a maintenance window to upgrade the IOS on all of its Cisco ASR routers. The upgrade went through without a glitch, and network engineers checked each of the routers to make sure that all interfaces were up and passing Tx and Rx traffic.

The network team then used the Packet Design Explorer Suite’s routing comparison report, which compares the state of network elements to provide a before-and-after view. The report showed that on one of the routers, four ports that were up before the IOS upgrade were now down. Yet the router itself reported that all of its ports were up!

On further analysis of the paranormal activity, the service provider found that an entire 4x10G line card connecting two backbone links and two PoPs had disappeared from the router’s chassis view. The data from the Packet Design Explorer Suite thus helped the service provider detect the network issue before it brought down their network.
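As a simplified illustration of what such a before-and-after comparison does (this is not Packet Design’s implementation, and the interface names and statuses are invented), the sketch below diffs two interface snapshots and flags ports that went down as well as ports that have vanished from the inventory entirely, which is exactly how a missing line card shows up.

```python
# Hypothetical sketch of a before/after interface comparison.
# The snapshots would normally come from polling the routers; here they are
# hard-coded to keep the example self-contained.

# State captured before the IOS upgrade: interface name -> oper status.
before = {
    "TenGigE0/1/0/0": "up",
    "TenGigE0/1/0/1": "up",
    "TenGigE0/1/0/2": "up",
    "TenGigE0/1/0/3": "up",
    "GigabitEthernet0/0/0/0": "up",
}

# State captured after the upgrade: the whole 4x10G card is simply gone,
# so its interfaces no longer appear at all.
after = {
    "GigabitEthernet0/0/0/0": "up",
}

def compare(before_state, after_state):
    """Report interfaces that went down or disappeared between two snapshots."""
    for name, status in sorted(before_state.items()):
        if name not in after_state:
            print(f"MISSING : {name} (was {status}, no longer reported by the router)")
        elif status == "up" and after_state[name] != "up":
            print(f"DOWN    : {name} (was up, now {after_state[name]})")

compare(before, after)
```

The “missing” case is the important one: a manual check of the router’s interface list after the upgrade would show everything still present as up, which is why the outage only became visible once the state was compared against the pre-upgrade snapshot.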

We hope these tales from the crypt have given you enough jitters for Halloween! If you need more, check out our blogs and case studies. Don’t forget to share your own network horror stories with us. And if you would like to talk to our ghost-busting team to find out how Packet Design technology can help you keep network ghouls at bay, fill in the form below:

Request Demo