Sometimes, a single alert can prevent hundreds of thousands of failed internet transactions. If you’re a financial services organization, the monetary value of those failed transactions might be millions of dollars. This is the story of a major banking and financial corporation that experienced an unusual forty-five-minute outage, and how the risk of it happening again has been mitigated by using route analytics technology.
First, a quick refresher on BGP, a topic I’ve blogged about before. When different domains, or autonomous systems, across the globe need to communicate with one another, they use BGP, the routing protocol of the Internet. BGP is designed to work on trust. When a router advertises a new BGP route to a destination, its BGP neighbors accept the new route and pass it along. This design simplifies inter-domain routing, and it is a key reason why BGP was adopted over other protocols and why the Internet was able to grow as fast as it did. But this trust-based design is also a major drawback of BGP. It allows hackers to advertise their own, seemingly shorter routes to a destination, which redirects traffic through the new and “supposedly” shorter route and lets the hackers gain access to data or craft a DDoS attack.
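To make the trust problem concrete, here is a minimal toy sketch in Python. It is not how a real BGP implementation works: actual route selection applies many more criteria (local preference, origin, MED and so on) before and after AS-path length. The `Route` and `TrustingRouter` names are hypothetical, and the AS numbers and prefix come from the ranges reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # destination prefix being advertised
    as_path: tuple   # AS numbers the advertisement has traversed


class TrustingRouter:
    """Toy router that accepts every advertisement without validation."""

    def __init__(self):
        self.table = {}  # prefix -> list of routes learned for it

    def receive(self, route):
        # No origin check and no filtering: the route is simply trusted.
        self.table.setdefault(route.prefix, []).append(route)

    def best(self, prefix):
        # Simplification: prefer the shortest AS path and nothing else.
        return min(self.table[prefix], key=lambda r: len(r.as_path))


router = TrustingRouter()
# Legitimate advertisement, three AS hops away.
router.receive(Route("203.0.113.0/24", (64500, 64501, 64496)))
# Bogus advertisement claiming to originate the same prefix one hop away.
router.receive(Route("203.0.113.0/24", (64511,)))

chosen = router.best("203.0.113.0/24")
```

In this toy model the bogus one-hop route wins, so traffic for the prefix would follow it. Nothing in the router asked whether AS 64511 was entitled to advertise the prefix.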
For the bank in question, the design of BGP was one of the factors in an expensive network outage. During a weekend maintenance window, a new route was inadvertently advertised to the bank’s BGP neighbors before the permit and deny policies for that route were applied via a route-map. This route leak caused traffic intended for the Internet to be routed to the bank’s internal network, resulting in failed transactions. While the bank’s network was configured to handle most disasters, this was one scenario the network team never saw coming.
The outage, along with its monetary and customer satisfaction impact, could have been avoided if an alert mechanism had told the maintenance team that routes intended to stay within the bank’s network were being advertised to the routers that direct traffic from the Internet to the bank’s customer applications. That is, if they had had a system for route leak monitoring, this would not have happened.
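The kind of check such an alert relies on can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bank’s setup or any product’s implementation: the `INTERNAL_ONLY` prefixes and the `find_leaked_routes` function are hypothetical, and a real monitor would read the advertised routes from live routing data rather than from a hard-coded list.

```python
import ipaddress

# Hypothetical prefixes that are meant to stay inside the internal network.
INTERNAL_ONLY = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def find_leaked_routes(advertised, internal_only=INTERNAL_ONLY):
    """Return advertised prefixes that fall inside an internal-only block."""
    leaks = []
    for prefix in advertised:
        net = ipaddress.ip_network(prefix)
        for internal in internal_only:
            # subnet_of() raises TypeError on mixed IPv4/IPv6, so compare versions first.
            if net.version == internal.version and net.subnet_of(internal):
                leaks.append(prefix)
                break
    return leaks


# Prefixes seen being advertised toward the Internet-facing routers (made-up data).
seen_at_edge = ["203.0.113.0/24", "10.20.0.0/16", "2001:db8::/32"]
leaked = find_leaked_routes(seen_at_edge)
if leaked:
    print("ALERT: internal-only routes advertised at the edge:", leaked)
```

Run against every routing change during a maintenance window, a check like this would flag the internal route as soon as it appeared at the edge, well before forty-five minutes of transactions had failed.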
Intrigued? Our latest case study explains the outage in more detail and how route analytics with route leak monitoring now helps the bank prevent a recurrence of the issue.