Internet and Inter-Domain Analysis and Troubleshooting
Route Explorer's BGP Root Cause Analysis capability gives network managers a way to identify and diagnose complex BGP issues affecting mission-critical Internet or inter-domain connectivity. It provides macro-level visibility and automated analysis of the causal events that can trigger millions of BGP routing updates, significantly decreasing mean time to repair (MTTR) and increasing service uptime. One of the tools available within BGP Root Cause Analysis is Root Cause Animation, which shows a dynamic topology visualization of the macro-level dynamics indicated by the raw BGP event stream. After selecting an event timeline of interest and launching the animation tool, the user can play, slow down, fast-forward, and rewind the animation to view how the multi-domain peering structures and routes changed over time. A multi-domain map and a graphic representation of route volume per router peering provide insight into how external peers and next-hop peers have affected iBGP peers and overall routing behavior. Root-cause events such as peering flaps, MED (Multi-Exit Discriminator) oscillations, misconfigured community tags, and unwanted back-door paths can be isolated in minutes rather than days.
The animations can be saved in SVG (Scalable Vector Graphics) format, a W3C standard for producing high-quality graphics. Adobe has a free SVG browser plugin available for download.
Below is a brief guide to the graphic representation of the network in the BGP Root Cause Analysis animation:
The thickness of a peering indicates how many prefixes are routed over that peering, rather than how much traffic is flowing.
Link colors indicate how the routes are changing:
Black means the routes are not changing
Blue means the peering is losing prefixes
Green means the peering is gaining prefixes
Yellow means the prefix count is flapping too fast to animate
A peering that has lost prefixes also has a gray shadow that indicates the largest number of prefixes it ever carried.
Route Explorer is shown as a rectangle on the left. It peers passively with all of the site's BGP edge routers (or core route reflectors, if used), exactly as an interior router would peer with them over iBGP. That is, the recorder's view of the BGP information is exactly the same as the one seen by all members of the site's iBGP mesh.
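As a rough illustration of the legend above, the following Python sketch maps a series of prefix counts sampled on one peering to a link color and computes the peak used for the gray shadow. The function names, the sampling model, and the reversal threshold are assumptions made for illustration; they are not taken from Route Explorer.

```python
from typing import Sequence

# Hypothetical threshold: more than this many direction reversals within one
# sampling window is treated as "flapping too fast to animate".
FLAP_REVERSALS = 3


def link_color(counts: Sequence[int], flap_reversals: int = FLAP_REVERSALS) -> str:
    """Pick a legend color from prefix counts sampled on one peering."""
    # Keep only the non-zero changes between consecutive samples.
    deltas = [b - a for a, b in zip(counts, counts[1:]) if b != a]
    if not deltas:
        return "black"  # routes are not changing
    # Count how often the direction of change flips (gain -> loss or loss -> gain).
    reversals = sum(
        1 for d1, d2 in zip(deltas, deltas[1:]) if (d1 > 0) != (d2 > 0)
    )
    if reversals > flap_reversals:
        return "yellow"  # flapping too fast to animate
    net = counts[-1] - counts[0]
    if net == 0:
        net = deltas[-1]  # no net change: fall back to the most recent direction
    return "green" if net > 0 else "blue"


def peak_prefixes(history: Sequence[int]) -> int:
    """Largest prefix count the peering ever carried (the gray shadow)."""
    return max(history)
```

For example, `link_color([100, 80, 60])` returns `"blue"` because the peering is losing prefixes, while a flat series such as `[100, 100, 100]` returns `"black"`.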
How to use the SVG animation:
At the bottom left of the window is the animation clock, which shows the point in the event timeline currently being displayed.
Below the clock is a large "Start/Pause" button. Click it once to start the animation and again to pause it.
Below that are buttons that jump to the beginning or the end of the animation.
To the right of the Start button are the animation speed controls. The center square selects "normal" speed (a value built into the animation when it was created). Each click on the upper triangle doubles the current speed, and each click on the lower triangle halves it.
Below the speed controls is a button that toggles between one-shot and continuous-loop playback.
The plot to the right of the controls shows how the prefix count varied over time on whichever link is selected in the topology graph. Most animations have an "interesting" link selected at startup, but you can click any link to select it. Click any point in the plot to jump to that time in the animation.
To the right of the plot is information about the currently selected edge.
Detecting BGP Failover and Slow Convergence from High Volume BGP Updates
BGP issues can generate an overwhelming number of routing updates, far more than a person can analyze effectively in a timely manner. This application example shows how Route Explorer can greatly simplify root cause diagnosis of a large volume of BGP routing updates, resulting in more rapid response to critical errors or more proactive network optimization. The example shows an animation of the U.C. Berkeley network's BGP routing during a 500,000-event incident. During this incident, 30,000 prefixes failed over twice from CalrenN-Qwest to Level-3 via a sub-optimal backup path six AS hops long. Convergence time was very long: twenty minutes for each of the fail-overs, and one minute for each of the fail-backs. Without Route Explorer, it could take hours of analysis to determine what happened. View SVG animation.
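To make the idea of reducing a raw update stream to per-peering prefix counts concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical `Update` record (time, peering label, prefix, announce or withdraw) and does not reflect Route Explorer's actual data model or algorithms. It replays the events in time order and reports when a peering first carries a given number of prefixes, which is one simple way to measure how long a failover took.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Tuple


class Update(NamedTuple):
    """Hypothetical, simplified BGP event record."""
    time: float     # seconds since the start of the recording
    peering: str    # illustrative peering label
    prefix: str
    announce: bool  # True = announce, False = withdraw


def prefix_counts(updates: Iterable[Update]) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Replay updates in time order, yielding (time, {peering: prefix count})."""
    routes = defaultdict(set)
    for u in sorted(updates, key=lambda u: u.time):
        if u.announce:
            routes[u.peering].add(u.prefix)
        else:
            routes[u.peering].discard(u.prefix)
        yield u.time, {p: len(s) for p, s in routes.items()}


def time_to_reach(updates: Iterable[Update], peering: str, target: int) -> Optional[float]:
    """First time at which `peering` carries at least `target` prefixes, else None."""
    for t, counts in prefix_counts(updates):
        if counts.get(peering, 0) >= target:
            return t
    return None
```

With a stream in which two prefixes are withdrawn from a "primary" peering and later announced on a "backup" peering, `time_to_reach(updates, "backup", 2)` gives the moment the backup path has picked up both prefixes. The per-peering counts are the same quantity the animation draws as link thickness.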
Diagnosing BGP MED Oscillations
Floods of BGP updates caused by random routing behavior, such as Multi-Exit Discriminator (MED) oscillations, can create an operationally disruptive level of BGP routing traffic, impairing even a large network. This application example shows how Route Explorer can diagnose a huge volume of BGP updates generated by an actual MED oscillation at a Tier 1 ISP. The animation uses anonymized network numbering and shows four core route reflectors, two in each of two PoPs. Both pairs of route reflectors, Core1-a/b and Core2-a/b, have paths to 4.5/16 via AS2. Core1-a/b also have a path via AS1. The ISP is accepting MEDs from AS2, and Core1 has the better MED. Core2-a/b announce superior metrics and then withdraw their AS2 route randomly and rapidly, on average every 10 microseconds (100,000 times per second each). These links are colored yellow because the event rate is too fast to animate. This flood causes Core1-a and Core1-b to switch paths randomly, on average every 10 milliseconds (100 times per second). The occasional blue flashes during the animation indicate announce/withdraw cycles that complete in less than a millisecond. The animation shows 10 seconds of this issue, with a time scale in milliseconds. The actual event lasted continuously for at least five days and accounted for 95% of the ISP's BGP traffic. In other words, this one prefix generated roughly 20 times more iBGP traffic than all the rest of the Internet combined, making diagnosis extremely difficult. With Route Explorer's Root Cause Analysis capability, the problem can be diagnosed and resolved within minutes of recording and analyzing the BGP routing updates. View SVG animation.
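The rates quoted in this example follow from simple arithmetic, which the short Python sketch below works through. The helper names are illustrative only. Note that a 95% share corresponds to a ratio of 19 to 1 against the remaining 5%, which is the basis for the figure of about 20 times.

```python
def events_per_second(mean_interval_us: int) -> float:
    """Convert a mean interval between events (in microseconds) to a rate."""
    return 1_000_000 / mean_interval_us


def traffic_ratio(share_percent: int) -> float:
    """Ratio of one source's traffic to everything else, given its percentage share."""
    return share_percent / (100 - share_percent)


# Core2-a/b: one announce/withdraw event every 10 microseconds on average.
core2_rate = events_per_second(10)        # events per second, per route reflector

# Core1-a/b: one path switch every 10 milliseconds (10,000 microseconds) on average.
core1_rate = events_per_second(10_000)    # path switches per second

# Events covered by the 10-second animation at those average rates.
core2_events_in_animation = core2_rate * 10
core1_switches_in_animation = core1_rate * 10

# One prefix responsible for 95% of the BGP traffic versus the remaining 5%.
ratio = traffic_ratio(95)
```

At these average rates, the 10 seconds shown in the animation correspond to about a million events from each Core2 route reflector and about a thousand path switches on each Core1 route reflector.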