Abstract: Making Sense of BGP
Tina Wong, Van Jacobson, Cengiz Alaettinoglu

"Making Sense of BGP" Animations

Viewing the Animations

The animations are in SVG (Scalable Vector Graphics), a W3C standard for high-quality graphics. Unfortunately, SVG support is not yet standard in most browsers, and you may need to download a plugin to view the animations. Adobe has a free SVG plugin available for download here. We've used it with a variety of browsers on Linux, Mac OS X and Microsoft Windows. The Apache Batik project also has a nice standalone viewer called Squiggle that can be downloaded here. It's written in Java and runs on almost any platform, but the current version seems to use more CPU than the Adobe viewer.

Layout and Visual Cues

The animations all use the same layout and visual cues:

  • Data traffic would flow left to right. (The BGP information itself flows the other way, right to left.)

  • The thickness of an edge indicates how many prefixes are routed over that edge, not how much traffic is flowing over it (this is a routing diagnostic, not a traffic diagnostic).

  • Link colors indicate how the routes are changing:

    • Black means not changing
    • Blue means the edge is losing prefixes
    • Green means the edge is gaining prefixes
    • Yellow means the prefix count is flapping too fast to animate
    • An edge that has lost prefixes also has a gray shadow that indicates the largest number of prefixes it ever carried.
  • The BGP recorder is the rectangle on the left. It (passively) peers with all the site's BGP edge routers (or core route reflectors for an ISP) exactly as an interior router would iBGP peer with them. (I.e., the recorder's view of the BGP information is exactly the same one seen by all members of the site's iBGP mesh.)

  • The recorder has links to each of the site's BGP edge routers or route reflectors.

  • The edge routers/route reflectors have links to each of the "next hops" they're getting from their eBGP peers.

  • The "next hops" have links to the AS they service.

  • Each AS links to the next downstream AS on the way to some prefix(es).

  • On the far right side, leaf ASs connect to the prefix(es) they advertise.
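The link cues listed above (thickness, color, gray shadow) amount to a small per-edge rule. The following Python sketch is only an illustration of that rule, not the code behind the animations. The names `EdgeStyle` and `edge_style`, the per-frame event count, and the `flap_limit` threshold are all assumptions made for the example; this page does not say how "too fast to animate" is actually decided.

```python
from dataclasses import dataclass


@dataclass
class EdgeStyle:
    color: str   # "black", "blue", "green" or "yellow"
    width: int   # number of prefixes currently routed over the edge
    shadow: int  # largest prefix count the edge has carried so far


def edge_style(prev_count, count, peak, events_in_frame, flap_limit=1000):
    """Pick the visual cues for one edge in one animation frame.

    flap_limit is a hypothetical cutoff for "flapping too fast to animate".
    """
    # The gray shadow tracks the high-water mark of the prefix count.
    peak = max(peak, prev_count, count)
    if events_in_frame > flap_limit:
        color = "yellow"   # changing too fast to animate
    elif count < prev_count:
        color = "blue"     # losing prefixes
    elif count > prev_count:
        color = "green"    # gaining prefixes
    else:
        color = "black"    # not changing
    return EdgeStyle(color=color, width=count, shadow=peak)
```

For example, an edge that drops from 40,000 to 10,000 prefixes in a quiet frame would be drawn blue, at a thickness corresponding to 10,000 prefixes, over a gray shadow corresponding to 40,000.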

The graph is pruned at some threshold (see discussion of TAMP in the talk), e.g., a link is pruned if it carries less than 10% of the prefixes being displayed, so full AS paths and prefixes usually won't be shown for events that involve a significant number of prefixes.
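The 10% example can be written as a simple filter. This is a minimal sketch of threshold pruning only, assuming a plain dictionary of per-edge prefix counts; it is not the TAMP algorithm from the talk, and the function name, the edge labels and the integer-percent parameter are hypothetical.

```python
def prune_edges(edge_prefixes, displayed_total, threshold_percent=10):
    """Drop edges carrying less than threshold_percent of the displayed prefixes.

    edge_prefixes maps an edge (any hashable label) to its prefix count.
    Integer arithmetic avoids floating-point surprises at the boundary.
    """
    return {
        edge: count
        for edge, count in edge_prefixes.items()
        if count * 100 >= threshold_percent * displayed_total
    }


# Hypothetical example: 40,000 prefixes displayed, so the cutoff is 4,000.
edges = {
    ("AS1", "AS2"): 30000,
    ("AS2", "AS3"): 4000,
    ("AS2", "AS4"): 3999,
}
kept = prune_edges(edges, displayed_total=40000)
```

Here the edge carrying 3,999 prefixes falls just under the cutoff and is removed, which is why the tail of an AS path often disappears from the display during large events.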

Controls and Indicators

At the bottom left of the window is the animation clock (what point in the timeline of the event is currently being shown). Below it is a large "Start/Pause" button (click it once to start the animation and again to pause). Below that are buttons that take you to the beginning or end of the animation. To the right of the Start button are animation speed controls: the center square selects "normal" speed (a value built into the animation at the time it was created). Each click on the upper triangle will double the current speed and each click on the lower will halve it. Below the speed controls is a button that toggles between one-shot and continuous loop playback modes.

The plot to the right of the controls shows how the number of prefixes varied with time on whichever link is selected in the topology graph (most animations have an "interesting" link selected at startup, but you can click on any link to select it). Click on any point in this plot to jump to that time in the animation. To the right of the plot are details about the currently selected edge.

The Animations

There are three animations of data collected at UC Berkeley in August 2003.
  • This shows an incident where 40,000 prefixes fail over from the northern California CALREN peering with Qwest to the southern California CALREN Qwest peering and then come back. Convergence is relatively fast for BGP: one minute each way. The animation shows roughly 220,000 BGP events. Note that 128.32.1.3, Berkeley's primary path to the "commercial Internet", loses routes when the CalrenN-Qwest peering drops. It wasn't supposed to.
  • This shows another 220,000-event incident, in which 30,000 prefixes fail over from CalrenN-Qwest to CalrenN's alternate ISP connectivity (Net Access via Global NAPs). Convergence is slower than in the previous incident (5 minute fail-over, 3 minute fail-back), probably because of the longer backup path. 128.32.1.3 again loses routes when the CalrenN-Qwest peering drops.
  • This shows a 500,000-event incident in which 30,000 prefixes fail over, twice, from CalrenN-Qwest to Level-3 via an amazing 6 AS-hop backup path. Convergence is awful (20 minutes for each of the fail-overs, 1 minute for the fail-backs).

Note: For the second and third incidents above, CENIC graciously informed us that the failovers "were actually peers leaking routes to the CalREN North and South networks, though it is functionally the same as the routers see it."

There are two animations of data collected at a tier-1 ISP. (All IP and AS numbers in the ISP data have been anonymized. The data is real but the addresses are fake.)

  • This shows a textbook example of MED oscillation but with a surprising intensity. There are four core route reflectors involved, two in each of two PoPs. Core1-a/b and Core2-a/b each have paths to 4.5/16 via AS2. Core1-a/b also have a path via AS1. The ISP is accepting MEDs from AS2 and Core1 has the better MED. So Core1-a/b switch between the AS1 and AS2 paths as Core2-a/b announce/withdraw their AS2 route. In this case Core2-a/b are each announcing and withdrawing their AS2 route randomly, on average every 10 microseconds (100,000 times per second each; the links are colored yellow since the event rate is too fast to animate). This flood causes Core1-a and Core1-b to randomly switch paths on average every 10 milliseconds (100 times/second). (The blue flashes that occasionally happen during the animation are not a bug. They're times when the instantaneous event rate is far above the average and announce/withdraw cycles happen in less than a millisecond.) The animation shows 10 seconds of this (note that the time scale on this animation is milliseconds, while the others use seconds or minutes). The actual event lasted for at least five days, continuously, and accounted for 95% of the ISP's BGP traffic. I.e., this one prefix generated roughly 20 times more iBGP traffic than all the rest of the Internet combined.

  • This shows a different kind of oscillation. A customer of the ISP has a direct connection via next hop 1.0.0.1 but the associated BGP peering won't stay up: it's dropped and re-established about once a minute on average. The customer also has a backup link via a NAP that's connected to all the other tier-1 ISPs, so when the one-hop direct path goes away things immediately fail over to a three-hop alternate via some other tier-1. Since each PoP peers with different tier-1s and each makes an independent decision, lots of different alternate paths are announced. The convergence details vary slightly from event to event (depending on the relative timing of each core route reflector's updates from the access routers peering with the various downstream ISPs), but it takes about 20 seconds for everything to converge and each customer flap generates about 200 BGP events. This oscillation went on continuously for more than a month. The event rate is too low for most diagnostics to detect the problem, but the Stemming algorithm described in the paper had no trouble finding it.
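The rates and the traffic ratio quoted for the MED oscillation above are simple unit conversions, and it can help to see them spelled out. The sketch below is just that arithmetic in Python; the function names are ours, and the inputs are the figures given in the description (10 microseconds, 10 milliseconds, 95%).

```python
def events_per_second(mean_interval_us):
    """Convert a mean interval in microseconds to an event rate per second."""
    return 1_000_000 / mean_interval_us


def times_more_than_rest(share_percent):
    """Ratio of one source's traffic to everything else, given its share."""
    return share_percent / (100 - share_percent)


core2_rate = events_per_second(10)        # announce/withdraw every 10 microseconds
core1_rate = events_per_second(10_000)    # path switch every 10 milliseconds
ratio = times_more_than_rest(95)          # 95% of the ISP's BGP traffic

print(core2_rate, core1_rate, ratio)      # prints 100000.0 100.0 19.0
```

A 95% share works out to 19 times the remaining 5%, which is where the "20 times more" figure comes from after rounding.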

    This page and the animations it links to are all Copyright (c) 2004 by Packet Design, Inc.

 
