
Abstract: Making Sense of BGP
Tina Wong, Van Jacobson, Cengiz Alaettinoglu
"Making Sense of BGP" Animations
Viewing the Animations
The animations are in SVG (Scalable Vector Graphics), a W3C standard
for producing high-quality graphics. Unfortunately, SVG support is not
yet standard in most browsers, and you may need to download a plugin
to view the animations. Adobe has a free SVG plugin available for
download here. We've used it with a variety of browsers on Linux,
Mac OS X and Microsoft Windows. The Apache Batik project also has a
nice standalone viewer called squiggle that can be downloaded here.
It's written in Java and runs on almost any platform, but the current
version seems to use more CPU than the Adobe viewer.
Layout and Visual Cues
The animations all use the same layout and visual cues:
- Data would flow left to right. (BGP information flows right to left.)
- The thickness of an edge indicates how many prefixes are routed over that edge, not how much traffic is flowing over it (this is a routing diagnostic, not a traffic diagnostic).
- Link colors indicate how the routes are changing:
  - Black means the edge is not changing
  - Blue means the edge is losing prefixes
  - Green means the edge is gaining prefixes
  - Yellow means the prefix count is flapping too fast to animate
  - An edge that has lost prefixes also has a gray shadow that indicates the largest number of prefixes it ever carried.
- The BGP recorder is the rectangle on the left. It (passively) peers with all the site's BGP edge routers (or core route reflectors for an ISP) exactly as an interior router would iBGP peer with them. (I.e., the recorder's view of the BGP information is exactly the same one seen by all members of the site's iBGP mesh.)
- The recorder has links to each of the site's BGP edge routers or route reflectors.
- The edge routers/route reflectors have links to each of the "next hops" they're getting from their eBGP peers.
- The "next hops" have links to the AS they service.
- Each AS links to the next downstream AS on the way to some prefix(es).
- On the far right side, leaf ASs connect to the prefix(es) they advertise.
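The layout and color cues above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the animations: the function names, the `flap_limit` cutoff for "too fast to animate", the route tuple layout, and the sample router, AS and prefix values are all hypothetical.

```python
from collections import defaultdict

def edge_prefix_counts(routes, recorder="recorder"):
    """Count distinct prefixes on every edge of the left-to-right graph.

    Each route is (edge_router, next_hop, as_path, prefix); the chain drawn
    is recorder -> edge router -> next hop -> AS ... -> leaf AS -> prefix.
    """
    prefixes_on_edge = defaultdict(set)
    for router, next_hop, as_path, prefix in routes:
        nodes = [recorder, router, next_hop, *as_path, prefix]
        for edge in zip(nodes, nodes[1:]):
            prefixes_on_edge[edge].add(prefix)
    # Edge thickness is the number of prefixes routed over the edge.
    return {edge: len(p) for edge, p in prefixes_on_edge.items()}

def edge_color(prev_count, curr_count, changes_in_frame, flap_limit=10):
    """Pick a link color; flap_limit is an assumed, hypothetical cutoff."""
    if changes_in_frame > flap_limit:
        return "yellow"   # flapping too fast to animate
    if curr_count > prev_count:
        return "green"    # gaining prefixes
    if curr_count < prev_count:
        return "blue"     # losing prefixes
    return "black"        # not changing

def gray_shadow(curr_count, peak_count):
    """Return the shadow size for an edge that has lost prefixes, else None."""
    return peak_count if curr_count < peak_count else None

# Hypothetical sample data (documentation-range AS numbers and prefixes).
routes = [
    ("edge-1", "nh-1", ("AS64500", "AS64501"), "192.0.2.0/24"),
    ("edge-1", "nh-1", ("AS64500",), "198.51.100.0/24"),
]
counts = edge_prefix_counts(routes)
```

Here the edge from the recorder to `edge-1` carries both prefixes, while the edge from `AS64500` to `AS64501` carries only one, so it would be drawn thinner.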
The graph is pruned at some threshold (see the discussion of TAMP in
the talk); e.g., a link is pruned if it carries fewer than 10% of the
prefixes being displayed. As a result, full AS paths and prefixes
usually won't be shown for events that involve a significant number
of prefixes.
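The 10% example can be written as a simple filter. The sketch below assumes the graph is held as a mapping from link to prefix count; the function name, the `min_percent` parameter and the sample counts are hypothetical, and the real TAMP pruning described in the talk may differ.

```python
def prune_links(prefix_counts, displayed_prefixes, min_percent=10):
    """Drop links carrying fewer than min_percent of the displayed prefixes."""
    # Integer comparison avoids floating-point surprises right at the cutoff.
    return {
        link: count
        for link, count in prefix_counts.items()
        if count * 100 >= min_percent * displayed_prefixes
    }

# Hypothetical counts: with 40,000 prefixes displayed the cutoff is 4,000.
counts = {
    ("CalrenN", "Qwest"): 38_000,
    ("CalrenN", "backup ISP"): 1_500,
    ("Qwest", "AS64500"): 6_000,
}
kept = prune_links(counts, displayed_prefixes=40_000)
```

With these numbers the 1,500-prefix link falls below the cutoff and is dropped, which is why a thin tail of the AS path can disappear from the display.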
Controls and Indicators
At the bottom left of the window is the animation clock (the point in
the timeline of the event that is currently being shown). Below it is
a large "Start/Pause" button (click it once to start the animation
and again to pause). Below that are buttons that take you to the
beginning or end of the animation. To the right of the Start button
are the animation speed controls: the center square selects "normal"
speed (a value built into the animation at the time it was created).
Each click on the upper triangle doubles the current speed and each
click on the lower triangle halves it. Below the speed controls is a
button that toggles between one-shot and continuous loop playback
modes.
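The behavior of these controls amounts to a small piece of state. The sketch below is one way to model it; the class and method names are hypothetical, and the one-shot default is an assumption, since the page does not say which mode an animation starts in.

```python
class PlaybackControls:
    """Model of the Start/Pause, speed and loop controls described above."""

    def __init__(self, normal_speed=1.0):
        self.normal_speed = normal_speed  # value built into the animation
        self.speed = normal_speed
        self.playing = False
        self.loop = False                 # assumed default: one-shot playback

    def toggle_play(self):
        # One click starts the animation, the next click pauses it.
        self.playing = not self.playing

    def speed_up(self):
        # Upper triangle: each click doubles the current speed.
        self.speed *= 2

    def slow_down(self):
        # Lower triangle: each click halves the current speed.
        self.speed /= 2

    def reset_speed(self):
        # Center square: return to the built-in "normal" speed.
        self.speed = self.normal_speed

    def toggle_loop(self):
        # Switch between one-shot and continuous loop playback.
        self.loop = not self.loop
```

Because each click doubles or halves the current value, three clicks on the upper triangle give 8x normal speed, and the center square undoes any number of clicks at once.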
The plot to the right of the controls shows how the prefix count
varied over time on whichever link is selected in the topology graph
(most animations have an "interesting" link selected at startup, but
you can click on any link to select it). Click on any point in this
plot to jump to that time in the animation. To the right of the plot
is information about the currently selected edge.
The Animations
There are three animations of data collected at UC Berkeley during
August 2003.
- This shows an incident where 40,000 prefixes fail over from the northern California CalREN peering with Qwest to the southern California CalREN-Qwest peering, then come back. Convergence is relatively fast for BGP: one minute each way. The animation shows roughly 220,000 BGP events. Note that 128.32.1.3, Berkeley's primary path to the "commercial Internet", loses routes when the CalrenN-Qwest peering drops. It wasn't supposed to.
- This shows another incident of roughly 220,000 BGP events, where 30,000 prefixes fail over from CalrenN-Qwest to CalrenN's alternate ISP connectivity (Net Access via Global NAPs). Convergence is slower than in the previous incident (5-minute fail-over, 3-minute fail-back), probably because of the longer backup path. 128.32.1.3 again loses routes when the CalrenN-Qwest peering drops.
- This shows a 500,000-event incident where 30,000 prefixes fail over, twice, from CalrenN-Qwest to Level-3 via an amazing 6-AS-hop backup path. Convergence is awful (20 minutes for each of the fail-overs, 1 minute for the fail-backs).
Note: For the second and third incidents above, CENIC graciously
informed us that the failovers "were actually peers leaking routes to
the CalREN North and South networks, though it is functionally the
same as the routers see it."
There are two animations of data collected at a tier-1 ISP. (All IP
and AS numbers in the ISP data have been anonymized. The data is real
but the addresses are fake.)
- This shows a textbook example of MED oscillation but with a surprising intensity. There are four core route reflectors involved, two in each of two PoPs. Core1-a/b and Core2-a/b each have paths to 4.5/16 via AS2. Core1-a/b also have a path via AS1. The ISP is accepting MEDs from AS2 and Core1 has the better MED, so Core1-a/b switch between the AS1 and AS2 paths as Core2-a/b announce/withdraw their AS2 route. In this case Core2-a/b are each announcing and withdrawing their AS2 route at random, on average every 10 microseconds (100,000 times per second each; the links are colored yellow since the event rate is too fast to animate). This flood causes Core1-a and Core1-b to switch paths at random, on average every 10 milliseconds (100 times per second). (The blue flashes that occasionally happen during the animation are not a bug. They're times when the instantaneous event rate is far above the average and announce/withdraw cycles happen in less than a millisecond.) The animation shows 10 seconds of this (note that the time scale on this animation is milliseconds while the others use seconds or minutes). The actual event lasted for at least five days, continuously, and accounted for 95% of the ISP's BGP traffic. I.e., this one prefix generated about 20 times more iBGP traffic than all the rest of the Internet combined.
- This shows a different kind of oscillation. A customer of the ISP has a direct connection via next hop 1.0.0.1 but the associated BGP peering won't stay up -- it's dropped and re-established every minute on average. The customer also has a backup link via a NAP that's connected to all the other tier-1 ISPs, so when the one-hop direct path goes away things immediately fail over to a three-hop alternate via some other tier-1. Since each PoP peers with different tier-1s and each makes an independent decision, lots of different alternate paths are announced. The convergence details vary slightly from event to event (depending on the relative timing of each core route reflector's updates from the access routers peering with the various downstream ISPs) but it takes about 20 seconds for everything to converge and generates about 200 BGP events per customer flap. This oscillation went on continuously for more than a month. The event rate is too low for most diagnostics to detect the problem but the Stemming algorithm described in the paper had no trouble finding it.
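The rates quoted for the two ISP incidents follow from simple arithmetic, which the sketch below reproduces. The function names are hypothetical, and the 30-day figure is an assumed lower bound for "more than a month", not a number from the data.

```python
MICROSECONDS_PER_SECOND = 1_000_000

def events_per_second(mean_interval_us):
    """Convert a mean interval between events (microseconds) to a rate."""
    return MICROSECONDS_PER_SECOND / mean_interval_us

def ratio_to_everything_else(percent_share):
    """How many times larger a share is than all remaining traffic."""
    return percent_share / (100 - percent_share)

def total_flap_events(events_per_flap, flaps_per_minute, days):
    """Total BGP events generated by a steadily flapping peering."""
    return events_per_flap * flaps_per_minute * 60 * 24 * days

core2_rate = events_per_second(10)        # every 10 microseconds
core1_rate = events_per_second(10_000)    # every 10 milliseconds
med_ratio = ratio_to_everything_else(95)  # 95% of the ISP's BGP traffic
month_of_flaps = total_flap_events(200, 1, 30)  # assumed 30-day minimum

print(core2_rate, core1_rate, med_ratio, month_of_flaps)
# prints 100000.0 100.0 19.0 8640000
```

A 95% share works out to 19 times everything else, which the text rounds to 20. At one flap per minute and about 200 events per flap, the second incident generates only a few events per second, yet adds up to more than 8.6 million events over 30 days.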
This page and the animations it links to are all Copyright
(c) 2004 by Packet Design, Inc.
