Entropy: 2. lack of order or predictability; gradual decline into disorder.
OK, so you need to add a new BGP peer and get it active quickly. You know that the Bismarck router has a configuration that has been running for a year and has been reliable, so you copy it for the new router, copy over all the route maps, and make sure they are applied to the interface. Oh, wait, there’s one route map that already exists so you don’t need to copy it over. Apply and save the config and behold, the peer comes up! Job done. Move on to the next issue.
You have just made one of the more common mistakes in networking: copying a working configuration to a new router in the hope that it will work the same way. Sometimes it does… but not always. What happens when it’s time to add a new L3VPN customer? You copy a known config. But did you change the route distinguisher (RD) correctly and set the VRF to the new one? Did you pick the right route map, or introduce a typo when you added the new rules and renamed them? Once you renamed the route map, did you also update the reference that links it to the customer? Maybe you should have just created a whole new config!
And that is the problem with using config snippets. We always start with a known good snippet, and then we modify it for a new use. But how do you know if you have taken all the steps necessary to make this one unique and compatible with its current situation?
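One partial answer is to search the new config for anything that still belongs to the source. The Python sketch below is only an illustration: the config syntax, the customer names (`CUST-A`, `CUST-B`), the RD value, and the function name `find_leftovers` are all hypothetical, and a real check would need the identifiers from your own source config.

```python
import re

def find_leftovers(new_config: str, source_identifiers: list[str]) -> dict[str, list[int]]:
    """Return {identifier: [line numbers]} for each source identifier still present."""
    hits = {}
    for ident in source_identifiers:
        # Match whole tokens only, so "CUST-A" does not match "CUST-AB" or "CUST-A-OUT".
        pattern = re.compile(r"(?<![\w:.-])" + re.escape(ident) + r"(?![\w:.-])")
        lines = [
            number
            for number, line in enumerate(new_config.splitlines(), start=1)
            if pattern.search(line)
        ]
        if lines:
            hits[ident] = lines
    return hits

# Hypothetical config for new customer CUST-B, copied from CUST-A (generic syntax).
new_config = """\
vrf CUST-B
 rd 65000:100
 route-map CUST-B-IN in
neighbor 192.0.2.1 route-map CUST-A-OUT out
"""

# Identifiers that belonged to the source customer and must not survive the copy.
source_identifiers = ["CUST-A", "65000:100", "CUST-A-IN", "CUST-A-OUT"]

print(find_leftovers(new_config, source_identifiers))
# {'65000:100': [2], 'CUST-A-OUT': [4]}
```

Here the RD was never changed and one route-map reference still points at the old customer. A check like this catches only the identifiers you think to list; anything else left over from the source config slips through unnoticed.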
This is how network entropy begins. I once analyzed a network that had about five thousand TE tunnels in it. The standard config was for each TE tunnel to have a primary path, a secondary path, and fast reroute (FRR). However, there were a thousand tunnels that had no secondary route signaled, and 1,500 that had no FRR configured. At some point, the original config snippet was corrupted and then copied, or the definition of a standard TE tunnel changed or was lost, or… who knows?
The fact is, this company’s network was not configured the way they wanted it. If a link failed, maybe FRR would fire but no secondary route would take over. Or, since there was no FRR, the secondary route might take over, but a lot of traffic would be dropped during the switchover. Worse, maybe neither would work and traffic would simply follow the IGP to the destination. That might be okay, or it could result in traffic saturating the very links that the tunnel was designed to bypass. With the traffic mingled, the VoIP calls that the tunnel carried might come through garbled.
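An audit for this kind of drift can be quite small once the tunnel state has been collected. The Python sketch below assumes a hypothetical inventory in which each tunnel record already says whether a secondary path is signaled and whether FRR is configured; the `Tunnel` class, its field names, and the sample tunnels are invented for illustration, and gathering the data from the routers is left out because it is vendor-specific.

```python
from dataclasses import dataclass

@dataclass
class Tunnel:
    name: str
    has_secondary: bool  # a secondary path is signaled
    has_frr: bool        # fast reroute is configured

def audit_tunnels(tunnels):
    """Group tunnel names by the part of the standard they are missing."""
    report = {"no_secondary": [], "no_frr": [], "neither": []}
    for tunnel in tunnels:
        if not tunnel.has_secondary and not tunnel.has_frr:
            report["neither"].append(tunnel.name)       # traffic falls back to the IGP
        elif not tunnel.has_secondary:
            report["no_secondary"].append(tunnel.name)  # FRR fires, nothing takes over
        elif not tunnel.has_frr:
            report["no_frr"].append(tunnel.name)        # drops during the switchover
    return report

# Hypothetical inventory; a real one would be built from router output.
inventory = [
    Tunnel("te-1", has_secondary=True, has_frr=True),
    Tunnel("te-2", has_secondary=False, has_frr=True),
    Tunnel("te-3", has_secondary=True, has_frr=False),
    Tunnel("te-4", has_secondary=False, has_frr=False),
]

print(audit_tunnels(inventory))
# {'no_secondary': ['te-2'], 'no_frr': ['te-3'], 'neither': ['te-4']}
```

The three buckets correspond to the three failure cases described above, so the report shows not just how many tunnels deviate from the standard but what is likely to happen to each one when a link fails.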
Here’s another example. I once worked with a gaming company that had been receiving a lot of customer heat due to poor bandwidth. They finally got a much larger direct peering pipe on one of their two peering routers and were eager to make it operational before the evening’s gaming traffic ramped up. After configuring the local pref to prefer the new link, the engineers went home. Unfortunately, they had another terrible night. While the config was correct, the routers did not update the local pref. Traffic flowed over the old link and performance was no better than it had been. Later that night, they did a BGP reset on the larger pipe, the local pref went into effect, and the pipe was immediately in use. In the process, they dropped hundreds of gamers during the transition. Not a great way to improve customer satisfaction.
Generally, the right approach is to ask the router. You also need to know what your standard is, and you should match each command that you enter to that standard. If you need to have a local pref changed, you should examine all the affected routes to ensure each router accepts the change and updates the RIB. Is something stuck? Did the new TE tunnel come up and, if so, does it have FRR and one or more secondary routes, as your standard requires? If not, you need to go back to the drawing board, or even back out the change. If you don’t, you will have introduced one more element of entropy to your network.
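For the local pref case, “asking the router” comes down to comparing what the RIB actually holds with what you intended. The Python sketch below assumes the BGP table has already been parsed into simple records; the record layout, the function name `stale_routes`, the addresses, and the local pref values are hypothetical, and the parsing step is omitted because it depends on the platform.

```python
def stale_routes(routes, peer, expected_local_pref):
    """Return the prefixes learned from `peer` whose local pref is not the expected value."""
    return [
        route["prefix"]
        for route in routes
        if route["peer"] == peer and route["local_pref"] != expected_local_pref
    ]

# Hypothetical snapshot of the BGP table after the change was applied.
routes = [
    {"prefix": "198.51.100.0/24", "peer": "192.0.2.1", "local_pref": 200},
    {"prefix": "203.0.113.0/24", "peer": "192.0.2.1", "local_pref": 100},  # not updated
    {"prefix": "203.0.113.0/24", "peer": "192.0.2.9", "local_pref": 100},  # other peer, ignored
]

# The new link's peer is 192.0.2.1 and the intended local pref is 200.
print(stale_routes(routes, peer="192.0.2.1", expected_local_pref=200))
# ['203.0.113.0/24']
```

An empty list means every route from the new peer carries the intended local pref. Anything else means the config and the RIB disagree, which is exactly the state the gaming company’s routers were in when the engineers went home.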
And, before long, someone will copy your config snippet to use for their next change…