Many large enterprises use MPLS Traffic Engineering (MPLS-TE) for connectivity to remote branches and data centers. For these enterprises, business continuity depends on maintaining resiliency during a failure. While a failure in the underlying physical network can be overcome with redundant hardware, a failure on a Label Switched Path (LSP) that serves high-speed, low-latency applications requires quick and proactive resiliency mechanisms. In this blog series, we will look at two mechanisms that can be used in MPLS-TE networks for quick recovery from failure: End-to-End Protection (also called path protection), which is enabled using a secondary path, and local protection, which is provided using MPLS FastReroute (FRR).
We will explore the basics of End-to-End Protection in this blog post and cover MPLS FRR in part two.
End-to-End Protection, as the name says, provides failure recovery for the entire LSP that is carrying a TE tunnel’s traffic. This is achieved using two LSPs: the primary LSP, which is the active LSP that carries the TE traffic, and a secondary LSP, which acts as a standby, ready to take over when the primary LSP fails.
In this mechanism, the secondary LSP is configured and established in advance. When the primary LSP fails, the tunnel’s headend router is alerted to the failure by detection mechanisms that rely on RSVP signaling or the IGP. The headend router then immediately switches the MPLS-TE tunnel’s traffic from the primary to the secondary LSP. Once the primary LSP recovers, traffic is switched back to it.
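The switchover and revert behavior described above can be sketched as a small state model. This is only an illustration, not any vendor's implementation: the Lsp and Headend classes and the on_lsp_state_change method are hypothetical names, and the failure notification that would really arrive through RSVP signaling or the IGP is reduced to a plain method call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lsp:
    """Hypothetical stand-in for a signaled LSP."""
    name: str
    up: bool = True  # True while the LSP is established and healthy


class Headend:
    """Toy model of a tunnel headend with one pre-established standby LSP."""

    def __init__(self, primary: Lsp, secondary: Lsp) -> None:
        self.primary = primary
        self.secondary = secondary
        # Traffic starts on the primary LSP.
        self.active: Optional[Lsp] = primary

    def on_lsp_state_change(self, lsp: Lsp, up: bool) -> None:
        """Handle a failure or recovery notification for one LSP."""
        lsp.up = up
        if self.primary.up:
            # Primary is healthy or has recovered: use it (revert to it).
            self.active = self.primary
        elif self.secondary.up:
            # Primary is down: switch to the standby that already exists.
            self.active = self.secondary
        else:
            # Neither LSP is usable.
            self.active = None


primary = Lsp("primary")
secondary = Lsp("secondary")
headend = Headend(primary, secondary)

headend.on_lsp_state_change(primary, up=False)
after_failure = headend.active.name   # traffic now on the secondary LSP

headend.on_lsp_state_change(primary, up=True)
after_recovery = headend.active.name  # traffic back on the primary LSP
```

Because the secondary LSP already exists when the failure is reported, the switchover in this model is a single assignment with no path computation involved.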
When enabling secondary paths for path protection, it is also necessary to ensure that the primary and secondary LSPs use different paths. This provides resiliency by ensuring that the two LSPs do not share a single point of failure. This is referred to as path diversity and can be achieved using full strict hop LSP paths, Shared Risk Link Groups (SRLG), or admin groups.
With a full strict hop LSP, the exact path to be taken by the LSP is specified, including the exact order of the label switched routers (LSRs) through which the RSVP messages are sent, and no LSR is allowed to appear in both LSPs. This way, the primary and the secondary LSP take two distinct paths, and the failure of an LSR along one path does not affect the other. However, this method brings more configuration overhead and adds complexity, especially in large networks.
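As a rough illustration of the no-overlap rule, the check below compares two fully specified hop lists. The router names and the shared_transit_lsrs helper are hypothetical; the sketch assumes each path is written as an ordered list that starts at the headend and ends at the tailend, which necessarily appear in both paths and are therefore ignored.

```python
def shared_transit_lsrs(primary_hops, secondary_hops):
    """Return the transit LSRs that appear in both explicit paths.

    The first and last hop (headend and tailend) are skipped because
    both LSPs of the same tunnel must start and end on them.
    """
    return set(primary_hops[1:-1]) & set(secondary_hops[1:-1])


# Hypothetical strict-hop paths between PE1 and PE2.
primary_path = ["PE1", "P1", "P2", "PE2"]
diverse_secondary = ["PE1", "P3", "P4", "PE2"]
overlapping_secondary = ["PE1", "P3", "P2", "PE2"]

no_overlap = shared_transit_lsrs(primary_path, diverse_secondary)
overlap = shared_transit_lsrs(primary_path, overlapping_secondary)
```

An empty result means the two paths share no transit LSR; a non-empty result names the routers whose failure would take down both LSPs.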
Another mechanism for path diversity is SRLG. Links that share a common physical attribute (for example, a common fiber) are considered to carry the same risk: if one link in the group fails, the other links in the group may fail too. Such links are therefore placed in the same Shared Risk Link Group, or SRLG. MPLS path diversity is achieved with SRLGs by ensuring that the primary and secondary LSPs do not use links from the same SRLG. This ensures that when the primary LSP fails, the secondary LSP does not fail along with it.
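The SRLG rule can be sketched as a set comparison. The topology, the SRLG numbers, and the helper names below are made up for illustration; the sketch assumes each link is identified by an ordered pair of router names and mapped to the set of SRLGs it belongs to.

```python
# Hypothetical link-to-SRLG assignments.
SRLG_BY_LINK = {
    ("PE1", "P1"): {100},
    ("P1", "PE2"): {200},
    ("PE1", "P2"): {100},  # assumed to share a fiber with PE1-P1
    ("P2", "PE2"): {300},
    ("PE1", "P3"): {400},
    ("P3", "PE2"): {500},
}


def path_srlgs(hops, srlg_by_link):
    """Collect every SRLG touched by the links along a path."""
    srlgs = set()
    for link in zip(hops, hops[1:]):
        srlgs |= srlg_by_link.get(link, set())
    return srlgs


def srlg_diverse(primary_hops, secondary_hops, srlg_by_link):
    """True if the two paths have no SRLG in common."""
    return path_srlgs(primary_hops, srlg_by_link).isdisjoint(
        path_srlgs(secondary_hops, srlg_by_link)
    )


primary_path = ["PE1", "P1", "PE2"]
shares_fiber = srlg_diverse(primary_path, ["PE1", "P2", "PE2"], SRLG_BY_LINK)
fully_diverse = srlg_diverse(primary_path, ["PE1", "P3", "PE2"], SRLG_BY_LINK)
```

Note that the path through P2 uses different routers and links than the primary, yet it is still rejected because its first link sits in SRLG 100 alongside the primary's first link. This is the case that a hop-by-hop comparison alone would miss.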
Admin groups are another mechanism used to achieve path diversity, and they work much like SRLGs. Here, just as with SRLG, links are assigned to different admin groups, and the primary and secondary LSPs are configured not to use links from the same admin groups.
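A minimal sketch of the admin group approach follows, assuming admin groups are represented as bits in a mask (often described as link colors) and that each LSP carries an exclude mask. The group names, the link assignments, and the path_allowed helper are hypothetical.

```python
# Hypothetical admin groups, one bit each.
RED = 1 << 0
BLUE = 1 << 1

# Hypothetical assignment of links to admin groups.
LINK_GROUPS = {
    ("PE1", "P1"): RED,
    ("P1", "PE2"): RED,
    ("PE1", "P2"): BLUE,
    ("P2", "PE2"): BLUE,
}


def path_allowed(hops, exclude_mask, link_groups):
    """True if no link on the path belongs to an excluded admin group."""
    return all(
        (link_groups.get(link, 0) & exclude_mask) == 0
        for link in zip(hops, hops[1:])
    )


red_path = ["PE1", "P1", "PE2"]
blue_path = ["PE1", "P2", "PE2"]

# The primary LSP excludes BLUE links and the secondary excludes RED links,
# so the two LSPs can never share a link.
primary_on_red = path_allowed(red_path, BLUE, LINK_GROUPS)
secondary_on_blue = path_allowed(blue_path, RED, LINK_GROUPS)
secondary_on_red = path_allowed(red_path, RED, LINK_GROUPS)
```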
Many vendors allow the creation of up to seven secondary LSPs for each primary LSP. For more information on configuring MPLS TE Path Protection on Cisco NX-OS, see: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/sw/5_x/nx-os/mpls/configuration/guide/mpls_cg/mp_te_path_prot.pdf
Having a secondary LSP pre-established to act as the backup path is much faster than having the headend router dynamically compute a new LSP when the need arises. However, there is an even faster mechanism for MPLS-TE protection. This mechanism, known as MPLS FastReroute (FRR), protects MPLS TE tunnels from link and node failures and is also referred to as local protection. We will cover the fundamentals of MPLS FRR in part two of this blog series.