In this three-part series, we’ll cover why traditional data center architectures don’t meet the needs of today’s service providers and how to overcome their limitations. The solution involves a decades-old protocol that has been given new life in the data center: Layer 3 routing with Border Gateway Protocol (BGP).
BGP is a well-known protocol that service providers and enterprises have used for decades to manage routing throughout the internet. Considering the rapid growth of the full Internet routing table (growing approximately 10 percent year over year since 2009, to roughly 664K routes today), BGP is the best protocol to handle routes at this scale.
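As a back-of-the-envelope illustration of what that growth rate implies, the sketch below compounds the table size forward. The 664K starting point and 10 percent figure come from the text above; the projection horizon is arbitrary and purely illustrative.

```python
# Rough projection of full-table growth at ~10% per year.
# Starting size (664K routes) and growth rate are taken from the article;
# this is an illustration, not a forecast.

def project_table_size(current_routes: int, annual_growth: float, years: int) -> int:
    """Compound the table size forward by `years` at `annual_growth`."""
    return round(current_routes * (1 + annual_growth) ** years)

if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"{years:2d} years: ~{project_table_size(664_000, 0.10, years):,} routes")
```

At this rate the table roughly doubles every seven years or so, which is why table scale matters when choosing a routing protocol.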
That said, service providers and content providers require more functionality to control inbound and outbound traffic flows, particularly at the edge of their networks. BGP is called a path vector routing protocol, which is a fancy way of saying it is a distance vector routing protocol with several additional attributes. These attributes give users more flexibility to manipulate routing decisions, and therefore finer control over network traffic.
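To make the attribute idea concrete, here is a minimal sketch of how a few real BGP path attributes (local preference, AS-path length, MED) feed into best-path selection. It models only three of the many steps in the actual BGP decision process, and the addresses and AS numbers are invented for illustration.

```python
from dataclasses import dataclass

# A toy model of BGP best-path selection using three real attributes.
# Real implementations (per RFC 4271 and vendor extensions) apply many
# more tie-breakers; this only shows why attributes give operators
# levers to steer traffic.

@dataclass
class BgpPath:
    next_hop: str
    local_pref: int = 100       # higher wins (outbound traffic control)
    as_path: tuple = ()         # shorter wins
    med: int = 0                # lower wins (inbound traffic hint)

def best_path(paths):
    """Prefer highest local-pref, then shortest AS path, then lowest MED."""
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path), p.med))

candidates = [
    BgpPath("10.0.0.1", local_pref=100, as_path=(65001, 65002)),
    BgpPath("10.0.0.2", local_pref=200, as_path=(65001, 65002, 65003)),
]
# Raising local-pref on the second path overrides its longer AS path.
print(best_path(candidates).next_hop)
```

This is the flexibility the article refers to: by tuning attributes such as local preference, an operator overrides the default shortest-AS-path behavior and steers traffic deliberately.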
Over time, BGP started to be used in different network segments for different purposes. Do you remember the address families such as vpnv4 used for MPLS L3VPN, and labeled unicast for seamless MPLS and 6PE? It seems that whenever we are up against the wall with new protocols or architectural needs, we call BGP to the rescue. This is what is happening in today’s data centers as well, as there are many challenges with traditional architectures.
In this architecture (Figure 1), the topology is composed of three layers that are connected to each other via L2 links. Thus, traffic flow is controlled mostly by L2 protocols. Here are the drawbacks of this architecture and why it no longer fits current data center requirements.
The classic data center architecture was developed on the assumption that most of the traffic flows in a north-south (user-server) direction. The obvious inference from this is that north-south traffic is supposed to be greater than east-west (server-server) traffic.
This architecture is still valid for some service provider data centers, but with new Content Delivery Networks (CDNs), almost 80 percent of total traffic is in an east-west direction. Content itself is becoming more critical and valuable day by day. Even service providers are providing more cloud services and are acquiring and serving more and more content.
For that reason, in my opinion, service provider data center requirements will evolve toward those of CDNs. Server-to-server communication (e.g., app to database, app to web, VM migration, data replication) has been increasing significantly.
When server A wants to reach server D, inter-VLAN traffic is forwarded up to one of the core switches and back down to server D, passing over all three layers. Intra-VLAN traffic, however, can be handled at the distribution layer. This means the number of hops, and therefore the latency, varies with the type of communication. In data center networks, consistency in these two related parameters has become more critical than before, and the classic tree-based architecture does not provide it.
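The hop-count asymmetry can be sketched with a toy calculation. The layer names come from the three-tier model described above; the simple "turnaround layer" model below is my own simplification for illustration.

```python
# Toy model of the three-tier tree: server-to-server traffic climbs to
# some "turnaround" layer and descends the other side. Intra-VLAN
# traffic can turn around at distribution; inter-VLAN traffic must
# transit the core, so its path crosses more switches.

def switch_hops(turnaround_layer: str) -> int:
    """Switch hops for traffic that turns around at the given layer."""
    depth = {"access": 1, "distribution": 2, "core": 3}
    # Climb `depth` switches, then descend the other side (minus the
    # shared turnaround switch counted once).
    return 2 * depth[turnaround_layer] - 1

print("intra-VLAN:", switch_hops("distribution"), "switch hops")
print("inter-VLAN:", switch_hops("core"), "switch hops")
```

Even in this idealized model, inter-VLAN flows cross five switches while intra-VLAN flows cross three, so latency depends on which VLAN the destination happens to sit in rather than on anything the application controls.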
As the data center grows, this architecture may not be able to scale due to port, card, device, and bandwidth limitations. Adding new devices to the distribution layer will at some point require adding new devices to the core, because the core layer has to be adjusted to the lower layers’ increased bandwidth requirements. This means the data center has to scale vertically, since the architecture was developed around north-south traffic considerations.
STP is designed to prevent loops from forming when there are redundant paths in the network. You are probably familiar with related terms like PortFast, BPDU Guard, Root Guard, Loop Guard, UDLD, TCN, etc. I have never felt confident in STP, and I think there should be an easier way to overcome the challenges STP “deals with” than continually bolting on more and more enhancements.
Fortunately, vendors recognized STP’s limitations and came up with alternatives such as vPC, QFabric, FabricPath, MLAG, and TRILL. By using these technologies instead of STP, users can employ most of their L2 links while keeping the topology loop-free. For example, it is possible to bond switches or links so they act as a single switch or link, and “L2 routing” can be enabled on the network as well.
However, even with these technologies, scalability can still be an issue. Bonding more than two switches is generally not possible, or is supported only with significant limitations. Vendor lock-in is another disadvantage, as most of these protocols are proprietary.
In part two, we’ll cover how operators can deal with these challenges.