In part one of this series, I explained the challenges with traditional data center architectures. Here’s how to overcome these challenges, starting with an oldie but goodie that has been resurrected.
The whole story begins with the “new” architecture called CLOS. New in this context means new to data centers: telephony network engineer Charles Clos developed the architecture in the 1950s to meet similar scalability requirements, and it is being used today to improve performance and resiliency. Let's take a closer look at the CLOS architecture in the data center.
A CLOS topology (Figure 1) consists of spine and leaf layers. Servers connect to leaf switches (Top of Rack, or TOR), and each leaf connects to every spine. There are no direct leaf-to-leaf or spine-to-spine connections. Here are the architectural advantages of this topology:
How can this be cost effective compared to the traditional design? The key point is that spine switches do not have to be as big and expensive as the core switches in the traditional design. If there are so many TORs/leaves that the spine layer runs out of ports, users can select high-port-count switches (some offer up to 2,300 10G ports) for the spine layer. The better approach at that point, though, is to build a 5-stage or 7-stage design, or a multi-pod architecture.
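To make the scaling math concrete, here is a minimal sketch of how port counts bound a two-tier spine/leaf fabric. The function name and the example port counts (48-port leaves, 32-port spines, 4 uplinks per leaf) are illustrative assumptions, not figures from this article:

```python
# Sketch: sizing a 2-tier (3-stage) CLOS fabric from switch port counts.
# All concrete numbers below are illustrative assumptions.

def clos_capacity(leaf_ports: int, spine_ports: int, uplinks_per_leaf: int):
    """Estimate the size of a 2-tier spine/leaf design."""
    # Each leaf splits its ports between server-facing and spine-facing links.
    server_ports_per_leaf = leaf_ports - uplinks_per_leaf
    # Every leaf connects to every spine, so the number of spines equals
    # the number of uplinks per leaf, and the spine's port count caps
    # how many leaves the fabric can hold.
    num_spines = uplinks_per_leaf
    max_leaves = spine_ports
    return {
        "spines": num_spines,
        "max_leaves": max_leaves,
        "max_servers": max_leaves * server_ports_per_leaf,
        "oversubscription": server_ports_per_leaf / uplinks_per_leaf,
    }

# Example: 48-port leaves with 4 uplinks each, 32-port spines.
print(clos_capacity(leaf_ports=48, spine_ports=32, uplinks_per_leaf=4))
# → {'spines': 4, 'max_leaves': 32, 'max_servers': 1408, 'oversubscription': 11.0}
```

When the `max_leaves` bound is hit, the choices are exactly the ones above: bigger spines (more ports) or another stage/pod.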
It would be a huge mistake to design a data center based on the CLOS architecture and use only L2 switching instead of L3 routing. Compared to L2 switching, L3 routing offers many benefits, not only for scalability and resiliency but also for visibility, which is quite important to the planning and operations teams.
But how should L3 routing be integrated into the design? Partially or fully? Which protocol should be used? Is it possible to get rid of L2 switching? First, here’s a design that separates TOR switches from the leaf layer and makes the spine to leaf connections L3 (Figure 2).
In this design, TOR switches connect to leaf switches over L2 links, while spine-to-leaf connections are L3. MLAG lets operators utilize all the links and aggregate bandwidth toward the leaf switches.
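The bandwidth point is worth making concrete: with plain STP, a TOR with redundant L2 uplinks forwards on only one of them, while MLAG bundles them into a single logical LAG. A small sketch, with illustrative link counts and speeds:

```python
# Sketch: usable TOR uplink bandwidth, STP vs. MLAG.
# Link count and speed are illustrative assumptions.

def usable_uplink_gbps(num_uplinks: int, link_gbps: int, mlag: bool) -> int:
    """With plain STP, redundant L2 uplinks are blocked to prevent loops,
    so only one link forwards. With MLAG, the leaf pair appears as one
    logical switch, so all uplinks forward as a single LAG."""
    active_links = num_uplinks if mlag else 1
    return active_links * link_gbps

print(usable_uplink_gbps(2, 40, mlag=False))  # STP blocks one link: 40
print(usable_uplink_gbps(2, 40, mlag=True))   # MLAG forwards on both: 80
```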
Is it possible to eliminate STP and MLAG? In general, the size of L2 domains should be limited, because L2 environments are difficult to troubleshoot and large L2 domains risk broadcast storms. MLAG is also vendor dependent, which introduces limitations such as allowing only two leaves per MLAG instance.
What about expanding L3 routing all the way down to the TORs? Here’s how the topology looks in that case (Figure 3).
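One property of the fully routed design can be sketched directly: with L3 running to the TOR, traffic between any two TORs is load-balanced (ECMP) over one path per spine, so a spine failure degrades capacity gracefully instead of cutting off a blocked path. The spine count below is an illustrative assumption:

```python
# Sketch: resiliency of a fully routed (L3-to-TOR) CLOS fabric.
# The spine count is an illustrative assumption, not from the article.

def capacity_after_failures(num_spines: int, failed: int) -> float:
    """Each TOR load-balances over one equal-cost path per healthy spine
    (ECMP), so losing a spine removes only 1/num_spines of the fabric's
    TOR-to-TOR capacity."""
    healthy = num_spines - failed
    return healthy / num_spines

print(capacity_after_failures(4, 0))  # 1.0  — full capacity
print(capacity_after_failures(4, 1))  # 0.75 — one spine down costs 25%
```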
In the third and final part of this blog series, I will compare and contrast BGP and IGP to help determine which protocol to use in the CLOS architecture.
The BGP in the Data Center blog series, written by Onsel Kuluk, Systems Engineer at Packet Design, is now available as an e-book. Download your free copy here: