We Don’t Have the Luxury of End-to-End Complacency

As a service provider, you want to deliver reliable, fast-performing networks. Right? Well, yes, but really as a means to deliver reliable, fast-performing applications. Or, more accurately, a reliable, fast-performing experience with those applications.

As Light Reading editor-in-chief Ray Le Maistre points out in his article, “The Five Nines Makeover,” the traditional ways of monitoring network health by discrete elements (CPU/memory utilization, device polling, tracking jitter and packet loss) are no longer sufficient. End users just want to use the application, and they have specific expectations of its performance. That makes it imperative to look at the end-to-end performance of applications.

With the arrival of software-defined networking (SDN), this will only become more important, as programmable networking capabilities cause us to rethink how end-to-end performance is managed.

In programmable networks, network events occur many times a second, and traffic behavior is less predictable. In fact, programmability takes the operator out of the equation. Traditional manual and device-centric management methods do not provide the visibility needed to run a programmable network that automatically adapts to application demands. As a result, we no longer have the luxury of being complacent about end-to-end performance.

If applications and services are being rolled out without operator intervention and adequate visibility, how do you plan for them? Who or what governs whether or not these programmatic changes should be made? How do engineers know if the network can support a new request without adversely impacting existing applications?

One big challenge to answering these questions is that the traditional methods only periodically collect data from the network and create a performance picture from a series of snapshots. What’s needed is to move today’s management practices into the automation realm. We need to understand what impact an application requesting resources from an SDN will have on performance. SDN creates the need to replicate traditional functions of capacity planning, monitoring, troubleshooting, security, and other critical management capabilities.

For SDNs, much higher fidelity is needed, which requires telemetry to be “pushed” from the network rather than “pulled,” as Bikash Koley, Principal Architect and Manager of Google’s Network Architecture team, explained at the recent Big Telecom Event in Chicago.
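To see why periodic snapshots fall short, consider a rough sketch in Python. The numbers are invented for illustration: a link sits at 20% utilization except for a three-sample burst, and the `polled_view` helper is a hypothetical stand-in for a collector that only looks at every tenth sample. It is not modeled on any particular telemetry product.

```python
def polled_view(series, poll_every):
    """Return only the samples a periodic poller would see."""
    return series[::poll_every]


# Hypothetical utilization samples (percent), one per time tick.
utilization = [20] * 100
utilization[37:40] = [95, 98, 96]  # short burst that falls between polls

polled = polled_view(utilization, poll_every=10)  # "pulled" snapshots
pushed = utilization                              # every sample streamed

print("peak seen by polling:", max(polled))
print("peak seen by push:", max(pushed))
```

In this toy data the polled view never sees the burst at all, while the pushed stream does. A real poller might get lucky, but it cannot be relied on to catch events shorter than its polling interval.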

Real-time SDN analytics are critical to enabling engineers to make good decisions. They are also vital to allowing the network software itself to make good “decisions.” If a link performs poorly, an SDN can route around it, provided it knows that the link is indeed performing poorly and what the next best route is. But if the information is incorrect or misleading, the computer will blithely go through its programming, making the “right” decisions for the wrong scenario. Truth be told, a human being could make the same mistake given the same data, but computers have the ability to make billions of mistakes per second.
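One way to picture the problem is a path selector that refuses to act on old data. The following is a minimal sketch, not how any particular controller works: the `LinkSample` type, the `choose_link` function, and the age and loss thresholds are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkSample:
    link: str
    latency_ms: float
    loss_pct: float
    timestamp: float  # seconds, same clock as `now`


def choose_link(samples: List[LinkSample], now: float,
                max_age_s: float = 1.0,
                max_loss_pct: float = 1.0) -> Optional[str]:
    """Pick the lowest-latency link among fresh, low-loss samples.

    Returns None when no fresh, healthy sample exists, so the caller
    can fall back instead of deciding on stale data.
    """
    fresh = [s for s in samples if now - s.timestamp <= max_age_s]
    healthy = [s for s in fresh if s.loss_pct <= max_loss_pct]
    if not healthy:
        return None
    return min(healthy, key=lambda s: s.latency_ms).link


samples = [
    LinkSample("A", latency_ms=5.0, loss_pct=0.0, timestamp=0.0),   # stale
    LinkSample("B", latency_ms=20.0, loss_pct=0.1, timestamp=9.8),  # fresh, healthy
    LinkSample("C", latency_ms=8.0, loss_pct=5.0, timestamp=9.9),   # fresh, lossy
]
print(choose_link(samples, now=10.0))
```

Link A looks best on paper, but its measurement is ten seconds old, so the selector ignores it. Without the freshness check, the software would confidently pick A, which is exactly the “right” decision for the wrong scenario.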

Because of this, we need to rethink how we characterize performance. We can’t just aim for “five nines” on specific parts of the network; we have to look at the entire picture, end to end.
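A quick back-of-the-envelope calculation shows why per-element targets are not enough. The sketch below assumes a path of ten elements that must all be up, each independently available 99.999% of the time. Both the element count and the independence assumption are simplifications chosen for illustration.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def end_to_end_availability(component_availabilities):
    """Availability of a serial path, assuming independent failures."""
    return math.prod(component_availabilities)


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


path = [0.99999] * 10  # ten hypothetical "five nines" elements in series
e2e = end_to_end_availability(path)

print(f"single element downtime: {downtime_minutes_per_year(0.99999):.2f} min/year")
print(f"end-to-end availability: {e2e:.6f}")
print(f"end-to-end downtime: {downtime_minutes_per_year(e2e):.2f} min/year")
```

Under these assumptions, each element is down roughly five minutes a year, but the path as a whole is down closer to an hour. Every part can hit its target while the experience the user actually sees falls about a full "nine" short.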