Ensuring network uptime requires insight into the performance of the devices in the network. Traditional methods of collecting network telemetry data include “pull”-based mechanisms such as SNMP polling and CLI show commands, as well as syslog messages. All of these have inefficiencies that limit a network engineer’s ability to plan, deploy, and assure services.
We are now seeing the emergence of a different mechanism for collecting network performance data: streaming telemetry, a push-based mechanism that overcomes the weaknesses of SNMP polling and other data collection methods. Here is a primer on what streaming telemetry is.
Here are the shortcomings of the traditional collection methods, which help to highlight why streaming telemetry is gaining steam.
Most network monitoring systems rely primarily on SNMP to collect a wide variety of device performance metrics. The most common are availability, throughput, utilization, delay, and errors, collected for devices, interfaces, links, memory, and CPU.
SNMP exposes management information through MIBs (Management Information Bases), which organize it in a hierarchical tree of branches. Each object in the tree is identified by an OID (object identifier), a dotted sequence of numbers such as 1.3.6.1.2.1.2.2.1.10 for ifInOctets, and each node in the tree also has a human-readable name. The MIBs are standardized, but vendor-specific definitions are also allowed.
When vendors add their own objects for a new metric, third-party network management tools must be updated to read and process the new OIDs. This is manageable in small and medium-sized networks, but in large networks, such as those of service providers with thousands of multi-vendor devices, it is demanding.
Here are the other limitations of using a “pull”-based technology such as SNMP.
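To make the OID structure concrete, here is a minimal Python sketch of how a management tool might name the OIDs it receives. It assumes a tiny, hard-coded name table (real tools load names from compiled MIB files): standard MIB-2 objects sit under 1.3.6.1.2.1, and vendor objects sit under 1.3.6.1.4.1 followed by the vendor's enterprise number. The functions `parse_oid` and `describe` are hypothetical, and 32473 is the enterprise number reserved for documentation examples, so the vendor OID is made up.

```python
# Hypothetical, hard-coded subset of the name table an NMS builds from MIB files.
MIB2 = (1, 3, 6, 1, 2, 1)         # iso.org.dod.internet.mgmt.mib-2
ENTERPRISES = (1, 3, 6, 1, 4, 1)  # iso.org.dod.internet.private.enterprises

KNOWN_OBJECTS = {
    (1, 3, 6, 1, 2, 1, 1, 3): "sysUpTime",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 10): "ifInOctets",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 16): "ifOutOctets",
}

def parse_oid(text: str) -> tuple[int, ...]:
    """Turn a dotted OID string such as '1.3.6.1.2.1.1.3.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))

def describe(oid_text: str) -> str:
    """Name an OID using the table above; trailing arcs are the instance index."""
    oid = parse_oid(oid_text)
    # Longest known prefix wins.
    for length in range(len(oid), 0, -1):
        name = KNOWN_OBJECTS.get(oid[:length])
        if name is not None:
            instance = ".".join(map(str, oid[length:]))
            return f"{name}.{instance}" if instance else name
    # Unknown objects can at best be attributed to a subtree.
    if oid[:len(ENTERPRISES)] == ENTERPRISES and len(oid) > len(ENTERPRISES):
        return f"unknown object from enterprise {oid[len(ENTERPRISES)]}"
    if oid[:len(MIB2)] == MIB2:
        return "unknown MIB-2 object"
    return "unknown object"

print(describe("1.3.6.1.2.1.2.2.1.10.3"))   # ifInOctets.3 (interface index 3)
print(describe("1.3.6.1.2.1.1.3.0"))        # sysUpTime.0
print(describe("1.3.6.1.4.1.32473.1.1.0"))  # unknown object from enterprise 32473
```

The vendor OID in the last line can only be reported as unknown until the tool's MIB definitions are updated, which is the maintenance burden that grows with the number of vendors and devices.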
The information these methods collect about the current state of the network and certain performance metrics is incomplete. With a pull-based mechanism, devices send data only when a client requests it, so an operator must manually select the devices and metrics to poll, set polling intervals, and so on.
With SNMP, for instance, a change that occurs between two polls is discovered only at the next poll. In large networks, the polling interval can be as long as five minutes, which is unacceptable when hundreds or thousands of events can occur in the interim.
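Polling intervals also hide short events, because SNMP utilization is typically derived from the difference between two readings of an interface octet counter and is therefore an average over the whole interval. The minimal Python sketch below assumes a 64-bit counter such as ifHCInOctets and a hypothetical 1 Gb/s link that is idle except for a 30-second burst at line rate; the function name `utilization` is illustrative.

```python
COUNTER64_MODULUS = 2 ** 64  # ifHCInOctets is a 64-bit counter that wraps to zero

def utilization(prev_octets: int, curr_octets: int,
                interval_s: float, speed_bps: float) -> float:
    """Average utilization (0.0 to 1.0) between two polls of an octet counter."""
    delta = (curr_octets - prev_octets) % COUNTER64_MODULUS  # tolerates one wrap
    return (delta * 8) / (interval_s * speed_bps)

# Hypothetical link: 1 Gb/s, idle except for a 30-second burst at line rate.
speed = 1_000_000_000
burst_octets = 30 * speed // 8  # 3.75e9 octets sent during the burst

# Polled every 300 s, the burst is averaged over the whole interval...
print(f"{utilization(0, burst_octets, 300, speed):.0%}")    # 10%
# ...whereas a 10 s sample taken during the burst shows the link saturated.
print(f"{utilization(0, 10 * speed // 8, 10, speed):.0%}")  # 100%
```

A saturated link that drops packets for half a minute therefore looks lightly loaded at a five-minute polling interval.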
Additionally, SNMP does not collect all information about a device or its current performance. Some information is not stored in the MIB and is accessible only through CLI commands, or is not available at all. For example, neither SNMP nor the CLI can provide historical routing information. This leaves a gap in visibility into routing changes in the network unless the network operator deploys a route analytics tool.
Polling also causes scaling problems in large networks. The load on the network management system (NMS) can be very high because of the sheer number of devices it must poll and the volume of data it must process. In addition, parsing numerous SNMP requests can put excessive CPU and memory load on each network device, especially when multiple NMSs poll it.
Polling-based techniques also raise security concerns, because the NMS needs the list of devices to poll as well as each device’s IP address and security credentials.
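A rough, back-of-envelope estimate shows how the load grows. This Python sketch assumes every NMS polls the same set of OIDs from every device once per interval, and it counts individual values (varbinds) rather than request packets, since one request can carry many values. The function name `polling_load` and the device, OID, and NMS counts in the usage lines are hypothetical.

```python
def polling_load(devices: int, oids_per_device: int, interval_s: float,
                 nms_count: int = 1) -> tuple[float, float]:
    """Back-of-envelope SNMP polling load, in values (varbinds) per second.

    Assumes each NMS polls the same OIDs from every device once per interval.
    Returns (load on each NMS, load on each device).
    """
    per_nms = devices * oids_per_device / interval_s
    per_device = oids_per_device * nms_count / interval_s
    return per_nms, per_device

# Hypothetical network: 10,000 devices, 2,000 OIDs each, polled every 60 s by 3 NMSs.
nms_load, device_load = polling_load(10_000, 2_000, 60, nms_count=3)
print(f"each NMS: {nms_load:,.0f} values/s")        # each NMS: 333,333 values/s
print(f"each device: {device_load:,.0f} values/s")  # each device: 100 values/s
```

Halving the polling interval doubles both figures, so shortening it to catch more events makes the scaling problem worse, and every additional NMS multiplies the load on each device.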
The image below summarizes the limitations of data collection via SNMP, syslog messages, and CLI. To provide meaningful information from these sources, an NMS must normalize, process, and correlate a mix of data that arrives in different formats from different sources. Unfortunately, most don’t.
Streaming telemetry is a push-based mechanism that removes the inefficiencies associated with polling. Network devices stream the required data automatically and continuously to management systems, with no polling required. Each device is configured with the type of data to collect, how frequently to collect it, and where to send it.
Streaming telemetry provides access to real-time, model-driven, analytics-ready data that can support network automation, traffic optimization, and proactive troubleshooting. For example, streaming telemetry can report packet drops or high link utilization in real time. A network automation platform can then use this information to provision new paths and optimize traffic across the network.
With streaming telemetry, the data is described using YANG, a structured data-modeling language; encoded as JSON, XML, or Google Protocol Buffers (GPB); and then streamed over TCP, UDP, or gRPC.
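The inversion of control can be sketched in a few lines of Python. This is a simulation, not a vendor API: `Subscription` and `stream` are hypothetical names, the sensor path and collector address (a documentation IP with an arbitrary port) are made up for illustration, and a simple counter stands in for the device's real statistics. The point is that the loop is driven only by the configuration on the device; the collector never sends a request.

```python
import itertools
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Subscription:
    sensor_path: str  # what data to stream
    interval_s: int   # how often to push it
    collector: str    # where to send it (host:port)

def stream(sub: Subscription, read_value: Callable[[], int],
           duration_s: int) -> list[dict]:
    """Simulate the messages a device pushes during `duration_s` seconds.

    Nothing here waits for a request: one message is produced per interval,
    based only on the subscription configured on the device.
    """
    messages = []
    for t in range(0, duration_s, sub.interval_s):
        messages.append({
            "timestamp_s": t,
            "path": sub.sensor_path,
            "to": sub.collector,
            "value": read_value(),  # sample the local counter at send time
        })
    return messages

# Hypothetical configuration: stream interface counters every 10 s to a collector.
sub = Subscription(
    sensor_path="openconfig-interfaces:interfaces/interface/state/counters",
    interval_s=10,
    collector="192.0.2.10:57500",
)
counter = itertools.count(start=1000, step=500)  # stand-in for a real octet counter
for msg in stream(sub, lambda: next(counter), duration_s=30):
    print(msg["timestamp_s"], msg["value"])
# 0 1000
# 10 1500
# 20 2000
```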
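On the consuming side, a collector or automation platform might turn the stream into an action trigger. The Python sketch below is hypothetical: it only flags links whose utilization stays above a threshold for several consecutive samples, leaving path computation and provisioning to the automation platform. The function name `links_to_reroute`, the link names, the 0.9 threshold, and the three-sample window are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Iterable

def links_to_reroute(samples: Iterable[tuple[str, float]],
                     threshold: float = 0.9,
                     consecutive: int = 3) -> set[str]:
    """Return links whose utilization exceeded `threshold` in `consecutive` samples in a row.

    `samples` is an ordered stream of (link, utilization) pairs, as a
    telemetry collector might receive them.
    """
    streak: dict[str, int] = defaultdict(int)  # current over-threshold run per link
    flagged: set[str] = set()
    for link, util in samples:
        streak[link] = streak[link] + 1 if util > threshold else 0
        if streak[link] >= consecutive:
            flagged.add(link)
    return flagged

samples = [("A-B", 0.95), ("C-D", 0.40), ("A-B", 0.97),
           ("C-D", 0.92), ("A-B", 0.93), ("C-D", 0.50)]
print(links_to_reroute(samples))  # {'A-B'}
```

Requiring consecutive samples keeps a single transient spike from triggering a reroute; the trade-off is a delay of a few sample intervals before acting.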
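As an illustration of model-driven encoding, here is a Python sketch that produces a JSON document shaped after the OpenConfig interfaces YANG model (`openconfig-interfaces`), following the JSON encoding rules of RFC 7951, under which 64-bit counters are carried as strings. The function name `encode_counters`, the interface name, and the counter values are hypothetical. This shows only the payload's shape; how it is framed on the wire depends on the encoding and transport chosen.

```python
import json

def encode_counters(if_name: str, in_octets: int, out_octets: int) -> str:
    """Encode interface counters as RFC 7951-style JSON for the OpenConfig model."""
    payload = {
        # Top-level nodes are qualified with the YANG module name.
        "openconfig-interfaces:interfaces": {
            "interface": [  # a YANG list becomes a JSON array
                {
                    "name": if_name,  # list key
                    "state": {
                        "counters": {
                            # counter64 (uint64) values are encoded as JSON strings
                            "in-octets": str(in_octets),
                            "out-octets": str(out_octets),
                        }
                    },
                }
            ]
        }
    }
    return json.dumps(payload, indent=2)

print(encode_counters("GigabitEthernet0/0/0/0", 123456789, 987654321))
```

Because the structure comes from a published model rather than from per-vendor OIDs, a collector can parse it without a device-specific lookup table.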
Cisco has covered the basics of two methods for streaming telemetry data here: https://www.cisco.com/c/en/us/td/docs/iosxr/ncs5500/telemetry/b-telemetry-cg-ncs5500-62x.html
For more reasons why streaming telemetry is a better choice than SNMP polling, read: http://blog.ipspace.net/2018/03/streaming-telemetry-view-from-trenches.html
Are you ready for streaming telemetry? Check out our blog post on how to configure streaming telemetry on Cisco IOS-XR: https://www.packetdesign.com/blog/enabling-streaming-telemetry-on-cisco-ios-xr/
The “Network Basics by Packet Design” blog series explains the terminology and technologies used by network operators and service providers, including routing, MPLS, Traffic Engineering, SDN, and more. Don’t forget to check out the other blogs in this series here.