Will SIDR Succeed Where the IRR Model Failed? (Part 2)

This article is the second of a three-part series where I compare and contrast the IRR and SIDR security models as well as discuss how we can get closer as an industry to securing the Internet’s routing system. It was first published on the APNIC blog. APNIC (Asia Pacific Network Information Centre) is an open, membership-based, not-for-profit organization. It is one of five Regional Internet Registries (RIRs) charged with ensuring the fair distribution and responsible management of IP addresses and related resources.

In part 1 of this series, I gave an overview of the IRR and SIDR models for securing Internet routing. In this part, I share our insights into why deployment of the IRR model has been low, in the hope that they help increase adoption of the SIDR model. As I mentioned in part 1, some of these challenges cannot be addressed by technology alone; they need economic and social engineering as well.

Tragedy of the Commons

First, let’s look at the economics of deploying one of these solutions. Early adopters bear most of the cost, and the benefits do not materialize until a critical mass is reached. Take the example of the Pakistani hijack of YouTube’s routes: YouTube may register its RPSL route objects or SIDR ROA objects, but if the necessary filters are not in place at the upstream provider of the Pakistani service providers, YouTube’s routes may still be hijacked.

Unfortunately, both the IRR and the SIDR models suffer from this phenomenon. Geoff Huston, in his APRICOT presentation, identified this as the “tragedy of the commons.” That is, if every service provider optimizes the outcome for itself, we cannot reach the globally optimal outcome. Typical solutions to the tragedy of the commons involve “regulation.” Since regulation is often undesired, what can we do to reach critical mass without it?

Stale Data

In IRR model deployments, we wanted an evolutionary path, not a revolutionary one, to reach critical mass. To this end, we wanted to take advantage of RPSL’s heritage. RPSL is based on earlier work known as RIPE-81. RIPE-81 had route objects; however, it lacked a security model and expressive policy representation. RIPE, as one of the early IRRs, had many of these objects already registered. In the U.S., the Routing Arbiter team (a collaboration between University of Michigan’s Merit Network and University of Southern California’s Information Sciences Institute), which I was a part of, converted similar policy objects found in the NSFNet backbone network’s policy database into route objects and stored them in a new IRR called RADB.

In the meantime, the Internet was going through a big commercialization transformation; from the NSFNet being a single Internet backbone network, we switched to multiple commercial backbones and regional networks. As a result, many network operators changed their upstream service providers. Since these new service providers did not use or require registration of these objects, the objects in the IRR became out-of-date very quickly.

Ultimately, the data became stale because it was not used operationally. And conversely, it was kept up-to-date where it was used operationally. I was disappointed to learn that the SIDR model is already suffering from the stale data phenomenon. In his APRICOT talk, Faruk Alam reported that more than 50% of the new ROA objects registered in the APNIC database are already invalid. Some regions are doing better than others, but invalid data is present in all regions (see http://rpki.surfnet.nl/perrir.html). We have to find a way to reverse this trend.

I think the only way out of this is the operational use of the data. If we turned on BGPSec today, we would break the reachability of the invalid prefixes. I am not advocating breaking anybody’s reachability, especially not that of the early adopters. This can be avoided by monitoring these invalid announcements and warning their originators well before turning on such a switch.
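The monitoring step can be sketched as follows. This is a minimal illustration, not the behavior of any particular validator: it classifies announcements as valid, invalid, or not-found in the spirit of BGP prefix origin validation (RFC 6811), so that invalid ones can be reported before any filtering is enforced. The names `Roa`, `validate_origin`, and `report_invalid` are assumptions made for this example, and the prefixes and AS numbers come from the ranges reserved for documentation.

```python
import ipaddress
from typing import NamedTuple


class Roa(NamedTuple):
    """Simplified ROA: a prefix, a maximum prefix length, and an origin AS."""
    prefix: str
    max_length: int
    origin_as: int


def validate_origin(prefix: str, origin_as: int, roas: list[Roa]) -> str:
    """Return 'valid', 'invalid', or 'not-found' for one announcement."""
    announced = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # subnet_of() raises TypeError across IP versions, so check first.
        if announced.version != roa_net.version:
            continue
        if not announced.subnet_of(roa_net):
            continue
        covered = True  # at least one ROA covers this prefix
        if roa.origin_as == origin_as and announced.prefixlen <= roa.max_length:
            return "valid"
    # Covered by a ROA but matched by none: invalid. No covering ROA: not-found.
    return "invalid" if covered else "not-found"


def report_invalid(announcements, roas):
    """Return the announcements that would lose reachability under enforcement."""
    return [(p, asn) for p, asn in announcements
            if validate_origin(p, asn, roas) == "invalid"]


if __name__ == "__main__":
    roas = [Roa("192.0.2.0/24", 24, 64496), Roa("198.51.100.0/24", 24, 64497)]
    seen = [("192.0.2.0/24", 64496),     # matches its ROA
            ("192.0.2.0/24", 64511),     # wrong origin AS
            ("198.51.100.0/25", 64497),  # more specific than max_length allows
            ("203.0.113.0/24", 64500)]   # no covering ROA
    for prefix, asn in report_invalid(seen, roas):
        print(f"warn: {prefix} from AS{asn} would be dropped")
```

Run in this warn-only mode, the report gives operators a list of announcements to fix, or ROAs to correct, before anyone's reachability is affected.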

If we don’t turn the switch on, the amount of stale data will increase. If we are ever going to turn it on, it is best to do it while the amount of stale data is small.

Weak Security Model

RIPE-81 used two weak authentication methods: mail-from and unix-crypt. Operators could register objects by sending an email to a well-known registry mailbox. Mail-from, which is now deprecated, simply checked the sender’s email address against an allowed list of email addresses, and unix-crypt required sending the user’s password in the clear in the body of the email.
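To see why mail-from is weak, consider a minimal sketch of such a check. It is not the code any registry actually ran; the function name, the allowed list, and the addresses are hypothetical. The point is that the decision rests entirely on the From header, which the sender of the message controls.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical list of addresses allowed to update a maintainer's objects.
ALLOWED_SENDERS = {"noc@example.net"}


def mail_from_authorized(raw_message: str, allowed=ALLOWED_SENDERS) -> bool:
    """Accept an update if the From header matches an allowed address."""
    msg = message_from_string(raw_message)
    _, address = parseaddr(msg.get("From", ""))
    return address.lower() in allowed


if __name__ == "__main__":
    # Anyone can write this header; nothing ties it to the real sender.
    forged = (
        "From: Operator <noc@example.net>\n"
        "Subject: update\n"
        "\n"
        "route: 192.0.2.0/24\n"
        "origin: AS64511\n"
    )
    print(mail_from_authorized(forged))
```

The forged message passes the check, which is why this method guards against accidental updates but not against a deliberate attacker.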

We strengthened the security model by adding a method based on public and private key pairs. However, we did not deprecate the old methods; we simply discouraged them and provided a transition path to the more secure method. After all, if operators did not care to protect themselves, that was their prerogative. Mail-from and unix-crypt were still useful against accidental misconfigurations. There was a social aspect of this choice that we did not anticipate: it gave the IRR model a bad security reputation and was used as an excuse for not updating the stale data.

The SIDR effort definitely comes down on the security side of this balance. However, as a result, it needs a database that starts from scratch. Faruk also reported that ROA adoption had been less than 1% in most regions. LACNIC is an exception, with an adoption rate of almost 25% and less than 4% invalid data.

The IRR model uses cryptographic signatures based on Pretty Good Privacy (PGP), which rely on a web of trust among service providers (where the public key of a service provider is signed by other service providers). The SIDR model uses X.509-based certificates, which are hierarchically assigned. In the SIDR model, it is possible to shut down a misbehaving service provider by revoking its certificates. However, some service providers worry that this feature might be abused. Use of certificates makes registries a relying party, which is an uncomfortable change to some registries.

Out-of-band Verification and Need for Publishing Policies

The IRR model uses out-of-band verification. That is, it relies on the IRRs containing route and aut-num objects with accurate policies. This data is then analyzed and compiled into router configurations. All of this happens before any BGP message is received. When BGP messages are received, appropriate filters are already in place to accept only valid announcements. That is, the announcements that would cause prefix hijacking or man-in-the-middle attacks can be filtered out. However, the system requires registering accurate policies, such as who the peers of each AS are and what routes are being exported to and imported from them. Some service providers have privacy concerns about revealing this information. In reality, most of this information is already visible in BGP routing tables.
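The out-of-band pipeline described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a replacement for real tooling: it reads only the `route` and `origin` attributes of RPSL route objects, ignores aut-num policy entirely, and matches prefixes exactly. The function names and the sample objects are made up for this example.

```python
import ipaddress
from collections import defaultdict

# Made-up route objects using documentation prefixes and AS numbers.
SAMPLE = """\
route:   192.0.2.0/24
descr:   Example customer network
origin:  AS64496
mnt-by:  MAINT-EXAMPLE

route:   198.51.100.0/24
origin:  AS64497
mnt-by:  MAINT-EXAMPLE
"""


def parse_route_objects(text: str) -> list[tuple[str, int]]:
    """Extract (prefix, origin AS) pairs from blank-line-separated objects."""
    pairs = []
    for block in text.strip().split("\n\n"):
        attrs = {}
        for line in block.splitlines():
            if ":" not in line or line.startswith(("#", "%")):
                continue  # skip comments and malformed lines
            key, _, value = line.partition(":")
            attrs.setdefault(key.strip().lower(), value.strip())
        if "route" in attrs and "origin" in attrs:
            asn = int(attrs["origin"].upper().removeprefix("AS"))
            pairs.append((attrs["route"], asn))
    return pairs


def build_prefix_filters(pairs):
    """Compile the pairs into a per-origin set of allowed prefixes."""
    filters = defaultdict(set)
    for prefix, asn in pairs:
        filters[asn].add(ipaddress.ip_network(prefix))
    return filters


def accept(filters, prefix: str, origin_as: int) -> bool:
    """Accept an announcement only if a registered object matches it exactly."""
    return ipaddress.ip_network(prefix) in filters.get(origin_as, set())


if __name__ == "__main__":
    filters = build_prefix_filters(parse_route_objects(SAMPLE))
    print(accept(filters, "192.0.2.0/24", 64496))  # registered origin
    print(accept(filters, "192.0.2.0/24", 64497))  # unregistered origin
```

Because the filters are built before any BGP message arrives, their quality depends entirely on how accurate and current the registered objects are, which is where the stale data problem bites.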

The SIDR model, on the other hand, uses a hybrid of out-of-band and in-band verification. For ROA objects, it can use either in-band or out-of-band validation. For verifying which BGP AS paths are valid, it uses in-band validation through BGPSec. This replaces the need for registering policies with new validation machinery that is part of exchanging BGP routes. This is a great benefit, but it comes with a serious drawback: the in-band machinery needs to be updated each time a new kind of attack is discovered.

For example, when man-in-the-middle attacks surfaced, operators realized that BGPSec did not protect against them while the IRR model did. BGPSec is now being further extended to protect against some classes of man-in-the-middle attacks. We are looking at a standardization-implementation-deployment cycle of roughly two or more years. We will pay this penalty each time we face an attack we have not dealt with before.

In the last part of this series, I will discuss what we should do to secure Internet routing now.