(Re-)Introducing Web Capabilities

You can think of capability keys as keys on a keychain.

Back in 2012, I wrote a blog post introducing the idea of Web capabilities. Once in a while, someone asks me what happened to it. The answer: I don’t know. I can’t even find it on archive.org.

Since then, the rise and fall of Bitcoin introduced the capability security model to the world, in the form of the private keys used to sign Bitcoin transactions. Those private keys amount to a capability key. This perhaps has made Web capabilities more palatable to developers. If you’re willing to consider using capabilities for financial transactions, perhaps you’ll consider them for Web apps, too.

Nonetheless, Web capability security (authorizing HTTP requests using capability keys) is an unfamiliar idea to most developers. I’m going to try again to describe the idea here. If you were looking for that original post, this is the best I can do. For everyone else…let’s learn about Web capabilities!

What’s a Capability?

In computer security parlance, a capability is a thing you can do if you have a key. A great analogy is your keychain in real life. You have a bunch of keys on a keychain. You have one for your car. You have one for your house. Perhaps another for the office. Each key represents your authorization to drive your car or enter your home. You can make copies of the keys to share, effectively granting that authorization to other people, like a significant other or a wayward sibling. You can even grant that authorization temporarily, like you might do with the mechanic fixing your car or a housekeeper. Sharing a key, however, only provides access to one thing. Sharing your car key doesn’t give your mechanic access to your office. Nor does sharing your house key give your housekeeper access to your car.

A canonical computer science example of a capability is a Unix file descriptor. In this case, the file descriptor is the key. A file descriptor allows you to do things to a specific file. You can read or write that specific file with it, but not another file. And once you have the file descriptor, the operating system performs no further authorization. A file descriptor represents the authorization to read or write a given file, just like your house key gives someone authorization to enter your home. What’s particularly interesting about this example is that you can’t even try to use a file descriptor with another file because it doubles as a pointer to the file. It’s a bit like discovering that your car key doesn’t even fit into the lock on your front door, let alone open it.

Capability Keys Are Unguessable

If a capability is a thing you can do if you have the key, a Web capability is a thing you can do on the Web if you have the key. More precisely, it’s an HTTP request you can make if you have the key. Of course, these keys aren’t like car keys. Instead, they’re strings encoding sixteen or more random bytes. Why sixteen? Because 16 x 8 is 128, so we have 128 bits, which means there are 2^128, or roughly 3.4 x 10^38, possible keys. On average, it would take 1.7 x 10^38 attempts to guess a given key. If I could make a billion guesses per second (which I can’t, given today’s computers, but we’re being paranoid here), that would take 1.7 x 10^29 seconds, or about 5.4 x 10^21 years. The universe would extinguish (or whatever it’s going to do) long before I could guess your key. One day, when we’ve learned to harness the universe itself as a massively parallel quantum computer (and before we accidentally trigger an infinite loop), we’ll have to switch to larger keys. But for now, we can use 16-byte keys.
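Here is a minimal Python sketch of that arithmetic, along with one way to mint a key. The function names are mine, and the URL-safe Base64 encoding without padding is an assumption about how the random bytes get turned into a string.

```python
import base64
import secrets

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def generate_key(num_bytes: int = 16) -> str:
    """Return num_bytes of OS randomness as unpadded URL-safe Base64 text."""
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def average_years_to_guess(bits: int = 128, guesses_per_second: float = 1e9) -> float:
    """Expected brute-force time: half the key space at a fixed guess rate."""
    average_attempts = 2 ** (bits - 1)
    return average_attempts / guesses_per_second / SECONDS_PER_YEAR


key = generate_key()
years = average_years_to_guess()
print(key, f"{years:.1e} years")
```

A 16-byte key comes out as 22 characters once the padding is stripped. Passing a larger num_bytes is all it takes to move to bigger keys later.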

Capabilities Are Fine Grained

So: a Web capability is an HTTP request you can make with a random 16-byte key. The key point here is that it’s a thing you can do, not a bunch of things. That is, it’s not like a typical bearer token in a scheme like OAuth2. (I’d warn you about OAuth2 but that’s another topic for another day.) In a typical bearer token scheme, you can do a bunch of things with one token, which isn’t nearly as useful. First, you can’t safely share bearer tokens. If someone has your token, they can do anything you can do, which isn’t usually what you want. Similarly, if someone steals that token, they now have All the Privileges. Not only that, but you typically need another authorization scheme on top of bearer tokens to check whether the presumed owner has permission to do something. And the more complicated a security scheme is, the more likely it is to have an exploit.

Meanwhile, Web capabilities are fine-grained. You can do one thing and one thing only. That means you can potentially share a capability with a friend. It’s like giving your mechanic your car key. The risk associated with the compromise of any single key is smaller. And I can use capabilities as my one and only authorization scheme, which keeps things simple. Again, when it comes to security, simplicity is usually your ally. (Which is not to say that security is simple. Simplicity in this context is about the ability to reason about something. It’s also about minimizing the attack surface.)

Example: Blogging Capabilities

This is all a bit abstract. Let’s consider an example. Suppose I have a blogging application that uses Web capabilities. We’ll start by creating a blog post, supplying a capability key in the authorization header. The keys are Base64URL encoded, with the padding stripped. If you’re using Node, you can use the Fairmont library to generate them.

POST /blog HTTP/1.1
host: acme.com
authorization: Capability 0rtlTNGkMIXb_o2JMqowmQ
content-type: application/x-yaml

title: My Post
key: my-post
body: |
  It was a dark and stormy night.

On the server, we associate that key with making a POST request to the /blog resource. If it’s valid, the server will process my request. I’ll get back a 201 Created with a location header, which is just standard HTTP. But I’ll also get a list of capability keys associated with the newly created resource.

HTTP/1.1 201 Created
location: https://acme.com/blog/my-post
capabilities: PUT=cr_GtJ-goQtxnXs0Y5T6QA DELETE=69o3jUzsC1kI4vJi-jl0oA

The server granted me keys for PUT and DELETE requests for my newly created blog post resource. That is, I get keys associated with the HTTP requests for editing a post and for deleting it. (Another way to implement a Web capability scheme is through the use of capability URLs. These are URLs that embed the key in the URL. These are useful for GET requests. In particular, they’re useful when you want to make it easy for people to share a capability. All they have to do is share the URL. This is what Dropbox does with their file-sharing URLs. GitHub does the same thing with private gists. If you have the URL, you can GET the associated resource.)
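On the client side, the capabilities header from the response above has to be picked apart before the keys can be stored. A small Python sketch, assuming the header is always a whitespace-separated list of METHOD=key pairs as in the example (parse_capabilities is a name I made up for illustration):

```python
def parse_capabilities(header_value: str) -> dict[str, str]:
    """Split 'PUT=key DELETE=key' into {'put': key, 'delete': key}."""
    capabilities = {}
    for pair in header_value.split():
        # Keys have their Base64 padding stripped, so the only '=' in a
        # pair is the separator between the method and the key.
        method, _, key = pair.partition("=")
        capabilities[method.lower()] = key
    return capabilities


granted = parse_capabilities(
    "PUT=cr_GtJ-goQtxnXs0Y5T6QA DELETE=69o3jUzsC1kI4vJi-jl0oA"
)
```

Lowercasing the method names is only a convenience so the result lines up with a keychain keyed by lowercase methods.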

When I make a PUT request to update a blog post, I need to provide the associated key as part of the request, otherwise I’ll get an error back from the server. (The precise nature of the error depends on the audience for your API. If it’s public facing, you can simply return a 404 to avoid leaking information to potential attackers.)

If I want to let you edit my blog post, I can simply share the update key with you. Now you can update that post, but you can’t delete it. You can’t update other posts for which I haven’t shared the key. You can’t add new blog posts. All you can do with that key is update that specific blog post.

The Keychain

We would store these keys in your keychain, which would be a dictionary of keys, allowing you to look up a key based on the URL and method.

"https://acme.com/blog":
  post: "0rtlTNGkMIXb_o2JMqowmQ"
"https://acme.com/blog/my-post":
  put: "cr_GtJ-goQtxnXs0Y5T6QA"
  delete: "69o3jUzsC1kI4vJi-jl0oA"
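In code, a lookup against that keychain might look like the following Python sketch. The authorization_header helper is hypothetical, and a plain in-memory dictionary stands in for whatever actually stores the keychain.

```python
keychain = {
    "https://acme.com/blog": {"post": "0rtlTNGkMIXb_o2JMqowmQ"},
    "https://acme.com/blog/my-post": {
        "put": "cr_GtJ-goQtxnXs0Y5T6QA",
        "delete": "69o3jUzsC1kI4vJi-jl0oA",
    },
}


def authorization_header(keychain, url, method):
    """Return the header value for a request, or None if we hold no key for it."""
    key = keychain.get(url, {}).get(method.lower())
    return None if key is None else f"Capability {key}"
```

A client would call this just before sending a request and skip the request (or ask for a key) when it gets None back.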

The problem of storing and securing your keychain is separate from implementing capability authorization on the server. In fact, the authentication (logging in so I can retrieve my keychain) and authorization (managing the keychain itself) could be in a different application. We could even share a single keychain across any number of applications. (This is kind of how Mac OS X works today. The operating system manages your keychains and applications all share them. Any given application can attempt to access any keychain, in which case the operating system asks you if that’s okay. This is more coarse-grained than we’d like, but it illustrates the basic idea of separating authentication from authorization.)

This reduces the problem of securing your applications to securing the keychain application. Put another way, instead of having to authenticate to N endpoints, one for each application, we need only authenticate with one. We get single sign-on almost incidentally.

Authorizing a Request

On the server, all we need to implement capability authorization is a dictionary that allows us to look up a key.

"cr_GtJ-goQtxnXs0Y5T6QA":
  url: https://acme.com/blog/my-post
  method: put
"69o3jUzsC1kI4vJi-jl0oA":
  url: https://acme.com/blog/my-post
  method: delete

The server doesn’t know anything about keychains or, more generally, who has which keys. It just knows whether a given key authorizes you to make a given request. There’s no concept here of identity or roles or access control lists. Instead, we just have a simple lookup. If you have the key, you can do the thing.
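As a rough Python sketch, the whole check is a dictionary lookup plus two comparisons. The is_authorized name is mine, and the dictionary is an unencrypted, in-memory stand-in for the real capability store.

```python
capabilities = {
    "cr_GtJ-goQtxnXs0Y5T6QA": {
        "url": "https://acme.com/blog/my-post",
        "method": "put",
    },
    "69o3jUzsC1kI4vJi-jl0oA": {
        "url": "https://acme.com/blog/my-post",
        "method": "delete",
    },
}


def is_authorized(key, url, method):
    """True only if the key exists and was issued for exactly this request."""
    entry = capabilities.get(key)
    return (
        entry is not None
        and entry["url"] == url
        and entry["method"] == method.lower()
    )
```

Notice that the update key does nothing for a DELETE request, which is what makes it safe to hand to someone else.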

In practice, we’d encrypt the entries so that, even if an attacker compromised your capability storage, they’d have no way to know which keys corresponded to which requests.

Capabilities Spanning Resources

What about capabilities that work across more than one resource? For example, suppose we want to grant permission to a moderator to delete offensive blog posts. We could create a capability for that using a URI template.

url: https://acme.com/blog/{key}
method: delete

This capability will match the URL for any post created in our application. We could grant this key to any number of moderators.
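Matching a request URL against a template like that could be sketched in Python as follows. This is a deliberately simplified stand-in for real URI Template handling: I’m assuming every {name} placeholder matches exactly one non-empty path segment, and both function names are made up for illustration.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Turn 'https://host/blog/{key}' into a regex over whole URLs."""
    # Split on the {placeholders}, escape the literal pieces, and let each
    # placeholder match one path segment (anything except a slash).
    literal_parts = re.split(r"\{[^{}]*\}", template)
    pattern = "[^/]+".join(re.escape(part) for part in literal_parts)
    return re.compile(pattern)


def matches(url: str, template: str) -> bool:
    """True if the entire URL fits the template."""
    return template_to_regex(template).fullmatch(url) is not None
```

A template with no placeholders degenerates to an exact string comparison, so the same function covers single-resource capabilities too.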

One-time Use and Expiring Capabilities

We can even do one-time use or expiring keys by simply including the constraints for using the key in the description.

url: https://acme.com/blog/{key}
method: delete
expires: March 15, 2015

If we get a request that matches an expired capability, we reject the request and delete the capability. With a single-use capability, it’s even easier. We just delete the key after a request comes in that matches it. We could generalize this with an N-use key. For example, we could limit moderators to three deletions, just so they don’t get too carried away. After three deletions, they’d have to acquire a new capability.

url: https://acme.com/blog/{key}
method: delete
expires: March 15, 2015
uses: 3
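Here is one way the bookkeeping for those two constraints might look in Python. It only handles expiry and the use count, not URL or method matching. The "moderator-key" string is a placeholder rather than a real key, and treating the expiry date itself as still valid is my assumption.

```python
from datetime import date

store = {
    "moderator-key": {
        "url": "https://acme.com/blog/{key}",
        "method": "delete",
        "expires": date(2015, 3, 15),
        "uses": 3,
    }
}


def consume(store, key, today):
    """Apply the expiry and use-count constraints, updating the store.

    Returns True if the capability may be used for this request.
    """
    capability = store.get(key)
    if capability is None:
        return False
    expires = capability.get("expires")
    if expires is not None and today > expires:
        del store[key]  # expired: reject the request and forget the key
        return False
    if "uses" in capability:
        capability["uses"] -= 1
        if capability["uses"] == 0:
            del store[key]  # that was the last permitted use
    return True
```

A single-use key is just the uses: 1 case, and a key with neither field never expires or runs out.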

Revocation

Sometimes we want to revoke a capability we issued. For example, I might issue a key for editing a blog post to a copy-editor. But then I switch copy-editors, so I want to revoke their key. Another example is when we suspect a key is compromised. Either way, the implementation is straightforward: we just delete the key from our dictionary.

Non-Canonical URLs

Applications may have multiple URLs that can refer to the same resource. This can sometimes be handled via URL templates. However, the URLs may be dissimilar. One simple way to handle this is to redirect to the canonical URL. Another would be to keep a mapping of URLs to canonical URLs, similar to what you’d have to support redirects, except that instead of doing the redirect, you simply translate the request URL into its canonical form before performing authorization.
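The mapping approach could be as small as this Python sketch. The /posts/my-post alias is a hypothetical example, not a URL from the application described here.

```python
# Alternate URL -> canonical URL (the alias below is made up for illustration).
CANONICAL_URLS = {
    "https://acme.com/posts/my-post": "https://acme.com/blog/my-post",
}


def canonicalize(url: str) -> str:
    """Translate a request URL to its canonical form; unknown URLs pass through."""
    return CANONICAL_URLS.get(url, url)
```

The server would call canonicalize on the request URL first and then run the usual capability check against the result.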

Redirects

In some cases, URLs change. This has the potential to invalidate capabilities associated with those URLs. One simple way to handle this is to update your dictionary when a URL changes. Another would be to annotate the request headers on a redirect to keep the original URL.
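Updating the dictionary when a URL changes might look like this in Python, assuming the unencrypted key-to-description layout used earlier (rename_url is a name I made up):

```python
def rename_url(capabilities, old_url, new_url):
    """Point every capability issued for old_url at new_url; return the count."""
    updated = 0
    for entry in capabilities.values():
        if entry["url"] == old_url:
            entry["url"] = new_url
            updated += 1
    return updated
```

Existing keys keep working after the move, so nobody holding one has to be told that anything changed.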

Simplicity, FTW

Even after taking these considerations into account, our security code is still easy to reason about.

authorize = ({url, method, key}) ->
  capability = yield decrypt get_capability key
  capability? &&
    (match url, capability.url) &&
    (method == capability.method) &&
    !(expired capability) &&
    !(exhausted capability)

I’ve hidden some complexity here. We have to query a database for the capability. We have to decrypt it. Matching URL templates is difficult, possibly involving canonicalization. Date comparisons require a date library. The expired and exhausted checks potentially involve updating or deleting the capability.

Nevertheless, the core logic is right there. Six lines of code.

That’s pretty appealing, isn’t it? That simplicity means that the surface area for attackers is small. For example, role-based authorization frameworks often require code to check a person’s role before handling a request. If we forgot to include this code, or we determine their role incorrectly, that’s a potential exploit. In contrast, the logic for verifying a capability is both simple and in one place.

Signed Capabilities

Public key cryptography provides an intriguing way to build on this implementation, by making use of digital signatures. In addition to managing an encrypted dictionary of keys, the server signs the key before issuing it. We further keep the recipient’s public key in the capability description. In turn, whenever the requestor uses the capability, they sign it again using the private key that corresponds to that public key. The server can then verify the requestor’s signature and its own signature and know that the original recipient is the requestor.

This adds an additional layer of security, albeit at the cost of some additional complexity. If the additional security were the only benefit, the additional complexity (and thus, the number of opportunities to make a mistake) would probably cancel it out. However, what’s really compelling about this approach is the ability to securely share keys by signing an assignment of the capability to another entity.

If Alice wants to share a capability with Bob, she signs the key she has and sends it to Bob. When Bob tries to use the capability, the server can ask Alice to verify the signature. This reduces the likelihood that it’s a stolen capability. If Eve tried to use the capability, she’d need both Bob’s and Alice’s private keys, and the ability to impersonate Alice, in addition to the original encrypted capability description. Bob, in turn, could assign to Carol the same way. In fact, this works for applications, not just people. Applications could assign capabilities to one another, allowing them to interoperate securely without resorting to coarse-grained bearer tokens.

Disclaimer

Unfortunately, outside of some prototypes, we haven’t had the opportunity to use Web capabilities. We don’t have the practical experience to recommend them as anything more than an interesting idea. We’d love to hear from anyone who has implemented Web capabilities (or something along these lines) for a real-world application. We think it’s an attractive model and we’re hoping to move beyond experimenting in the near future. We’ll be sure to blog about it when we do.

A New Frontier…

Web capabilities build upon a well-developed computer security model and offer fine-grained, transferable (shareable) authorization. Keys can refer to a specific resource or a collection of them by taking advantage of URL templates. Limited-use and expiring keys are possible as well. Key verification is simple to implement, compared to ACL or role-based schemes. We can share keychain management across applications, further reducing an attacker’s surface area. Capability-like models are seeing increased use, ranging from Gist’s capability URLs to Bitcoin. We think the time is right to explore Web capabilities further and develop implementations and best practices for using them in real-world scenarios.