HTTP Made Simple, Part 2: Methods, Safety, And Idempotence
In Part 1, we said that HTTP views the world as a distributed key-value store. The URLs are the keys and the values are resources. Resources, in turn, are actually dictionaries of different representations, or formats, for the resource. For example, a video resource might have different encodings, each of which is a representation that can be accessed if you know its media type.

You can make a pretty strong argument that the HTTP model is actually a distributed hash table (DHT). The protocol’s designers likely didn’t think in those terms (the first documented use of the term was in 1986 in reference to Linda, so it’s possible those ideas influenced HTTP somehow), but the similarities are profound. DHT schemes usually rely on a two-tier system for resolving keys: a key is hashed to a secondary server, which can actually resolve the reference. This precisely mirrors what HTTP does with URLs, which include a host component. Effectively, HTTP leverages TCP/IP and DNS to resolve a reference to a specific server. This partly explains why it scales so well. It’s an extremely clever idea that we tend to take entirely for granted today. HTTP lacks the recovery capabilities associated with DHTs, but, even there, redirects arguably perform a similar function.

In Part 2, we’re going to begin to explore this key-value model in more detail, starting with the verb we’ve conspicuously ignored up until now: POST.
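To make the "dictionary of representations" picture concrete, here is a minimal in-memory sketch. The URL, the media types, and the `get_representation` helper are all made up for illustration; a real server would negotiate formats through request headers rather than a function argument.

```python
# Hypothetical model: URL (the key) -> {media type -> representation bytes}.
store = {
    "https://example.com/videos/42": {
        "video/mp4": b"<mp4 bytes>",
        "video/webm": b"<webm bytes>",
    },
}


def get_representation(url, media_type):
    """Return the representation of `url` in `media_type`, or None."""
    representations = store.get(url)
    if representations is None:
        return None  # no such resource (a server would answer 404)
    # None here means the resource exists but not in that format.
    return representations.get(media_type)
```

The outer lookup resolves the key to a resource, and the inner lookup picks one of its representations.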
Which One Of These Methods Does Not Belong?
The core operations for a key-value store are get, put, and delete. As you’d expect, each of these corresponds to a well-defined HTTP verb. And by well-defined, I mean that they’re more than just window-dressing to indicate intent. For example, a client might cache the response to a GET request exactly because it’s defined to allow that.
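As a rough sketch of that correspondence, the three core verbs line up with plain dictionary operations. The `ResourceStore` class and its method names are hypothetical and leave out everything a real server does (status codes, headers, representations).

```python
class ResourceStore:
    """Toy key-value store whose operations mirror GET, PUT, and DELETE."""

    def __init__(self):
        self._values = {}

    def get(self, url):
        # GET: read the value stored under the key, changing nothing.
        return self._values.get(url)

    def put(self, url, value):
        # PUT: create or replace the value stored under the key.
        self._values[url] = value

    def delete(self, url):
        # DELETE: remove the key; deleting a missing key is a no-op.
        self._values.pop(url, None)
```

For example, `put("/notes/1", "hello")` followed by `get("/notes/1")` returns `"hello"`, and after `delete("/notes/1")` the same `get` returns `None`.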
But HTTP includes a fourth verb, POST, which provides for cases where strict key-value store semantics don’t suffice. Rather than take the pedantic tack of insisting that everything fit into a single abstraction, HTTP gives you POST as a fallback. Unfortunately, for historical reasons, this led developers to misunderstand and overuse POST, which, in turn, contributed heavily to the confusion that surrounds HTTP to this day.
The Supporting Cast
So that accounts for the existence of POST. What about PATCH, HEAD, OPTIONS, and so forth? It’s easy when looking at all these methods to lose sight of the underlying abstraction that HTTP provides. It’s important to understand that these other methods exist largely in support of GET, PUT, and DELETE.
PATCH: Small Alterations
Let’s start with PATCH, which is a variation on PUT. For large resources, we might not want to update the entire resource. We can specify byte ranges using HTTP headers, but sometimes even this isn’t enough. Sometimes we want to provide a logical description of an update, such as update the name and date-of-birth, which doesn’t strictly correspond to a byte range. For these cases, we can use PATCH.
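HTTP itself doesn’t say what a "logical description of an update" looks like; that’s up to the patch format. One common choice for JSON resources is JSON Merge Patch (RFC 7396), sketched below as an assumption rather than anything this series prescribes. The `merge_patch` name and the sample record are made up for illustration.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396 style) and return the new value."""
    if not isinstance(patch, dict):
        return patch  # a non-object patch replaces the target wholesale
    if not isinstance(target, dict):
        target = {}  # patching a non-object starts from an empty object
    result = dict(target)  # shallow copy so the caller's dict is untouched
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this member"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


person = {"name": "Ada", "date_of_birth": "1815-12-01", "email": "ada@example.com"}
update = {"name": "Ada Lovelace", "date_of_birth": "1815-12-10"}
patched = merge_patch(person, update)
```

Only the two named fields change; `email` is carried over untouched, which is exactly what a byte range can’t express.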
HEAD: Tell Me About Yourself
The HEAD method is an analogous variation on GET. We might not want to GET an entire resource; we might just be interested in information about the resource. For example, does it even exist? When was it last modified? The HEAD method works just like GET, except it doesn’t actually return the resource, just the headers, or metadata, about the resource.
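Here’s a small sketch of that relationship, using a hypothetical `handle` function over a dictionary of URL to body bytes. The only header it computes is `Content-Length`; a real server would send many more.

```python
def handle(method, url, store):
    """Answer a GET or HEAD request; returns (status, headers, body)."""
    if method not in ("GET", "HEAD"):
        return 405, {"Allow": "GET, HEAD"}, b""
    body = store.get(url)
    if body is None:
        return 404, {}, b""
    headers = {"Content-Length": str(len(body))}
    if method == "HEAD":
        # Same status and headers as GET, but the body is left out.
        return 200, headers, b""
    return 200, headers, body
```

Calling it with `"HEAD"` answers "does it exist, and how big is it?" without transferring the resource itself.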
OPTIONS: Reflecting On What We Can Do
The OPTIONS method provides for limited reflection on an HTTP server. Recall that URLs, which are the keys in our key-value store, have a host component which tells us which server can resolve the URL. We can also simply ask a given host to tell us about a resource, or even about the server itself. In practice, OPTIONS isn’t used much, except with CORS.
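As an illustration, a server might answer OPTIONS by listing the methods it supports in an `Allow` header. The routing table and the `options` function below are hypothetical; `"*"` stands for a question about the server as a whole, and returning 204 rather than 200 is just one reasonable choice.

```python
# Hypothetical table of which methods each target supports.
SUPPORTED = {
    "*": ["GET", "HEAD", "PUT", "DELETE", "POST", "OPTIONS"],
    "/notes/1": ["GET", "HEAD", "PUT", "DELETE", "OPTIONS"],
}


def options(target):
    """Answer an OPTIONS request; returns (status, headers)."""
    methods = SUPPORTED.get(target)
    if methods is None:
        return 404, {}
    return 204, {"Allow": ", ".join(methods)}
```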
Choose Your Own Method
The protocol is extensible, so it’s possible to define other methods. For example, WebDAV adds COPY and MOVE methods, allowing you to copy or move a resource from one key (URL) to another. However, most of the time, you’re better off just sticking with the core methods because (a) their behavior is well-defined (see below) and (b) there’s a lot of software out there that takes advantage of this behavior.
In the end, though, the real stars of the show are GET, PUT, and DELETE. The POST method is the workhorse that takes over where the key-value abstraction leaves off, making it possible to request that a server take an arbitrary action.
Safety and Idempotence: Not a Sex-Ed Class
As we discussed earlier, GET, PUT, and DELETE are well-defined, which makes it possible to reason about them. We know that GET isn’t going to delete anything. We know that DELETE will. Thus, we know that it’s safe to call GET, but not DELETE. It’s also not safe to call PUT, because the server will, if possible, replace the value of the given resource with whatever we send it.
Safe
However, we don’t know much at all about POST, because its behavior isn’t well-defined. The server might do any number of things, including creating, updating or even deleting resources. It might debit a bank account or call you a taxi, none of which has anything to do with our key-value store. It’s basically a remote procedure call. As such, HTTP makes no guarantees about POST. Thus, we say POST, unlike GET, isn’t safe.
Idempotent
If we can call a method repeatedly and end up in the same state as if we’d called it once, that means it’s idempotent. This is good to know because it affects how you might use a method. For example, you can write retry code fearlessly with idempotent methods. Even if it turns out your original request went through, redundant requests don’t really hurt anything.
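The retry idea can be sketched in a few lines. The `retry` helper and the flaky request below are hypothetical stand-ins for a real HTTP client: the request reaches the "server" every time, but the response gets lost on the first two attempts.

```python
def retry(send, attempts=3):
    """Call send() until it succeeds; only appropriate if send is idempotent."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return send()
        except ConnectionError as error:
            last_error = error  # the request may or may not have gone through
    raise last_error


server_state = {}
calls = {"count": 0}


def flaky_put():
    """Simulated PUT: the write happens, but early responses are lost."""
    calls["count"] += 1
    server_state["/users/1"] = "Ada"
    if calls["count"] < 3:
        raise ConnectionError("response lost")
    return 200


status = retry(flaky_put)
```

The write was applied three times, yet `server_state` ends up exactly as it would after one successful request. That is what makes blind retries acceptable here.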
Obviously, safe methods are all idempotent. It doesn’t matter how many times you call them because they don’t change anything. Equally obviously, DELETE is also idempotent, because you can only delete something once, and after that, it’s overkill. The same goes for PUT: replacing a value with the same value a second time leaves things exactly where they were. But, again, POST isn’t idempotent, because we don’t actually know for sure what it’s doing. You might double the charges to someone’s bank account or call two taxis. So, when using POST, you have to be careful. This, again, is why it’s useful to prefer GET, PUT, and DELETE when you can.
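For reference, here is how the HTTP specifications (RFC 9110 for the core methods, RFC 5789 for PATCH) classify the methods we’ve met so far, written as a lookup table. The table layout and helper names are made up for illustration, and treating unknown methods as neither safe nor idempotent is an assumption chosen to err on the side of caution.

```python
# method -> (safe, idempotent)
METHOD_PROPERTIES = {
    "GET": (True, True),
    "HEAD": (True, True),
    "OPTIONS": (True, True),
    "PUT": (False, True),
    "DELETE": (False, True),
    "POST": (False, False),
    "PATCH": (False, False),
}


def is_safe(method):
    """True if the method is defined not to change server state."""
    return METHOD_PROPERTIES.get(method, (False, False))[0]


def is_idempotent(method):
    """True if repeating the request has the same effect as sending it once."""
    return METHOD_PROPERTIES.get(method, (False, False))[1]
```

A client could consult `is_idempotent` before deciding whether a failed request may be retried automatically.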
POST as Create
For a long time, Web developers were obsessed with mapping HTTP methods to CRUD database operations. Obviously, since HTTP sees the Internet as a giant key-value store, this was a doomed effort: there’s no create, the C in CRUD. That’s by design, not accident, because, again, the underlying model is closer to a hash table than a relational database. Now, there’s an entirely different question about why a key-value store is the right model. Or, put another way, why doesn’t HTTP have a first-class create method in the first place? The short answer is that it’s redundant since you can already implicitly create a new resource with PUT (which can return a 201 Created).
The confusion was partly due to an idiomatic use of POST to create a new resource. This is a completely valid thing to do, but it’s important to understand that POST doesn’t actually mean create. Since HTTP doesn’t define its behavior, it can be used to create new resources, and often is. But you could also, if you knew the key (URL), simply PUT a value to it.
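The difference between the two styles of creation can be sketched like this. The `Collection` class is hypothetical, and having the server pick a numeric URL on POST is only the usual idiom, not something HTTP requires.

```python
import itertools


class Collection:
    """Toy collection that supports both POST-style and PUT-style creation."""

    def __init__(self, base_url):
        self.base_url = base_url
        self.items = {}
        self._ids = itertools.count(1)

    def post(self, value):
        """Server picks the key; returns (status, url of the new resource)."""
        url = f"{self.base_url}/{next(self._ids)}"
        while url in self.items:  # skip keys a client already claimed via PUT
            url = f"{self.base_url}/{next(self._ids)}"
        self.items[url] = value
        return 201, url

    def put(self, url, value):
        """Client picks the key; 201 if it was created, 200 if replaced."""
        created = url not in self.items
        self.items[url] = value
        return (201 if created else 200), url
```

Sending the same POST twice creates two resources, while sending the same PUT twice creates one and then harmlessly replaces it, which ties creation back to idempotence.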
Until Next Time…
This is a nice segue into our next topic, URLs. These are the keys in our global key-value store and, like much of the rest of HTTP, they’re surprisingly misunderstood.