Thursday, March 27, 2014

Practical ReST Architecture - Part 1 - Dude where's my resource?

This is the first in what may become a series of posts on the tips and tricks I've learned while implementing a pretty strict approach to ReST architecture. Most of it will be drawn from the accumulated experience of building a Hypermedia API to power KIXEYE.com.

Hello


I'm a software architect. It's a pretty vague title, I'll admit. It means, in essence, that I design software systems at a macro level. I also code, a lot, because I believe good design requires one to feel the pain points of implementation in order to do better work.

Now on to the good stuff. ReST is a discipline. It is not, as most have come to believe, simply pretty URLs. I won't bang on about Roy Fielding's dissertation or the Richardson maturity model, but if those things don't mean anything to you and you think you are implementing ReST, then you probably aren't. So let's fix that.
ReST, when explained by the experts, can get cripplingly boring, so I'll try to keep to the stuff that matters, because ReST is actually pretty badass if you grok the power in its simplicity.

Dude where's my resource?


Firstly, you need a resource. A resource in my world is identified by a URI (Uniform Resource Identifier), which in an HTTP API takes the form of a URL.
So let's say we want to retrieve the data for user 123.


Let's get it wrong a few times to illustrate a point about discipline.

1. /user?id=123
2. /users?id=123
3. /user/123
4. /getuser/123
5. /getuser?id=123


Nope, nope, nope, nope and nope.

The following is correct:

/users/123

But why? What does it matter if I call my resource /user/123 or /users?id=123 rather than /users/123 ?

Here is why: all resource URIs should follow a simple pattern.

/<set>?<query params> (for searching for results within a set)
/<set>/<unique identifier> (for an individual resource)

A set is a logical bag or list of resources, in this case all the /users. The unique identifier of the user we want is 123.

(I'm simplifying a little here, we will get into composite keys and nesting resources later, so bear with me)
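To make the pattern concrete, here's a minimal Python sketch that splits a path into its set and its (optional) unique identifier. The ResourceUri class and parse_resource_uri function are my own illustration, not part of any framework, and the sketch simply rejects anything deeper than /<set>/<id>, since nesting comes later.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class ResourceUri:
    set_name: str                      # the logical set, e.g. "users"
    identifier: Optional[str] = None   # unique id, e.g. "123"; None means the whole set
    query: Dict[str, List[str]] = field(default_factory=dict)  # e.g. {"gender": ["M"]}

    @property
    def is_individual(self) -> bool:
        return self.identifier is not None


def parse_resource_uri(uri: str) -> ResourceUri:
    """Parse /<set>?<query params> or /<set>/<unique identifier>."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    query = parse_qs(parts.query)
    if len(segments) == 1:
        return ResourceUri(segments[0], None, query)
    if len(segments) == 2:
        return ResourceUri(segments[0], segments[1], query)
    # Composite keys and nested resources are out of scope for this sketch.
    raise ValueError(f"expected /<set> or /<set>/<id>, got {uri!r}")


print(parse_resource_uri("/users/123"))
# ResourceUri(set_name='users', identifier='123', query={})
print(parse_resource_uri("/users?gender=M"))
# ResourceUri(set_name='users', identifier=None, query={'gender': ['M']})
```

Note what it can't do: it will happily parse /user/123 as well. Code can only lean on the shape of the URI; the pluralization is down to team discipline.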

Now here is why that rule exists:

POST /user
GET /user?gender=M

Those URIs seem a little off because the lack of pluralization makes it hard to tell that you are working with the set of all users, not an individual user. They should read:

POST /users (add a new user to the set of all users)
GET /users?gender=M (return an array of all users with gender M)

Now, this might seem like a pedantic point (to pluralize or not to pluralize), but trust me: when you have hundreds of URIs, this convention will keep you and your team sane.

Verbal

The other bad examples I gave include a cardinal sin of ReST URI construction: a verb in the resource name.

/getuser/123

Other bad examples might be:

/deleteuser/123
/update/user/123
/adduser


URIs should always describe nouns. Either a logical set of resources (/users) or an individual resource within a set (/users/123). All actions on those nouns (verbs) come in the form of request methods (GET, POST, PUT, PATCH, DELETE) and are separate from the resource identifier for good reason. You want multiple verbs to act on a single noun, because a noun is a real thing, and you are doing something to that thing.
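Here's a rough sketch of what keeping verbs out of the URI looks like in a request handler. The in-memory USERS dict, the handler names and the (status, body) tuples are all hypothetical stand-ins; the point is that the noun /users/<id> stays fixed and the request method alone decides what happens to it.

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Optional[dict]]  # (HTTP status code, body)

# Hypothetical in-memory stand-in for the data behind the set of all /users.
USERS: Dict[str, dict] = {"123": {"id": "123", "name": "Ann", "gender": "F"}}


def get_user(uid: str, body: Optional[dict]) -> Response:
    user = USERS.get(uid)
    return (200, user) if user is not None else (404, None)


def put_user(uid: str, body: Optional[dict]) -> Response:
    # PUT replaces the whole representation; the id always comes from the URI.
    USERS[uid] = {**(body or {}), "id": uid}
    return (200, USERS[uid])


def delete_user(uid: str, body: Optional[dict]) -> Response:
    return (204, None) if USERS.pop(uid, None) is not None else (404, None)


# One noun, many verbs: the request method picks the action, never the URI.
USER_METHODS: Dict[str, Callable[[str, Optional[dict]], Response]] = {
    "GET": get_user,
    "PUT": put_user,
    "DELETE": delete_user,
}


def handle(method: str, path: str, body: Optional[dict] = None) -> Response:
    segments = [s for s in path.split("/") if s]
    if len(segments) != 2 or segments[0] != "users":
        return (404, None)            # e.g. /deleteuser/123 is simply not a resource
    handler = USER_METHODS.get(method.upper())
    if handler is None:
        return (405, None)            # Method Not Allowed for this noun
    return handler(segments[1], body)
```

There's no /deleteuser/123 to route: removing the user is just DELETE /users/123, and a verb the resource doesn't support gets a 405 rather than a new URI.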

A series of tubes


And here we get to the reason we need to be disciplined about our URIs, and the crux of why one would want a ReST architecture at all. ReST resource representations are edge efficient: they have unique identifiers that allow us to cache them very efficiently and flush those caches at the appropriate times.
This suits the distributed nature of the internet, and of most HTTP APIs, very well: most APIs are read-heavy animals, and caching can take a huge amount of load off the servers behind the cache.

Here is a real world example:

1. GET /users/123 returns the user resource with unique id 123

This can be cached in memcached, Varnish, name-your-edge-cache, a CDN, etc.

2. GET /users/123 can be retrieved a million times without ever bothering more than the edge caches.
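A toy read-through cache shows why step 2 is so cheap. The ReadThroughCache class and load_from_database function are made up for illustration; a real edge cache like Varnish or a CDN does this for you, typically keyed on the URI.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy stand-in for an edge cache (memcached, Varnish, a CDN...) keyed by URI."""

    def __init__(self, origin: Callable[[str], dict]) -> None:
        self._origin = origin              # function that actually loads a representation
        self._entries: Dict[str, dict] = {}
        self.origin_hits = 0               # how often we had to bother the origin

    def get(self, uri: str) -> dict:
        if uri not in self._entries:       # miss: fetch once and remember it
            self.origin_hits += 1
            self._entries[uri] = self._origin(uri)
        return self._entries[uri]


def load_from_database(uri: str) -> dict:
    # Hypothetical origin lookup; a real API would query its data store here.
    return {"uri": uri, "name": "Ann"}


cache = ReadThroughCache(load_from_database)
for _ in range(1_000_000):
    cache.get("/users/123")
print(cache.origin_hits)  # 1
```

Because the URI uniquely identifies the representation, it makes a perfect cache key; the only open question is when to throw the entry away, which is what step 3 is about.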


3. PUT /users/123 is called (the user data for user 123 is replaced)

Because we know the PUT verb means replacement, and we know from the URI that only a single resource is being replaced, and which resource it is, we can re-populate the edge caches with the new value after updating the data store.

We also know from the URI that /users/123 is an individual resource within the logical set of all /users, so we can call our edge caches to flush anything cached for /users.
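Here's a minimal sketch of that PUT flow, assuming a toy dict-backed EdgeCache in place of a real cache. EdgeCache and handle_put are hypothetical; in practice you'd be sending purge or ban requests to Varnish, or calling your CDN's invalidation API, rather than deleting dict keys.

```python
from typing import Dict


class EdgeCache:
    """Toy URI-keyed response cache standing in for Varnish, a CDN, etc."""

    def __init__(self) -> None:
        self.entries: Dict[str, object] = {}

    def store(self, uri: str, representation: object) -> None:
        self.entries[uri] = representation

    def flush_set(self, set_uri: str) -> None:
        # Drop the set itself (/users) and every cached query on it (/users?...),
        # but leave individual resources such as /users/456 alone.
        for key in list(self.entries):
            if key == set_uri or key.startswith(set_uri + "?"):
                del self.entries[key]


def handle_put(db: Dict[str, dict], cache: EdgeCache,
               set_name: str, identifier: str, representation: dict) -> None:
    db[identifier] = representation                           # 1. update the data store
    cache.store(f"/{set_name}/{identifier}", representation)  # 2. re-populate the resource
    cache.flush_set(f"/{set_name}")                           # 3. flush the set and its queries


db = {"123": {"name": "Ann"}, "456": {"name": "Cy"}}
cache = EdgeCache()
cache.store("/users/123", db["123"])
cache.store("/users/456", db["456"])
cache.store("/users", list(db.values()))
cache.store("/users?gender=M", [db["456"]])

handle_put(db, cache, "users", "123", {"name": "Ann B."})
print(sorted(cache.entries))  # ['/users/123', '/users/456']
```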
Response caches like Varnish can be used to cache query calls to logical sets like:

GET /users?gender=M

Because reads usually far outnumber additions, updates and deletions, our response cache can happily serve up most of the recurring requests, being flushed and repopulated only when an infrequent change is made to the set of all users. (By infrequent I mean that we see thousands of requests per second, and it could be many seconds or minutes between user updates.)
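A back-of-the-envelope calculation shows why this works. The numbers below are illustrative assumptions (1,000 requests per second, 10 seconds between updates, 50 distinct cached query URIs on /users), and the model is simplified: each flush costs at most one origin miss per distinct cached URI, ignoring concurrent misses while an entry is being refetched.

```python
def worst_case_hit_ratio(requests_per_second: float,
                         seconds_between_updates: float,
                         distinct_cached_uris: int) -> float:
    """Simplified model: each flush costs at most one origin miss per cached URI."""
    requests_between_updates = requests_per_second * seconds_between_updates
    misses = min(distinct_cached_uris, requests_between_updates)
    return 1 - misses / requests_between_updates


# Illustrative numbers only: 1,000 req/s, an update every 10 s, 50 query URIs on /users.
print(round(worst_case_hit_ratio(1_000, 10, 50), 4))  # 0.995
```

Even at the low end of those traffic figures, the origin only sees around half a percent of the reads, and the ratio improves as the gap between updates grows.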

Patterns

The cherry on top is that, because we are adhering to a strict pattern, we can write generic code to do all of this without having to concern ourselves to any great extent with the actual resources. From the incoming request's method and URI pattern alone, we know certain things need to happen, like flushing edge caches of resource sets.
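As a sketch of that generic code, the function below decides what cache work a request implies using nothing but its method and URI shape; note that "users" appears nowhere in it. The action names are hypothetical, and purging (rather than repopulating) on writes other than PUT is my assumption, since only PUT carries a complete replacement representation.

```python
from typing import List, Tuple

Action = Tuple[str, str]  # (what to do, which cache key / URI)
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def cache_actions(method: str, uri: str) -> List[Action]:
    """Derive edge-cache work from the request method and URI shape alone."""
    path = uri.split("?", 1)[0]
    segments = [s for s in path.split("/") if s]
    if len(segments) not in (1, 2):
        return []  # nested resources / composite keys: out of scope here
    method = method.upper()
    if method in ("GET", "HEAD"):
        return [("serve_from_cache", uri)]  # reads are keyed on the full URI
    if method not in WRITE_METHODS:
        return []

    actions: List[Action] = []
    if len(segments) == 2:                  # an individual resource changed
        if method == "PUT":                 # full replacement: we know the new value
            actions.append(("repopulate", path))
        else:                               # assumption: other writes just purge
            actions.append(("purge", path))
    actions.append(("flush_set", "/" + segments[0]))  # any write dirties the set
    return actions


print(cache_actions("PUT", "/users/123"))
# [('repopulate', '/users/123'), ('flush_set', '/users')]
```

The same function works unchanged for /games/42 or /clans/7, which is exactly the payoff of sticking to the pattern.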

This is one of the foundation stones of a ReST architecture. By following a strict discipline you allow optimizations both in the operational efficiency of the system and also the code itself. The ability to build upon valid assumptions accelerates development and eases maintenance.


In future posts I'll talk more about composite keys and resource nesting, content-types, json-schema, hypermedia and linking, graph expansion, batch updates, when to use query params, authorization, mediation and how that affects caching, eventing with ReST, and some tricks for working around some of the problems that seem an awkward fit for a ReST architecture.


:-)
-Paul Hill
