Tuesday, April 22, 2014

Practical ReST Architecture - Part 2 - Cheat Sheet

This is my attempt to distill the majority of my ReST best practices down to a cheat sheet. I'll be expanding on each in later posts.

1. Resource URIs should be structured like so:

/{set}?param=value&param=value
/{set}/{unique id}

2. Nested resources should also follow this pattern:

/{set}/{unique id}/{set}?param=value&param=value
/{set}/{unique id}/{set}/{unique id}

e.g. /users/123/avatars/thumbnail

Don't go nuts with nested resources. They are only really required in very specific circumstances. Composite keys, such as the thumbnail avatar example above, are one case where nesting is useful for uniquely identifying a specific resource. But where you have a surrogate or single-valued natural key, you should avoid nesting.
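To make the pattern concrete, here is a minimal Python sketch that splits a URI into its {set}/{unique id} pairs. The function name parse_resource_uri is made up for illustration and is not part of any framework.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_resource_uri(uri):
    """Split a URI into (set, unique id) pairs plus its query params.

    A unique id of None means the final segment addresses a whole set.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    pairs = []
    # Segments alternate: set name, unique id, set name, unique id, ...
    for i in range(0, len(segments), 2):
        set_name = segments[i]
        unique_id = segments[i + 1] if i + 1 < len(segments) else None
        pairs.append((set_name, unique_id))
    return pairs, dict(parse_qsl(parts.query))
```

With this, /users/123/avatars/thumbnail comes out as the pairs (users, 123) and (avatars, thumbnail), while /users?age=35 comes out as the single pair (users, None) plus the query params.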

3. Use your verbs correctly.
  • GET does not alter server state. Ever. It is cacheable.
  • POST can have any number of side effects and can blow caches out anywhere.
  • PUT is idempotent, which means that whether you send the same request once or three times in succession, the result on the server will be the same. Use it to completely replace an individual resource or an entire set. It invalidates only caches containing those resources altered by this request.
  • PATCH is not guaranteed to be idempotent by the HTTP spec, but the way we use it (as a merge, see point 13) it is. Use it to make partial updates to individual resources, or to do upserts on sets. It invalidates only caches containing those resources altered by this request.
  • DELETE removes a resource. Technically the response for a DELETE on a non-existent resource should be 404 Not Found, but that requires extra client error handling, and because it's pretty harmless we return 200 OK in this circumstance in most cases.

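The verb rules above can be sketched with a toy in-memory resource set. This is only an illustration: the ResourceSet class and its method names are invented, status codes are simplified (PUT always answers 200 here), and there is no HTTP layer or caching at all.

```python
class ResourceSet:
    """Toy in-memory set of resources keyed by unique id."""

    def __init__(self):
        self.items = {}
        self.next_id = 1

    def get(self, uid):
        # GET never alters state; it returns a copy of what is stored.
        if uid not in self.items:
            return 404, None
        return 200, dict(self.items[uid])

    def post(self, data):
        # POST is not idempotent: every call creates a new resource.
        uid = str(self.next_id)
        self.next_id += 1
        self.items[uid] = dict(data)
        return 201, uid

    def put(self, uid, data):
        # PUT replaces the whole resource, so repeating it changes nothing.
        self.items[uid] = dict(data)
        return 200, dict(data)

    def patch(self, uid, data):
        # PATCH as a shallow merge over the existing resource.
        if uid not in self.items:
            return 404, None
        self.items[uid].update(data)
        return 200, dict(self.items[uid])

    def delete(self, uid):
        # Deleting a missing resource is treated as harmless: 200 either way.
        self.items.pop(uid, None)
        return 200, None
```

Sending the same put or delete several times leaves the set in the same state as sending it once, which is the property the client gets to rely on.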
4. Use the Content-Type header to determine the incoming data format and handle it appropriately. Handling all types of data format just for giggles can bloat your codebase unnecessarily. What is important is that you respond with an informative response code (415 Unsupported Media Type) when the Content-Type is not one you can handle.

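5. Use the Accept header to determine the format of the response. This is referred to as content negotiation. Here the informative response code is 406 Not Acceptable, for when you don't support any of the formats the client asked for (415 is reserved for request bodies you can't handle).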
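Here is a rough Python sketch of both header checks together. It assumes a service that only speaks application/json; SUPPORTED and check_media_types are hypothetical names. It also ignores q-values in the Accept header, which a real implementation would need to honour. HTTP defines 415 Unsupported Media Type for a request body you can't read and 406 Not Acceptable for an Accept header you can't satisfy.

```python
SUPPORTED = ("application/json",)  # assumption: a JSON-only service


def _matches(pattern, media_type):
    """True if an Accept pattern such as */* or application/* covers media_type."""
    if pattern == "*/*":
        return True
    if pattern.endswith("/*"):
        return media_type.startswith(pattern[:-1])
    return pattern == media_type


def check_media_types(content_type, accept):
    """Return 415, 406, or 200 for the given request headers.

    content_type is None for requests without a body; accept is None
    when the client sent no Accept header (treated as */*).
    """
    if content_type is not None:
        # Drop parameters such as "; charset=utf-8" before comparing.
        media_type = content_type.split(";")[0].strip().lower()
        if media_type not in SUPPORTED:
            return 415
    patterns = [p.split(";")[0].strip().lower() for p in (accept or "*/*").split(",")]
    for pattern in patterns:
        if any(_matches(pattern, supported) for supported in SUPPORTED):
            return 200
    return 406
```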

6. Use the standard HTTP response codes. They are there for a reason, and they work like a charm. There is just enough wiggle room to fit every use case I've found. Document every response code for every endpoint and have a test case for each.

7. Don't be afraid to use the response codes as a template for error types in your code (e.g. BadRequestError, NotFoundError, etc.). They cover most error situations, and it helps to ensure that errors can be caught and propagated to the user with complete context.
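A minimal sketch of what that can look like in Python. The HttpError base class and the to_response helper are invented for this example; the shape of the error body is an assumption too.

```python
class HttpError(Exception):
    """Base class: each subclass mirrors one HTTP response code."""

    status = 500
    reason = "Internal Server Error"

    def __init__(self, detail=None):
        super().__init__(detail or self.reason)
        self.detail = detail


class BadRequestError(HttpError):
    status = 400
    reason = "Bad Request"


class NotFoundError(HttpError):
    status = 404
    reason = "Not Found"


def to_response(exc):
    """Turn any exception into a (status, body) pair for the client."""
    if isinstance(exc, HttpError):
        return exc.status, {"error": exc.reason, "detail": exc.detail}
    # Anything we did not anticipate is a server fault; leak no details.
    return 500, {"error": "Internal Server Error", "detail": None}
```

Code deep in the stack can raise NotFoundError("no user 123") and a single handler at the edge turns it into the right response.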

8. Validate your inputs. We use JSON Schema v4 for this. It's surprisingly flexible and powerful. Make sure your 400 error responses also comply with a JSON Schema v4 schema, so they can be consumed and actioned automatically, or by a developer who needs a clue as to why their request resulted in a 400 response.

9. Don't validate output. It's a fool's errand. If you want to check your data for corruption and repair it, run a script. Don't respond to a GET request with a 500 error.

10. To avoid a chatty API an expansion query param can be used to denote the nodes of the resource graph to expand. The format for this is not standardized so you can make up your own. Here is how we do it:

/users?expand=avatars

This will return an array of users each with its related avatar resources as an array called avatars.
You can also get cute with paths down the graph using colon delimiters, and multiple expansions using commas.

/users?expand=avatars,friends:avatars

This will return an array of users including avatars, friends, and those friends' avatars. All this in one response.
What is nice about this approach is that it isn't custom for each endpoint. It's a simple pattern that can be repeated and generic handlers can be written to make that easy.
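As a sketch of the generic handling, the expand value can be parsed into a small tree that any endpoint can walk. parse_expand is a hypothetical helper name; the comma and colon syntax is the one shown above.

```python
def parse_expand(value):
    """Parse an expand value such as 'avatars,friends:avatars' into a tree.

    Commas separate independent expansions; colons walk down the graph.
    """
    tree = {}
    for path in value.split(","):
        node = tree
        for name in path.strip().split(":"):
            if name:
                # Reuse an existing branch so shared prefixes are merged.
                node = node.setdefault(name, {})
    return tree
```

For the example above this gives {"avatars": {}, "friends": {"avatars": {}}}, which a handler can recurse over without knowing anything about the specific endpoint.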

A note on cache invalidation: Using expand gives us a different URL for caching purposes, which is ideal as long as you cache match on the URL including the query params. You do need to remember to match the first part of the URI when doing cache invalidation to invalidate all the expanded versions of the resource too.
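A toy version of that invalidation rule, assuming a plain dictionary cache keyed on the full URL. UrlCache is an invented class, not a real caching library. Note that it compares the whole path rather than doing a naive string prefix match, so invalidating /users/123 does not also throw away /users/1234.

```python
from urllib.parse import urlsplit


class UrlCache:
    """Cache keyed on the full URL, query params included."""

    def __init__(self):
        self.entries = {}

    def put(self, url, body):
        self.entries[url] = body

    def get(self, url):
        return self.entries.get(url)

    def invalidate(self, path):
        """Drop every cached variant of a resource, whatever its query string."""
        stale = [url for url in self.entries if urlsplit(url).path == path]
        for url in stale:
            del self.entries[url]
        return len(stale)
```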

11. There are different formats out there for returning links to other related resources. The approach we followed was to include links in the schema using JSON-schema v4 format. The schema can be retrieved by calling GET on a URI with an Accept header of application/schema+json. This is where we reach the Richardson maturity model nirvana and full Hypermedia status. But in reality I've found it of limited use beyond a verifiable form of documentation. None of our clients are traversing the graph via these links that we are aware of. They tend to just expand the graph to keep the number of calls to a minimum. That said, even if it isn't used in anger all that often it does come in very handy during the development phase and helps to clarify to developers how the resources are related.

12. Understanding query parameters is important. There are no standardized rules for how they should be used but here are the rules we try to stick with for consistency:
  • Query params are used to filter lists/sets. e.g. GET /users?age=35
  • The filter is reductive in nature when multiple are applied. e.g. GET /users?age=35&status=active (must be 35 and active)
  • To do additive (Google-style) searching, consider using a single query param named 'search' which contains a URL-encoded search string (like you would type into Google search). This string can be interpreted by your search system and has the advantage that you can embed quite complex searches that would be very unwieldy as simple query params. e.g. GET /users?search=age%3D35%20or%20status%3Dactive (age is 35 OR status is active)
  • Query params can be used to modify the data returned from a single resource representation too, just as we did with the expand query param.

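The reductive filtering rule is easy to sketch over an in-memory list. This assumes every filter is a simple equality test on a string-comparable attribute, and that expand and search are reserved names that are not filters; filter_set and RESERVED are hypothetical.

```python
from urllib.parse import parse_qsl

RESERVED = {"expand", "search"}  # assumption: these params are not filters


def filter_set(items, query_string):
    """Keep only the items that match every filter in the query string."""
    filters = [(k, v) for k, v in parse_qsl(query_string) if k not in RESERVED]
    return [
        item
        for item in items
        # Reductive: each extra filter can only shrink the result.
        if all(k in item and str(item[k]) == v for k, v in filters)
    ]
```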
13. Some forms of batch update can be performed using the PATCH method. We take the approach that a PATCH is a shallow merge, not a deep merge, which is to say that the top-level attributes of the incoming data override those of the existing resource. This applies to lists also, where you have a unique key. So if we want to add to or update a list with the contents of another list (using the unique id to match elements), we make a PATCH on the list. A JSON object is an associative array, and an array of objects with a known unique key is more or less the same thing just represented differently, so we can apply the same kind of PATCH logic to update either.
For example, if we want to upsert 3 friends to user 123's friend list, we can do so in one call like so:

PATCH /users/123/friends
[{ "id": 234 }, { "id": 345 }, { "id": 456 }]
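Here is a small Python sketch of that merge logic, for both a single object and a keyed list. patch_object and patch_list are illustrative names, and the "id" key is assumed to be the unique key.

```python
def patch_object(existing, incoming):
    """Shallow merge: top-level attributes of incoming override existing."""
    merged = dict(existing)
    merged.update(incoming)
    return merged


def patch_list(existing, incoming, key="id"):
    """Upsert incoming items into a list, matching elements on a unique key."""
    by_key = {item[key]: item for item in existing}
    for item in incoming:
        # Known key: merge into the existing element. New key: insert.
        by_key[item[key]] = patch_object(by_key.get(item[key], {}), item)
    return list(by_key.values())
```

Applying the same patch twice gives the same list as applying it once, which is what lets us treat this style of PATCH as idempotent.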

14. Authorization can throw a real spanner in the works with ReST and caching. This is because resources need to be 'mediated' to only present data the user is authorized to see. This means the user's authorization now has to form part of the cache key. This can be handled without too much difficulty when caching the data if the authorization token is in the header but it does make cache invalidation tricky. You have to look at caching of authenticated resources on a case by case basis to determine how much benefit you will get from it vs the complexity. Cache invalidation is always important to get right but becomes even more so when authorization is involved as you run the risk of exposing sensitive information if you get it wrong.
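One way to fold authorization into the cache key, sketched in Python. It assumes the raw Authorization header value is what distinguishes users, and it hashes the value so the credential itself never sits in the cache; cache_key is a made-up helper.

```python
import hashlib


def cache_key(method, url, authorization=None):
    """Build a cache key that is scoped to the caller's authorization."""
    if authorization is None:
        scope = "anonymous"
    else:
        # Store a digest rather than the token, so keys can be logged safely.
        scope = hashlib.sha256(authorization.encode("utf-8")).hexdigest()
    return f"{method.upper()} {url} auth={scope}"
```

The catch described above still applies: invalidating a resource now means finding every key for that URL across all scopes, not just one entry.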

15. Events and time series data can fit into a ReST paradigm by POSTing to an imaginary set and returning a 202 Accepted response. Take our analytics reporting endpoint, POST /tracking-events. In theory it's the set of all tracking events ever submitted. But there is no GET /tracking-events endpoint because we use it only for receiving the events and triggering various side effects, such as feeding the data into our analytics data stores. We use this little trick quite often when we need the client to initiate a process on the server: turn the process into a side effect of adding to a somewhat made-up resource set. It works, and keeps the API URIs from turning into verb soup.
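A bare-bones sketch of such an endpoint handler in Python. The in-process queue stands in for whatever really feeds the analytics stores, and the required "type" field is an assumption made up for the example.

```python
import queue

# Stand-in for the pipeline that feeds the analytics data stores.
event_queue = queue.Queue()


def post_tracking_event(event):
    """Handle POST /tracking-events: accept now, process later."""
    if not isinstance(event, dict) or "type" not in event:
        return 400, {"error": "Bad Request", "detail": "event needs a 'type'"}
    # The side effects happen elsewhere; we only promise to get to them.
    event_queue.put(event)
    return 202, None
```

The 202 tells the client the event was received but makes no claim that the side effects have finished, and since nothing is ever read back there is no need for a GET.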

That's all for now. I'll expand on these points in subsequent posts. Ping me if you have any questions or just want to berate me for not talking about hypermedia constraints.

:-D
Paul