A RESTful approach to data synchronization

This is a pretty common problem, and a RESTful approach can help you solve it. HTTP (the application protocol typically used to build RESTful services) supports a variety of techniques that can be used to keep API clients in sync with the data on the server side.

If the client receives a Last-Modified or ETag header in an HTTP response, it can use that information to make conditional GET requests in the future. This allows the server to indicate quickly, with a 304 Not Modified response, that the client’s previously stored representation of the resource is still valid. The server (or, even better, an intermediate proxy or cache) can then respond to the client’s requests as efficiently as possible, potentially avoiding costly round trips to a back-end data store.

If a response contains a Last-Modified header and the client wants to take advantage of this optimization, it must include an If-Modified-Since header in subsequent GET requests to the same URI, passing the same timestamp value it received. This tells the server to return the full representation only if the resource has changed since that time; otherwise it can answer with 304 Not Modified without fetching the data from the authoritative back-end source. Your server has to be built to support this technique, of course.
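A minimal sketch of the server-side decision behind If-Modified-Since, using only the Python standard library. The function name `conditional_get` and the `(status, headers, body)` return shape are assumptions made for illustration, not part of any particular framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_get(last_modified: datetime, body: str,
                    if_modified_since: Optional[str]):
    """Return (status, headers, body) for a GET on a single resource.

    last_modified must be a timezone-aware UTC datetime.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # An unparseable date is ignored.
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                # The client's copy is still current: no body is sent.
                return 304, headers, ""

    return 200, headers, body
```

The client stores the Last-Modified value from a 200 response and sends it back unchanged as If-Modified-Since; on a 304 it keeps using its stored copy.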

A similar principle applies to ETag headers. An ETag is an opaque identifier, often a hash, representing a specific state of the resource at a particular point in time. If the resource changes in any way, so does its ETag value. If the client sees an ETag in a response, it should pass that value in an If-None-Match header on subsequent GET requests to the same URI, allowing the server to quickly determine whether the client has an up-to-date representation of the resource.
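The ETag variant can be sketched the same way. Here the ETag is assumed to be a SHA-256 hash of the response body, which is one common choice rather than a requirement; the client echoes it back in an If-None-Match header. The names `make_etag` and `etag_get` are hypothetical.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    # An ETag is a quoted opaque string; hashing the body is one way to build it.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def etag_get(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}

    if if_none_match is not None:
        candidates = [c.strip() for c in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        candidates = [c[2:] if c.startswith("W/") else c for c in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""

    return 200, headers, body
```

Unlike a timestamp, the hash changes whenever the content does, even for two edits within the same second.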

Finally, you should probably look at the long polling technique to reduce the number of repeated GET requests your clients issue. In essence, the client issues a very long-lived GET request that watches for changes to server data. The GET does not return a response until either the data has changed or a very long timeout fires. In the latter case, the client simply re-issues the same long-lived request to watch for changes again. See also related topics such as Comet and WebSockets, which take a similar approach.
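The hold-until-changed behaviour can be sketched in-process with a condition variable. This is only an illustration of the idea under stated assumptions: the `ChangeFeed` class and its version counter are hypothetical, and a real server would call something like `wait_for_change` from inside its GET handler.

```python
import threading


class ChangeFeed:
    """Holds one versioned value and lets callers block until it changes."""

    def __init__(self, value):
        self._cond = threading.Condition()
        self._version = 0
        self._value = value

    def publish(self, value):
        with self._cond:
            self._version += 1
            self._value = value
            self._cond.notify_all()  # Wake every request being held open.

    def wait_for_change(self, known_version, timeout):
        """Block until the version differs from known_version or timeout elapses.

        Returns (version, value) on a change, or None if the timeout fired.
        """
        with self._cond:
            changed = self._cond.wait_for(
                lambda: self._version != known_version, timeout=timeout)
            if not changed:
                return None
            return self._version, self._value
```

A client would loop: call with the last version it saw, apply the result if there is one, and on a timeout simply call again with the same version.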

Author: Bart Jacobs

Bart Jacobs runs Code Foundry, a mobile and web development company based in Belgium and writes about iOS and Swift development on his blog. Bart is also the mobile editor of Envato Tuts+.

Updated on June 06, 2022

Comments

  • Bart Jacobs
    Bart Jacobs almost 2 years

    Assume the following scenario: a web application serves up resources through a RESTful API. A number of clients consume this API. The goal is to keep the data on the clients synchronized with the web application (in both directions).

    The easiest way to do this is to ask the API whether any of the resources have changed since the client last synchronized with it. This means that the client requests the appropriate resources and includes a timestamp (so the API can see whether the data needs to be updated). This seems to me like the approach with the least overhead in terms of needless bandwidth consumption.

    However, I have the feeling that this approach has a few downsides in terms of design and responsibilities. For example, the API shouldn't have to deal with checking whether the resources are out of date. It seems that the only responsibility of the API should be to serve up the resources when asked without having to deal with the updating aspect. By following this second approach, the client would ask for a lot of data every time it wants to update its data to keep it synchronized with the web application. In other words, the client would check whether the data it got back is newer than the locally stored data. If this process takes place every few minutes, this might become a significant burden for the system.

    Am I seeing this correctly or is there a middle road that I am overlooking?

  • Anders
    Anders over 11 years
    What this guy says (use standard HTTP functionality), but don't use GET for this; use HEAD, since you don't care about the actual data but rather (as I understood it) about whether the data is still valid. If the data is not valid, then perform a GET request to retrieve the new data.
  • Brian Kelly
    Brian Kelly over 11 years
    ^ Great point. HEAD is definitely more lightweight and appropriate for this kind of thing.
  • Panagiotis Panagi
    Panagiotis Panagi over 11 years
    I have a question on this: if the client issues a findAll for a certain type of record (e.g. Person.findAll()), and some records are returned by the local data store, how does the client know whether there are additional Persons on the server? Or, the other way around, how can the server tell which records the client already has?
  • Rajiv
    Rajiv about 10 years
    Synchronization is a complex topic, especially when multiple nodes are involved. Conflicts, stale data, and deleted data are some of the issues not directly handled by REST conventions.
  • EThaiZone
    EThaiZone about 7 years
    @PanagiotisPanagi I think you can do it by taking all the primary keys, or all the hashes, of those records and computing a hash of them with CRC32.
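One way to read the CRC32 suggestion in the last comment, as a rough sketch: both sides compute a single checksum over their sorted (primary key, version) pairs, and only when the checksums differ do they compare the key lists. The dictionary layout and the names `collection_fingerprint` and `keys_to_fetch` are assumptions for illustration. CRC32 is not collision-resistant, so a matching checksum is a strong hint rather than proof.

```python
import zlib


def collection_fingerprint(records):
    """CRC32 over sorted (primary key, version) pairs.

    records maps a primary key to a version string or content hash.
    Sorting makes the result independent of insertion order.
    """
    crc = 0
    for key, version in sorted(records.items()):
        crc = zlib.crc32(f"{key}:{version}\n".encode("utf-8"), crc)
    return crc


def keys_to_fetch(local, remote):
    """Keys that are missing locally or whose version differs from the server's."""
    return sorted(k for k, v in remote.items() if local.get(k) != v)


def keys_to_delete(local, remote):
    """Keys the client still holds that no longer exist on the server."""
    return sorted(k for k in local if k not in remote)
```

If the fingerprints match, the client can skip the sync entirely; otherwise it fetches only the keys reported by `keys_to_fetch` and drops those reported by `keys_to_delete`.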