HATEOAS: absolute or relative URLs?

Solution 1

There is a subtle conceptual ambiguity when people say "relative URI".

By RFC3986's definition, a generic URI contains:

  URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

  hier-part   = "//" authority path-abempty
              / path-absolute
              / path-rootless
              / path-empty

     foo://example.com:8042/over/there?name=ferret#nose
     \_/   \______________/\_________/ \_________/ \__/
      |           |            |            |        |
   scheme     authority       path        query   fragment

The tricky thing is, when scheme and authority are omitted, the "path" part itself can be either an absolute path (starts with /) or a "rootless" relative path. Examples:

  1. An absolute (full) URI: "http://example.com:8042/over/there?name=ferret"
  2. A relative URI with an absolute path: /over/there
  3. A relative URI with a relative path: here, ./here, ../here, etc.
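
To make the three forms concrete, here is a minimal sketch using Python's standard urllib.parse. The classify helper and its labels are my own illustration (they follow the terminology of this answer, not RFC 3986's official terms):

```python
from urllib.parse import urlsplit


def classify(ref: str) -> str:
    """Label a URI reference using the three forms from the list above.

    The labels are this answer's informal terminology, not RFC 3986 terms.
    """
    parts = urlsplit(ref)
    if parts.scheme:
        return "absolute URI"
    if parts.path.startswith("/"):
        return "relative URI, absolute path"
    return "relative URI, relative path"


for ref in ("http://example.com:8042/over/there?name=ferret",
            "/over/there", "here", "./here", "../here"):
    print(f"{ref!r}: {classify(ref)}")
```

The only thing the sketch checks is whether a scheme is present and whether the path starts with a slash, which is exactly the distinction drawn above.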

So, if the question is "should a server produce relative paths in a RESTful response", the answer is "No", and the detailed reasoning is available here. I think most people (including me) who are against "relative URI" are actually against "relative path".

In practice, most server-side MVC frameworks can easily generate a relative URI with an absolute path, such as /absolute/path/to/the/controller, so the question becomes "should the server implementation prefix scheme://hostname:port to the absolute path?", which is what the OP asked. I am not quite sure about this one.

On the one hand, I still think returning a full URI is the recommended approach. However, the server should never hardcode the hostname:port inside source code like this (otherwise I would rather fall back to a relative URI with an absolute path). The solution is for the server to always obtain that prefix from the HTTP request's "Host" header. I am not sure whether this works in every situation, though.
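
A rough sketch of the Host-header idea, framework-free. The absolute_link function, the plain dict of headers, and the scheme parameter are all assumptions for illustration; the Host header does not carry the scheme, so here it has to be supplied separately:

```python
def absolute_link(headers: dict, path: str, scheme: str = "http") -> str:
    """Build a full URI from the request's Host header and an absolute path.

    Hypothetical helper: `headers` is a plain dict of request headers, and
    `scheme` is passed in because Host only contains hostname[:port].
    """
    # Header names are case-insensitive, so normalize before looking up Host.
    lowered = {name.lower(): value for name, value in headers.items()}
    host = lowered.get("host")
    if not host:
        raise ValueError("request has no Host header")
    if not path.startswith("/"):
        raise ValueError("expected an absolute path starting with '/'")
    return f"{scheme}://{host}{path}"


print(absolute_link({"Host": "example.com:8042"}, "/over/there"))
```

Nothing is hardcoded: the same code yields different prefixes depending on how the client addressed the server. Whether the Host value is still the one the client used when the request arrives through a proxy is exactly the open question mentioned above.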

On the other hand, it does not seem very troublesome for the client to concatenate http://example.com:8042 and the absolute path. After all, the client already knows the scheme and domain name when it sends the request to the server, right?
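
For what it's worth, a client does not even need to concatenate by hand. A minimal sketch with Python's urllib.parse.urljoin, where the request URL is made up for the example:

```python
from urllib.parse import urljoin

# Hypothetical URL the client used for its request.
request_url = "http://example.com:8042/customers?page=2"

# A link with an absolute path inherits scheme and authority from the request.
resolved = urljoin(request_url, "/over/there")
print(resolved)  # http://example.com:8042/over/there

# A full URI in the response is used as-is, so one code path handles both.
unchanged = urljoin(request_url, "http://other.example.org/elsewhere")
print(unchanged)  # http://other.example.org/elsewhere
```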

All in all, I would say: prefer absolute URIs, possibly fall back to relative URIs with an absolute path, and never use relative paths.

Solution 2

It depends on who is writing the client code. If you are writing the client and server then it doesn't make much difference. You will either suffer the pain of building the URLs on the client or on the server.

However, if you are building the server and you expect other people to write client code, then they will love you much more if you provide complete URIs. Resolving relative URIs can be a bit tricky. First, how you resolve them depends on the media type returned. HTML has the base tag, XML can have xml:base attributes on every nested element, and Atom feeds can have a base in the feed and a different base in the content. If you don't provide your client with explicit information about the base URI, then they have to get the base URI from the request URI, or maybe from the Content-Location header! And watch out for that trailing slash. The base URI is determined by ignoring all characters to the right of the last slash. This means that the trailing slash is now very significant when resolving relative URIs.
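
The trailing-slash point is easy to demonstrate. A small sketch with Python's urllib.parse.urljoin, using made-up example URLs as the base:

```python
from urllib.parse import urljoin

# Same relative link, two base URIs that differ only by a trailing slash.
without_slash = urljoin("http://example.com/customers/1234", "orders")
with_slash = urljoin("http://example.com/customers/1234/", "orders")

print(without_slash)  # http://example.com/customers/orders
print(with_slash)     # http://example.com/customers/1234/orders
```

Without the trailing slash, "1234" sits to the right of the last slash and is dropped from the base, so the link lands somewhere the server author probably did not intend.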

The only other issue that deserves a brief mention is document size. If you are returning a large list of items where each item may have multiple links, using absolute URLs can add a significant number of bytes to your entity if you do not compress it. This is a performance issue, and you need to decide whether it is significant on a case-by-case basis.
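
If you want to measure this for your own payloads, something like the following sketch will do. The prefix, the item shape and the count of 1000 are invented for illustration, and the actual numbers depend entirely on your data:

```python
import gzip
import json

PREFIX = "http://example.com:8042"  # hypothetical scheme://host:port prefix


def build_payload(prefix: str, items: int = 1000) -> bytes:
    """Serialize a made-up list of items, each carrying two links."""
    body = [
        {
            "id": i,
            "links": {
                "self": f"{prefix}/customers/{i}",
                "orders": f"{prefix}/customers/{i}/orders",
            },
        }
        for i in range(items)
    ]
    return json.dumps(body).encode("utf-8")


def sizes(payload: bytes) -> tuple:
    """Return (raw size, gzip-compressed size) in bytes."""
    return len(payload), len(gzip.compress(payload))


relative_raw, relative_gz = sizes(build_payload(""))
absolute_raw, absolute_gz = sizes(build_payload(PREFIX))

print("raw bytes:  relative", relative_raw, "absolute", absolute_raw)
print("gzip bytes: relative", relative_gz, "absolute", absolute_gz)
```

The uncompressed difference is simply the prefix length times the number of links. Because the prefix repeats verbatim, compression tends to absorb most of it, which is why the concern mainly applies to uncompressed entities.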

Solution 3

The only real difference would seem to be that it's easier for clients if they are consuming absolute URIs instead of having to construct them from the relative version. Of course, that difference would be enough to sway me to do the absolute version.

Solution 4

As your application scales, you may wish to do load balancing, fail-over, etc. If you return absolute URIs then your client-side apps will follow your evolving configuration of servers.

Solution 5

Using RayLuo's trichotomy, my organization has opted to favor (2). The primary reason is to avoid XSS (Cross-Site Scripting) attacks. The issue is that if an attacker can inject their own URL root into the response coming back from the server, then subsequent user requests (such as an authentication request with username and password) can be forwarded to the attacker's own server*.

Some have brought up the issue of being able to redirect requests to other servers for load balancing, but (while that is not my area of expertise) I would wager that there are better ways to enable load balancing without having to explicitly redirect clients to different hosts.

*Please let me know if there are any flaws in this line of reasoning. The goal, of course, is not to prevent all attacks, but to close at least one avenue of attack.
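
For this policy to mean anything, the client has to refuse links that are not of form (2). A minimal sketch of such a check, assuming a custom client; the function name is hypothetical, and this is an illustration of the idea rather than a complete security control:

```python
from urllib.parse import urlsplit


def is_scheme_less_absolute_path(link: str) -> bool:
    """Accept only links like /over/there: no scheme, no host, rooted path.

    Hypothetical client-side check; a generic client such as a browser
    would not apply it on its own.
    """
    # "//host/path" is a network-path reference that switches hosts.
    if not link.startswith("/") or link.startswith("//"):
        return False
    parts = urlsplit(link)
    return parts.scheme == "" and parts.netloc == ""


for candidate in ("/over/there", "http://attacker.example/login",
                  "//attacker.example/login", "../here"):
    print(candidate, is_scheme_less_absolute_path(candidate))
```

A client that follows only links passing this check stays on the host it originally contacted, whatever root the response claims.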

Author by

Mark Lutton

Updated on March 09, 2021

Comments

  • Mark Lutton
    Mark Lutton about 3 years

    In designing a RESTful Web Service using HATEOAS, what are the pros and cons of showing a link as a complete URL ("http://server:port/application/customers/1234") vs. just the path ("/application/customers/1234")?

  • Ed Summers
    Ed Summers over 11 years
    Why can't the absolute URI use the hostname of the proxy?
  • mag382
    mag382 over 10 years
    Working through this exact issue at the moment. We want all requests to go through a sort of "load-balancing" layer first. Absolute URIs to the servers directly will break this model.
  • Lawrence Dol
    Lawrence Dol over 10 years
    This is a good answer (+1) which I agree with except the final conclusion. However in my answer I argue that the HTTP spec defines, by example, "absolute" to refer to an absolute path, not a fully qualified URI. So I disagree with your (2) - it is an absolute URI, but one for which the client must infer the network protocol and host, so it's not a fully qualified URI. And, therefore, I also disagree with your definition of (1) which is both a full URI and an absolute URI.
  • Lawrence Dol
    Lawrence Dol over 10 years
    Provided you define "absolute" as absolute path (e.g. /xxx/yyy...) and not as meaning a fully qualified URI (e.g. http://api.example.com/xxx/yyy...).
  • Mark Bober
    Mark Bober over 10 years
    Well, the HTTP spec for the Location header says absolute URI. An absolute URI must contain a scheme (e.g. http).
  • Lawrence Dol
    Lawrence Dol over 10 years
    But the question is not how to construct contextless opaque identifiers, it asks how to construct links. The latter may rightly infer "at the same network location as this document", and that's exactly what the spec's example of a Location header gives - an absolute URI which doesn't contain the URI scheme or the server's network location. While links and IDs are often conflated they are not the same thing - the former has context, the latter does not.
  • Mark Bober
    Mark Bober over 10 years
    Can you send a link to the part of the spec you're talking about?
  • Mark Bober
    Mark Bober over 10 years
    An absolute URI specifies a scheme; a URI that is not absolute is said to be relative. URIs are also classified according to whether they are opaque or hierarchical. An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing. Some examples of opaque URIs are: mailto:[email protected] news:comp.lang.java urn:isbn:096139210x
  • Mark Bober
    Mark Bober over 10 years
    Ah, see, I think you're looking at a draft spec. Check this one: w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30
  • Lawrence Dol
    Lawrence Dol over 10 years
    Ahh, I see you are correct; removed my downvote and deleting comments no longer salient.
  • Mark Bober
    Mark Bober over 10 years
    Hey no worries man. One other point about this stuff is that I've seen people using hrefs as IDs. So that the client doesn't need to reconstruct the URL from some config file and an id, it just knows the URL and can cache based on it.
  • Tom Howard
    Tom Howard over 10 years
    I'm using Nginx to proxy a site with absolute URLs. It's perfectly capable of replacing the backend URL with the equivalent proxy URL. Specifically it's proxying windyroad.artifactoryonline.com (which has fully qualified URLs and fully qualified redirects) to repo.windyroad.com.au
  • RayLuo
    RayLuo about 10 years
    Thanks for the comment. I just borrow the absolute path and relative path concept from file system. Different terms apart, I don't see substantial difference between your opinion and mine. You also recommend form 1 & 2, and you against form 3, don't you?
  • Lawrence Dol
    Lawrence Dol about 10 years
    Practically speaking, I am for (2); I think (1) requires the backend to have too much HTTP-specific knowledge (meaning about the details of the specific HTTP environment, not HTTP in general), and (3) seems to require too much of the client. But, my reasoning was based on the original draft spec, and the examples were changed in a later version in a way that invalidates my reasoning.
  • Lawrence Dol
    Lawrence Dol about 10 years
    Personally, I am not (yet) at all convinced that HATEOAS, and therefore the demand of returning URIs makes all that much sense for an API. I am just not seeing my APIs being driven on the client in a manner akin to browsing a web site; the use cases seem very much driven by adhoc function.
  • RayLuo
    RayLuo about 10 years
    @LawrenceDol I have same confusion about HATEOAS at the beginning. Now I consider it as a matter of choice. Your clients can use adhoc function to consume your api for sure, but if they/you want, they/you can still develop a pattern for them to follow, so that the client won't need to hard code each exact url. That is HATEOAS.
  • RayLuo
    RayLuo about 7 years
    Glad that my previous answer was helpful to your organization. Yes, I personally also prefer (2), a.k.a. scheme-less absolute path. However I'm curious about your reasoning. How did you enforce your client accepting your scheme-less url only? A generic client, such as a browser, would not reject a scheme-less url at all. So I assume you would have to write your own client-side code to validate urls before actually following them? While that is technically doable (but not necessarily useful), this kind of client-side validation is typically not part of REST or HATEOAS discussion.
  • M. Eriksson
    M. Eriksson over 5 years
    I know this is an old post, but I just want to point out that "if an attacker can inject their own URL root into the response coming back" is kind of a nonsense reason. If they can "inject their own URL" into the correct places in the response, I bet that they could, just as easily just replace your hostname with their own. So out of a security point of view, I don't see it as a valid argument.