How (il)legal is it to get data from a 100% accessible but not "exposed" API

Solution 1

Let me be clear. If there is one thing I know fairly well, it is copyright law. I am not a lawyer; however, knowledge of copyright was a constant requirement of my consultancy for 30 years. As an added bonus, I consulted primarily to telcos and often worked with subscriber data, including the analysis and presentation of that data for sale and re-use. I am, at the very least, uniquely qualified to answer this question on this forum.

I will explain this as best I can by: one, defining proprietary versus ordinary means; two, defining the cited case exception and other related copyright considerations; and three, being clear on the answer.

Let me clarify copyright somewhat. The example of a phone book is misleading. When you get a telephone, you enter into a private contract, as a private citizen, with a private company. The resulting information, made public or not, is private proprietary data. The contents of a phone book are therefore proprietary (pay attention to this word) simply because they cannot generally be obtained through any means other than company data sources, that is, the subscriber data. If data can be derived through ordinary means, such as walking around and writing down house numbers and street names, then that is publicly available data and clear to use. This is not to say that telephone numbers cannot be obtained through ordinary means. They can be.

To clarify further. To quote from: http://www.lib.umich.edu/copyright/facts-and-data

In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.

This paragraph is misleading. The exception it describes is covered by patent and other laws. Copyright only extends to the creation of a work.

The:

“sweat of the brow” doctrine

...refers to any activity such as going house to house and gathering the data manually. This is the definition of ordinary means. It is possible to knock on doors and ask for the same telephone data. Only inasmuch as you can gather the facts by ordinary means is that data, or that portion of the proprietary data, public.

The ordinary way around using telephone data is to: one, obtain the original data through legal means; and two, apply the fair use doctrine. This would entail getting a copy of the phone book directly from the company, which may be free or for a charge, and organizing the facts within it in a different way so as to create a new work. Have you tried to get a Seattle phone book when you are in Chicago? You will find that the telephone company will likely charge you a surprising fee for it. However, if you are a telephone subscriber in Seattle and you ask for a Seattle phone book, the fee would be far less, or it may even be free. I have had to do this many times. There are people whose job it is just to obtain telephone books from telcos in person and pay the fee if required.

The ruling cited in Feist Publications v. Rural Telephone in the above link (in this answer) hinges on two facts: one, that the rural cooperative operator, as a local monopoly, was required by operational agreement to make the data publicly available; and two, that the presentation of the work was copyrighted and not the facts contained within, due to fact #1. Therefore, this case can be considered a precedent only within narrow parameters, and outside of them it must be discarded. Ordinarily, private company subscriber data is not required by agreement to be made public. You have to remember that rural cooperatives are established as public trusts/entities for the public good, are owned by the public and/or cooperative members, and therefore operate under the legal restrictions that allow them to be approved to operate or exist. Each case is different. Citing the above case (on the linked page) as an argument without explaining the carve-out exceptions is misleading.

In the early days of the Bell Telephone Company, the company was required as a monopoly to make telephone data public unless restricted by the subscriber. When the Bell company was split into the Baby Bells (Bell Atlantic, BellSouth, and so on), these companies were still required as monopolies to make telephone data public as defined before. But with deregulation, and indeed with VoIP, cellular, and other options, monopolies are rare. Only in monopoly scenarios can the above cited argument be made.

Continuing to cite the link above (in this answer):

Just because data is not protected by copyright, does not mean there are not other legal considerations that may come into play when you wish to use someone else’s dataset.

Keep this in mind.

Any given dataset and the presentation thereof, regardless of the data's origin, is a work unto itself. The public presentation of the facts, regardless of the means, is a work unto itself.

Given that you are not obtaining the data through ordinary means, the data is not free to use as you described, even though it is made public and regardless of where it originally came from. You could be criminally charged and held civilly liable for potential copyright infringement, as well as for criminal trespass and illicit use of computer and other communications equipment not ordinarily authorized, and the conduct can fall under RICO statutes.

Is it legal to use? No! Absolutely not! It was not obtained through ordinary means, nor is it likely the intent of the website operator to expose proprietary data. The absence of an AUP (acceptable use policy) will not help you. There are assumptions made under the law as to the "reasonable man", "reasonable standard", and "reasonable assumption" that protect the website owner in this case. It is not reasonable that a clever person would use a "vulnerability in the design/creation" of the website to obtain data for other use. As well, if the site profits from its activities, further protections come into play.

Solution 2

One thing that doesn't seem very clear in the other answers here...

Whether it's "legal" or not depends, first and foremost, on the country. If we're talking about the United States, for example, then using the data itself is not illegal. However, I'd advise you to use the real data from the US Census. They offer tons of data through what they call TIGER products. This data set is the same data set that GIS professionals use to populate Bing Maps, Google Maps, etc.

However, while the data may be freely available, that does not necessarily mean the data from this exposed API is legally available. You say it's in JSON form, which suggests it's been 'massaged' from its original format into this format, and that custom format could fall under intellectual property. That, I believe, would be illegal to use unless you have a license to use it. Like others here, I am not a lawyer, but the company doesn't even need to point the finger at you and call you a hacker. Proprietary data is proprietary data, even if it is handed out unintentionally. You should contact the company, let them know that all of this data is exposed to the outside world, and ask for permission to use it. Without doing that, and with this question on Stack Exchange as evidence, it'd be easy to build a case against you. You've essentially said "This doesn't look legit, but I like it anyway and I want to make money off of it." Again, I'm not a lawyer, but that doesn't look like a great way to start a trial.

The thing is, though, if you're interested in city names and other geographic data, almost all of it is freely available, regardless of country. Last I knew, the US publishes the most data, but there's data out there for virtually every country. I'm hesitant to say "all" only because I'm a programmer and proving a "for all" statement is hard. If you pick an arbitrary country, the chances are better than good that the data is out there. If you have a specific country in mind, head to the GIS Stack Exchange. The main thing you're looking for is called a "shapefile", so ask a question like "Where can I get shapefiles for __________?" There's also OpenStreetMap, which is an open source map. I'm not sure how easy it is to get their shapefile data, but if you can get it from them (and I don't see why you wouldn't be able to, since you're able to run offline maps based on locally stored information), then you have all the data you need and you're in the clear legally. You'll have to spend time massaging the massive amounts of data down to what you want, but shapefiles are always very well defined and easy to parse.
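As a rough illustration of that "massaging" step, here is a minimal Python sketch. It assumes the shapefile or OpenStreetMap extract has already been converted to GeoJSON with some GIS tool, and that each feature carries `name` and `place` properties. Those property names, the sample data, and the `city_names` helper are assumptions made for this example; check them against the data set you actually download.

```python
import json


def city_names(geojson_text, place_values=("city", "town")):
    """Return the sorted, de-duplicated names of features tagged as a city or town.

    Assumes a GeoJSON FeatureCollection whose features have "name" and
    "place" properties (hypothetical field names; adjust to your data).
    """
    collection = json.loads(geojson_text)
    names = set()
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        if props.get("place") in place_values and props.get("name"):
            names.add(props["name"])
    return sorted(names)


# Tiny made-up sample standing in for a real, legally obtained export.
SAMPLE = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"name": "Tacoma", "place": "city"}},
        {"type": "Feature", "geometry": None,
         "properties": {"name": "Seattle", "place": "city"}},
        {"type": "Feature", "geometry": None,
         "properties": {"name": "Index", "place": "village"}},
        {"type": "Feature", "geometry": None,
         "properties": {"place": "city"}},  # unnamed feature, skipped
    ],
})

print(city_names(SAMPLE))
```

The filter is deliberately simple: keep only what the game needs (names of larger places) and discard geometry and everything else.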

Solution 3

While closetnoc has discussed the issue of the data itself, there's a larger legal concern: you are not authorized to access the API offering the data.

The baseline for most computer crime laws involves the notion of "unauthorized access to a computer system". You should not confuse this reference to authorization in the legal sense with the concept of authorization when it comes to access control. The owner of a system does not have to secure his system for access to it to be illegal, just as you are still trespassing when you enter a house with an unlocked door.

In this case the apparent lack of security does not imply an authorization to use it. The internet has little precedent so far in case law, but you can imagine the use of HTTP on port 80 implying public authorization to view a website. Conversely, background RPC protocols (even if they might run over HTTP requests) are not typically understood to be publicly available unless the operator publishes the service as such, granting authorization for use to third parties.

So ongoing use of the API to retrieve data would be illegal. The act of taking a data dump from the API to build your own dataset would also be illegal. Whether use of the data after that is illegal is a giant grey area but closetnoc has covered most of the concerns.

Of course if you modify the data dump after the fact to be unrecognisable it will be next to impossible to prove that you committed a crime. But if you're going to that much trouble why not source the data from a legal source instead?

Solution 4

It probably depends on the nature of the data. Pure data (think telephone directory) cannot be copyrighted. So a list of cities from an API should be fair game to copy and show to users. However, if that API has descriptions of the city, those descriptions would fall under copyright law and you wouldn't be able to use them without violating copyright.

If you can legally copy the data, I would recommend copying it to your own site to prevent your API usage from being shut down prematurely.
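A minimal sketch of what "copying it to your own site" could look like in Python, assuming you have already established that you may legally copy the data. The `load_cities` helper, the cache file name, and the `fetch_from_source` stand-in are all hypothetical names invented for this example; the real fetch would be whatever access the data owner has actually permitted.

```python
import json
import tempfile
from pathlib import Path


def load_cities(cache_path, fetch):
    """Return the data set from a local copy, calling fetch() only if no copy exists."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    data = fetch()
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data


def fetch_from_source():
    # Hypothetical stand-in for a one-time download from a source
    # you are permitted to copy from.
    return ["Chicago", "Seattle"]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cities.json"
    first = load_cities(cache_file, fetch_from_source)   # fetches and stores
    second = load_cities(cache_file, fetch_from_source)  # served from the local copy
    print(first == second)
```

After the first call the site no longer depends on the remote service being available.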

Solution 5

(IANAL, and laws and norms vary widely throughout the world, but certain things tend to remain consistent due to IP treaties. If you have a professional issue outside of your specialty, consult with a professional.)

Generally, legally, an API is not considered to be "intended for public consumption" unless it's actively documented as a public API, with specified terms of service. The fact that the public can reach the API does not make it public.

In cases where the status of the data itself isn't starkly public-domain, and in a few cases where it clearly is public domain, the intent of the entity providing the API matters a great deal. If the website operator intended the API to be used to feed a dynamic webpage or a mobile application (to name two common examples), any other usage is "unauthorized" unless specifically authorized somewhere. If the intended consumer was a snippet of dynamic code in a specific webpage, for the purpose of placing human-understandable pixels on a display in a specific, (hopefully) pleasing and useful manner, any other usage is unauthorized.

The technical ability to enter a building through an open window without opening or breaking anything won't protect you from being arrested for criminal trespass...

Also, it is almost never wise to play "technical ability" vs "original intent" games with an intellectual property lawyer. If nothing else, remember that the lawyers who consistently lose those cases don't keep getting paid for them.


Author by

Suresh Ramineni

Updated on September 18, 2022

Comments

  • Suresh Ramineni
    Suresh Ramineni almost 2 years

    I have a grid containing the access details of users, with multiple rows.

    - The user can take a Retain or Delete action against each record.
    - On clicking Submit, we need to send the AccessID and the action to the database as a key-value pair.

    I have a possible solution, which I am implementing as follows:

    1. Build an XML string using the StringBuilder class and send that XML string as a parameter.
    2. Unpack it into a temporary table using SQL's built-in function.

    This approach is causing performance issues.

    Please suggest an alternative if anyone has one.

    • Simon Hayter
      Simon Hayter over 9 years
      It's simple! Ask a licensing consultancy or a copyright lawyer, not a webmaster!
    • MikO
      MikO over 9 years
      @bybe, I'm sorry I cannot find the Stack Exchange's Licensing Consultancy site, and I only ask questions to Stack Exchange users or to god... and god never responds.
    • closetnoc
      closetnoc over 9 years
      You will want to read my answer and think again carefully.
    • Riot
      Riot over 9 years
      Why is nobody asking "legal in what country / jurisdiction"?
    • Anderson Green
      Anderson Green over 9 years
      Can you provide the URL of this particular website?
    • A E
      A E over 9 years
      Depends what country you're in. What country are you in? The UK as per your profile? If so then database right may be in play.
  • MikO
    MikO over 9 years
    Thanks for your reply. Just not clear what can be pure data. Some examples: the number of inhabitants in a city or the number of new vehicle registrations last year in a city... Or in other contexts: the number of goals scored by Cristiano Ronaldo this season or the list of concerts of U2 next year... are all those pure data?
  • Stephen Ostermiller
    Stephen Ostermiller over 9 years
    I am not a lawyer. An intellectual property lawyer would be better suited to advise you as to how the law might be applied to your specific data. The examples you gave look like data to me. Only the presentation or arrangement of them can be copyrighted (per the link in my answer)
  • Stephen Ostermiller
    Stephen Ostermiller over 9 years
    Great counterpoint. It bears repeating that it is often worth consulting a lawyer when in doubt about the legality of what you are doing.
  • MikO
    MikO over 9 years
    Thanks so much, I knew I could find a good answer here! Just a final scenario: there's a website gathering and displaying data about football players' performance (minutes played, goals scored, etc.) during a season - actually all newspapers and TVs will gather and show the same information. Now let's say I have 3 options to collect that data: a) As said in my question, I find a API URL of this site which somehow is public. b) I use some crawling tool to get the data. c) I manually watch the site and collect every single value. Are these 3 options the same in terms of (il)legal issues?
  • closetnoc
    closetnoc over 9 years
    The NFL states that it exclusively owns the team/player stats and therefore licenses the data for use. If I see it on T.V., it is through a license agreement; if I see it in the newspaper, it is through a license agreement; magazines, same thing. Most sources you would get the data from would be by license. However, if it is obtained through ordinary means, such as asking a friend, then that is legal. That said, the NFL, while very careful about licensing, will not shoot you if you post stats on a website that does not profit from the information. They may write a letter, but generally, they will not bother.
  • nobody
    nobody over 9 years
    Indeed. Auernheimer spent over three years in prison for screen-scraping AT&T, even though the conviction was eventually overturned.
  • closetnoc
    closetnoc over 9 years
    You bring up some great points! Any data from the U.S. Government by legal authority is public domain as it has already been paid for and owned by the public through taxes. The government likes to charge fees for this data when provided in particular format such as printed (GPO government printing office), on magnetic tape or CD or other media, and so on. They do this to recover labor/material costs, though I sometimes argue over the fee vs. cost. The U.S. Government has been in the profit game for a couple of decades. Why do you think they really want all the extra census data? They sell it.
  • Andrew Grimm
    Andrew Grimm over 9 years
    @AndrewMedico he was sentenced to over three years, but didn't spend three years in prison. The article says he was convicted in November 2012, and it was overturned in April 2014.
  • Lilienthal
    Lilienthal over 9 years
    -1. This is incorrect and dangerous advice. The baseline for most computer crime laws involves the notion of "unauthorized access to a computer system". Just because a system is (apparently) not secured does not mean that you are authorized to use it. You should also not confuse authorization in the legal sense with authorization in an access control context.
  • Ian Ringrose
    Ian Ringrose over 9 years
    The law in the UK is not the same in that a collection of "public data" can be copyrighted, even if each data item can not.
  • pseudocoder
    pseudocoder over 9 years
    Interesting points and I agree with you, except it doesn't make sense to me when you argue "the AJAX RFC protocol is an internal system not intended for public access". I'm not sure what protocol has to do with it. Many organizations offer exposed data services such as this for the public's use. For instance, local governments offering GIS data. It makes more sense to me that significant facts are the way OP discovered the data service and that it is apparent it isn't intended for public use. Am I off base here?
  • closetnoc
    closetnoc over 9 years
    I do cover the fact that access of this type would not be legal under any standard, however, I am glad you mentioned it again. It is an important point. Any AUP should state that access authorization by default is NOT granted (of course it does not have to be that draconian). The idea is to establish a negative default first then define the acceptable use in a rather narrow webbie kinda way.
  • Lilienthal
    Lilienthal over 9 years
    @pseudocoder No you are correct. What I meant by that is that there is an established reasonable expectation that access to a HTTP service is public by default while the opposite is true for RFC services and similar protocols. Such services are generally published for public use and announced as such. As I said I don't know the case law on this or how the various cybercrime laws handle this in practice, but it's an important distinction. [...]
  • Lilienthal
    Lilienthal over 9 years
    If you didn't have this expectation of public access you'd be in violation of accessing StackExchange because its owner hasn't explicitly granted you the authorization to view the page. A reasonable person (a popular concept in law discussions) would expect a website service to be accessible to all and for sensitive parts to be hidden behind an access control layer. That same person would NOT expect "hidden" background services (that the average person doesn't understand) to similarly qualify as free for all. I've edited my answer to hopefully better explain this dichotomy.
  • Jason
    Jason over 9 years
    @closetnoc, is the illegality just in using a hidden API to obtain the information? If you were to use the website as intended and to write down the information manually, and then use it, would that be legal? Followup: if so, and you automated that process, would it still be legal?
  • closetnoc
    closetnoc over 9 years
    It would depend upon the information provided and who owns it. Following the phone book analogy, the phone company owns the data, but if you look up a phone number and write it down, it falls under reasonable man, reasonable assumption, and reasonable standard that you will use the data to call someone and for personal use, not to re-purpose it, particularly for profit or gain. The data still belongs to the phone company unless there is an exception like the case cited. Writing it down from copyrighted work is not ordinary means. However, you do have fair use rights to portions of the work.
  • shelleybutterfly
    shelleybutterfly over 9 years
    @closetnoc I'm curious: if, hypothetically, the site's robots.txt did not restrict the URL, and queries from it were available on various search engines, would that affect your analysis? With the JSON I've seen being more readable than the touted "human-readable" XML (but they said! :D) [e.g. SOAP] do you agree that it would be arguably "obtained by 'ordinary means'" at that point? On a browser that had one of the easy-JSON-viewing extensions installed, clicking a link and viewing the data would be trivial. And I imagine an auto-JSON-REST-query-field-search addition not to be far behind...
  • closetnoc
    closetnoc over 9 years
    Ordinary means disappears when you are accessing another's work. Copyright in the U.S. is automatically assumed under the law. The law does not say things like JSON, SOAP, robots.txt, etc. As well, you are traversing private property to obtain the work, with restrictions that will apply. These restrictions may surprise you. The simple fact of the matter is that you derived the information from another's work. Full stop. No matter how that manifests, copyright laws protect the original work. You do, however, have fair use rights to reference and quote the work in a new work of your creation.
  • closetnoc
    closetnoc over 9 years
    The OP's question is not about a hyperlink. Nor does the case you cited apply. In fact, this case should not have been filed. The copyright infringement accusation made by the claimant is nonsensical since the fair use doctrine would apply squarely. The keyword is transformative. This is another key element that defines fair use, along with the right to reference or quote another's work, which search engines do. The OP is talking about wholesale extraction and use of another's work, which is a violation of copyright amongst other RICO violations, which are federal and significant.
  • shelleybutterfly
    shelleybutterfly over 9 years
    /* I AM NOT A LAWYER / i definitely ~tend~ to this side on this one... *especially via linking via http/https, of a RESTish JSON Query API, already being used to grab data for a website that's open to the public in the U.S. by current law [enough qualifying? xD] "should" not against the law due to: 1. no DMCA circumvention - the fact that no "circumvention" is necessary as there are no measures in place to circumvent. the mere claim that one has put protections on something in order to prevent people copying it, is not enough, the measures law.cornell.edu/uscode/text/17/1201
  • shelleybutterfly
    shelleybutterfly over 9 years
    /* I!=AL / and 2. one of the main focii here: *linking is not the same as having. it's likely ok for an online game, as mentioned in the Q, if the player's browser downloaded; OR (maybe) if everything is grabbed in response to player acts. [IMO, it would be right to credit the site, somewhere.] but, here's the rub: we get the tech. But many many judges/juries are tech illiterate, and some seem to willfully misunderstand. At best when you're in that situation you have to waste resources fighting. So, if you can do a game like this, save yourself the trouble. Do something else. IMO. gl.
  • Shane
    Shane over 9 years
    @closetnoc: You may want to reread the OP. He is talking about visiting a web page that contains some text. That's a hyperlink. Wikipedia on hyperlinks: "An inline link displays remote content without the need for embedding the content. The remote content may be accessed with or without the user selecting the link. An inline link may display a modified version of the content; for instance, instead of an image, a thumbnail, low resolution preview, cropped section, or magnified section may be shown." That is exactly what the OP is talking about, no?
  • Shane
    Shane over 9 years
    @closetnoc: He plans on taking that text used for presenting information and transforming it into a game. That's transformative. Unless he saves the data himself, he also isn't talking about extracting anything, just viewing it and transforming it. I have no idea why you think that laws against organized racketeering have any relevance here. The OP is asking if it is violates copyright laws to visit a hyperlink, or to transform the data provided by that hyperlink into something new.
  • shelleybutterfly
    shelleybutterfly over 9 years
    @closetnoc actually, the OP ends with "I want to use that data to create a little online game, potentially to earn a little money..." so regardless of the initial thought by OP; i see no reason that it couldn't be designed with the player downloading the data. see my #2. (maybe? server-side tho that makes me nervous as the API isn't proffered) And, tho I do see issues with wholesale d/l of the data, I believe that "data use via the proffered html pages is the only use that is possibly legal" is very arguably too strict. but, as i don't trust the courts on this one, it's pretty moot for me.
  • closetnoc
    closetnoc over 9 years
    You need to re-read the OP's question. He is talking about extracting data directly using an exposed API. Either way. Both are protected by copyright. Simply visiting a page is not a hyperlink. A link is a link not the product of HTML formatting of content. Using proprietary data which is copyrighted and transforming it into another product (keyword here) does not cover the fact of a violation. The transformative nature has to be significant. As well, you cannot ever use the entirety of any copyrighted work in another work. It is clear that you need to re-read my answer slowly.
  • shelleybutterfly
    shelleybutterfly over 9 years
    @Shane <nod>having read it several times now I am more convinced that indeed that was the sense intended: "how legal or illegal is it for me to use that URL to retrieve the data for my own purposes?" combined with "Note:..._I want to use that data_ to create a little online game ..." [emphasis mine] in other words, if that URL is used in composing a game on a website, is that a problem... i stand by my view tho; the law can be a pain; you could get sued, or worse; and do you have the time for jail? or 5K+ for a lawyer? xD Hell, ask for permission. Worst case, use data from elsewhere.
  • closetnoc
    closetnoc over 9 years
    @shelleybutterfly Oh lord! Where do I begin? Is hot-linking a bad thing? You think using an exposed flaw in another's website is any better? Any webpage is copyrighted automatically. Wholesale extraction regardless of the method is a violation of copyright. You too need to re-read my answer carefully. Pay attention to the reasonable ?? standards and the concept of ordinary means. As well, pay attention to what constitutes a work. Remember, I did this for 30 years. I pretty much got it down pat.
  • closetnoc
    closetnoc over 9 years
    @shelleybutterfly Perfect! You got it! Ask permission. That's all it takes. Heck, the site owner may OK using the exposed API for the purpose. Who knows? He just might be happy with a credit or access to the game. Heck. He might not even care about that.
  • shelleybutterfly
    shelleybutterfly over 9 years
    @closetnoc maybe you need to re-read what I said a little more "slowly." so, y'know, you actually get what I said. if I create a game with a button linking to "SomeDataBelong2.us/JSON/…" and the player clicks and SomeDataBelong2 has their API wide open, via http, on port 80, returning data, from a ridiculously simple JSON Query... And someone's browser gets an HTML field or Attribute populated, well... regardless of what SomeDataBelong2.us wanted, linking (see above) is ruled-upon case law. you seem to miss that it's not like accessing SQL.
  • shelleybutterfly
    shelleybutterfly over 9 years
    @closetnoc i think we're probably on the same side more than you think; not the same side as regards what the law should be perhaps, but at least as regards what would be the right thing to do. i tend to also think that the right thing to do is contact them and be like "uh, dudes, database, in the clear...." hell, do that, and they might be happy to let you use their service. and, again, if not, there's plenty of places to get such data, certainly enough that you don't have to do anything shady to make a game.
  • closetnoc
    closetnoc over 9 years
    @shelleybutterfly It all goes back to the construct of the reasonable standard. This is a founding legal construct our system is based upon. It is not reasonable to expect that someone would access the API (for a lack of a better term) and extract data for other purposes. Ask yourself what the reasonable man would do? He would access the web page as a normal consumer of content using a web browser. If the OP did access the site via the API, he would lose in court solely based upon the reasonable standard.
  • Hagen von Eitzen
    Hagen von Eitzen over 9 years
    @Lilienthal The availability of the web site in the OP's question suggests that everybody is authorized. Actually, one could even say that the user agent runs JavaScript from the server and so in reverse authorizes the web site to run code and execute additional queries on the user's computer.
  • Lilienthal
    Lilienthal over 9 years
    @HagenvonEitzen It most certainly does not. The website is public-facing and consumes the back-end service to display data. Compare it to a coffee shop: you're allowed to order an espresso but you're not allowed to jump behind the counter and brew it just the way you want it, you have to go through the barista.
  • ruakh
    ruakh over 9 years
    By "RFC" do you mean "RPC"?
  • closetnoc
    closetnoc over 9 years
    @HagenvonEitzen Because a computer is put on the Internet and offers web content, does not mean that authorization is automatically granted. Quite the contrary. Most AUPs state clearly under what conditions the site may be accessed rendering any other condition an unauthorized access violation. As well, it all goes back to the reasonable man construct.
  • closetnoc
    closetnoc over 9 years
    @Lilienthal You are so right on!
  • closetnoc
    closetnoc over 9 years
    @Chloe By all rights, this answer should be down-voted simply because it is factually incorrect and may potentially help lead someone to commit an illegal act. Would you consider editing your answer?
  • Mark Hurd
    Mark Hurd over 9 years
    For a concrete SE example: I guess the people who access the separated up and down vote information on SE sites without having the requisite rep to see it normally are really breaking the law.
  • Lilienthal
    Lilienthal over 9 years
    @ruakh I do indeed. The platform I'm currently working with calls them Remote Function Calls, hence the confusion. I've corrected the answer.
  • A E
    A E over 9 years
    What makes you think that US law applies?