How to get book metadata?

22,974

Solution 1

Since it is unlikely that you have to retrieve the same 500 books every day: store the data retrieved from isbndb.com in a database and fill it up book by book.

Solution 2

Instead of scraping Amazon, you can use the API they expose for their affiliate program: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html

It allows about 3k requests per hour and returns well-formed XML. It requires you to set a link to the book that you show the information about, and you must state that you are an affiliate partner.

Solution 3

This might be what you're looking for. They even offer a complete download! https://openlibrary.org/data

Solution 4

As it seems, a lot of libraries and other organisations make information such as "ISBN" available through MAchine-Readable Cataloging aka MARC, you can find more information about it here as well.

Now knowing the "right" term to search for I discovered WorldCat.org.

Maybe this whole MARC thing gives you a new kind of an idea :)

Share:
22,974
Admin
Author by

Admin

Updated on May 09, 2020

Comments

  • Admin
    Admin about 4 years

    My application needs to retrieve information about any published book based on a provided ISBN, title, or author. This is hardly a unique requirement---sites like Amazon.com, Chegg.com, and even software like Book Collector seem to be able to do this easily. But I have not been able to replicate it.

    To clarify, I do not need to search the entire database of books---only a limited subset which have been inputted, as in a book collection. The database would simply allow me to tag the inputted books with the necessary metadata to enable search on that subset of books. So scale is not the issue here---getting the metadata is.

    The options I have tried are:

    1. Scrape Amazon. Scraping the regular Amazon pages was not very robust to things like missing authors, and while scraping the smaller mobile pages was faster, they shared the same issues with robustness of extraction. Plus, building this into an application is a clear violation of Amazon's Terms of Service.
    2. Scrape the Library of Congress. While this seems to have fewer legal ramifications, ease and robustness were again issues.
    3. ISBNdb.com API. While the service is free up to a point, and does a good job of returning the necessary metadata, I need to do this for over 500 books on a daily basis, at which point this service costs money proportional to use. I'd prefer a free or one-time payment solution that allows me to do the same.
    4. Google Book Data API. While this seems to provide the information I need, I cannot display the book preview as their terms of service requires.
    5. Buy a license to a database of books. For example, companies like Ingram or Baker & Taylor provide these catalogs to retailers and libraries. This solution is obviously expensive, so I'm hoping that there's a more elegant solution I've missed. But if not, and someone on SO has had a good experience with a particular database, I'm willing to go with that.

    I've tried to describe my approach in detail so others with fewer books can take advantage of the above solutions. But given my requirements, I'm at my wits' end for retrieving book metadata, so any pointers are greatly appreciated.