Python requests arguments/dealing with API pagination

67,370

Solution 1

Read last_page from the first response, then make a GET request for each remaining page in the range:

import requests

r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
num_pages = r_sanfran['last_page']

for page in range(2, num_pages + 1):
    r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs", params={'page': page}).json()
    print(r_sanfran['page'])
    # TODO: extract the data
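
One way to fill in the TODO is to accumulate the jobs list from every page (the 'jobs' key appears in the response keys shown in the question). This is only a sketch: collect_jobs, fetch_page and fetch_angel_page are hypothetical names, not part of requests or the AngelList API. Passing the fetch function in keeps the paging logic separate from the HTTP call.

```python
def collect_jobs(fetch_page):
    """Return the jobs from every page as one list.

    fetch_page is a hypothetical callable: it takes a page number and
    returns the decoded JSON dict for that page.
    """
    first = fetch_page(1)
    jobs = list(first['jobs'])
    # Pages are 1-based, so last_page itself must be included.
    for page in range(2, first['last_page'] + 1):
        jobs.extend(fetch_page(page)['jobs'])
    return jobs


def fetch_angel_page(page):
    # Imported here so the paging logic above does not depend on requests.
    import requests
    url = "https://api.angel.co/1/tags/1664/jobs"
    return requests.get(url, params={'page': page}).json()
```

Calling collect_jobs(fetch_angel_page) would then issue one request per page and return a single list.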

Solution 2

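Improving on @alecxe's answer: if you are querying many pages or very large pages, a Python generator combined with a requests Session improves performance and resource usage. The session reuses the underlying connection across requests, and the generator hands you one page at a time instead of holding them all in memory.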

import requests

session = requests.Session()

def get_jobs():
    url = "https://api.angel.co/1/tags/1664/jobs" 
    first_page = session.get(url).json()
    yield first_page
    num_pages = first_page['last_page']

    for page in range(2, num_pages + 1):
        next_page = session.get(url, params={'page': page}).json()
        yield next_page

for page in get_jobs():
    # TODO: process the page
    print(page['page'])
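
Because get_jobs() yields whole pages, a second small generator can flatten them into individual jobs. A minimal sketch: iter_jobs is a hypothetical helper, and it assumes each page dict carries its items under 'jobs', as in the response keys shown in the question.

```python
def iter_jobs(pages):
    """Yield individual jobs from an iterable of page dicts."""
    for page in pages:
        # Assumes each page dict carries its items under 'jobs'.
        for job in page['jobs']:
            yield job
```

With the generator above, this would be used as `for job in iter_jobs(get_jobs()):`. Pages are still fetched lazily, one request at a time, as the loop advances.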

Solution 3

I came across a scenario where the API didn't return page numbers but rather a min/max value. I wrote the loop below, and I think it will work for both situations. It keeps requesting with the value returned by the previous response until the API returns no further value, at which point the while loop stops.

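import requests

# url and headers are assumed to be defined for the API being queried.
max_version = [1]
while len(max_version) > 0:
    r = requests.get(url, headers=headers, params={"page": max_version[0]}).json()
    next_page = r['page']  # assumed to be the next value to request, or None at the end
    if next_page is not None:
        max_version[0] = next_page
        # TODO: process the data in r
    else:
        max_version.clear()  # Stop the while loop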
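
The same idea can be written without the one-element list by looping until the API stops returning a next value. This is a sketch under the same assumption as the loop above, namely that the 'page' field of each response holds the value to request next and is null on the last response. iter_responses and fetch are hypothetical names.

```python
def iter_responses(fetch, start=1):
    """Yield each response until the API reports no further page.

    fetch is a hypothetical callable: it takes the current page value and
    returns the decoded JSON dict for it.
    """
    current = start
    while current is not None:
        response = fetch(current)
        yield response
        # Assumption: 'page' holds the next value to request, or None at the end.
        current = response['page']
```

With requests, fetch could be something like `lambda page: requests.get(url, headers=headers, params={"page": page}).json()`. Unlike the loop above, this version also yields the final response, whose 'page' is null, so skip it if it carries no data.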
Author by

crock1255

Med Student with a developer/analyst problem.

Updated on March 26, 2021

Comments

  • crock1255
    crock1255 about 3 years

    I'm playing around with the Angel List (AL) API and want to pull all jobs in San Francisco. Since I couldn't find an active Python wrapper for the API (if I make any headway, I think I'd like to make my own), I'm using the requests library.

    The AL API's results are paginated, and I can't figure out how to move beyond the first page of the results.

    Here is my code:

    import requests
    r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
    r_sanfran.keys()
    # returns [u'per_page', u'last_page', u'total', u'jobs', u'page']
    r_sanfran['last_page']
    #returns 16
    r_sanfran['page']
    # returns 1
    

    I tried adding arguments to requests.get, but that didn't work. I also tried something really dumb: changing the value of the 'page' key, as if that were magically going to paginate for me.

    e.g. r_sanfran['page'] = 2

    I'm guessing it's something relatively simple, but I can't seem to figure it out so any help would be awesome.

    Thanks as always.

    Angel List API documentation if it's helpful.

  • Jon Clements
    Jon Clements almost 11 years
    I'm guessing that should be range(2, num_pages + 1) since the first page is 1, and 16 is the total number of pages, so will want that included in the range... (and might want to use requests.get('http://...blah...?', params={'page': page}) to avoid string interpolation)
  • crock1255
    crock1255 almost 11 years
    Ah. I didn't see that I could pass a params argument in requests.get
  • crock1255
    crock1255 almost 11 years
    Ah. Okay. I see what is going on. I was thinking the pagination was coming from angel.co/1...angel.co/2, etc. Thanks for the help!
  • API_sheriff_orlie
    API_sheriff_orlie over 8 years
    missing closing '}' after params={
  • alecxe
    alecxe over 8 years
    @API_sheriff_orlie yup, fixed, a huge thank you from a perfectionist to a perfectionist :)
  • API_sheriff_orlie
    API_sheriff_orlie over 8 years
    @alecxe high-5 back to you ;-)
  • jangeador
    jangeador over 5 years
    This is outstanding. I was able to use this with a minor modification to navigate a drf api using LimitOffsetPagination
  • Arjan
    Arjan about 2 years
    Not better, but when one also needs to provide the page size as a parameter, say params={"page": 1, "page_size": 25} then one can define the full params once, and assign directly into params["page"], like: for params["page"] in range(2, ...) along with session.get(url, params=params) 😎
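
Arjan's suggestion can be sketched roughly as follows. The page_size parameter comes from the comment, and whether a given API accepts it is an assumption. fetch_remaining_pages is a hypothetical name, and session is anything with a requests-style get(url, params=...) method, such as a requests.Session.

```python
def fetch_remaining_pages(session, url, last_page, page_size=25):
    """Fetch pages 2..last_page, reusing one params dict."""
    params = {"page": 1, "page_size": page_size}
    pages = []
    # The loop target assigns straight into params["page"] on each iteration.
    for params["page"] in range(2, last_page + 1):
        pages.append(session.get(url, params=params).json())
    return pages
```

The first page is assumed to have been fetched already in order to read last_page, which is why the range starts at 2.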