Python requests arguments/dealing with API pagination
Solution 1
Read last_page and make a GET request for each page in the range:
import requests

r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
num_pages = r_sanfran['last_page']

# Page 1 is already in r_sanfran; fetch the remaining pages.
for page in range(2, num_pages + 1):
    r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs", params={'page': page}).json()
    print(r_sanfran['page'])
    # TODO: extract the data
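To fill in the TODO, here is a minimal sketch that gathers the entries from every page, including the first, into one list. It assumes the 'jobs' key listed in the question holds a list and that passing page=1 explicitly returns the first page. The names collect_jobs and requests_get_json are made up for illustration; the fetching step is passed in as a function so the loop can be tried without touching the network.

```python
def collect_jobs(get_json, url):
    """Gather the 'jobs' entries from every page into one list.

    get_json(url, params) must return the decoded JSON body as a dict.
    Assumes each body has 'jobs' (a list) and the first has 'last_page'.
    """
    first = get_json(url, {"page": 1})
    jobs = list(first["jobs"])
    for page in range(2, first["last_page"] + 1):
        jobs.extend(get_json(url, {"page": page})["jobs"])
    return jobs


def requests_get_json(url, params):
    # Thin wrapper around requests.get; the import lives here so that
    # collect_jobs can be exercised without the library installed.
    import requests

    response = requests.get(url, params=params)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()
```

Usage would look like collect_jobs(requests_get_json, "https://api.angel.co/1/tags/1664/jobs").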
Solution 2
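Improving on @alecxe's answer: if you use a Python generator and a requests HTTP session, you can improve performance and reduce resource usage when you are querying lots of pages or very large pages. The session reuses the underlying connection between requests, and the generator hands you one page at a time instead of holding them all in memory.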
import requests

session = requests.Session()

def get_jobs():
    url = "https://api.angel.co/1/tags/1664/jobs"
    first_page = session.get(url).json()
    yield first_page
    num_pages = first_page['last_page']

    for page in range(2, num_pages + 1):
        next_page = session.get(url, params={'page': page}).json()
        yield next_page

for page in get_jobs():
    pass  # TODO: process the page
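One benefit of the generator approach is that pages are only requested when the consumer asks for them. A rough sketch of that, assuming each page carries a 'jobs' list as in the question: the helper names (iter_pages, iter_jobs, first_n_jobs) are hypothetical, and the session is passed in as an argument so that any object with a compatible get() method can stand in for requests.Session().

```python
from itertools import islice


def iter_pages(session, url):
    # Same idea as get_jobs() above, with session and url as parameters.
    first_page = session.get(url).json()
    yield first_page
    for page in range(2, first_page["last_page"] + 1):
        yield session.get(url, params={"page": page}).json()


def iter_jobs(session, url):
    # Flatten the pages into a stream of individual job entries.
    for page in iter_pages(session, url):
        yield from page["jobs"]


def first_n_jobs(session, url, n):
    # islice stops pulling from the generator after n items, so pages
    # beyond the ones actually needed are never requested.
    return list(islice(iter_jobs(session, url), n))
```

With a real session this would be called as first_n_jobs(requests.Session(), url, 10).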
Solution 3
I came across a scenario where the API didn't return a page count but rather a min/max value. I wrote this, and I think it will work for both situations. It keeps moving on to the next page value the API returns until it reaches the end, and then it stops the while loop.
import requests

# url and headers are assumed to be defined for your API.
max_version = [1]

while len(max_version) > 0:
    r = requests.get(url, headers=headers, params={"page": max_version[0]}).json()
    next_page = r['page']

    if next_page is not None:
        max_version[0] = next_page
        # Process data...
    else:
        max_version.clear()  # Stop the while loop
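The same loop can also be written as a generator without the one-element list. This is only a sketch under the same assumption as above, namely that each response names the next page value under a key (here 'page') and returns None for it at the end. The names iter_until_exhausted, get_json and next_key are invented for this example. Unlike the loop above, it also yields the final response, the one whose next value is None.

```python
def iter_until_exhausted(get_json, url, start=1, next_key="page"):
    """Yield response bodies until the API stops naming a next page.

    get_json(url, params) must return the decoded JSON body as a dict.
    Assumes body[next_key] is the next page value, or None at the end.
    """
    page = start
    while page is not None:
        body = get_json(url, {"page": page})
        yield body
        page = body.get(next_key)
```

With requests, get_json could be something like lambda url, params: requests.get(url, headers=headers, params=params).json().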
Comments
-
crock1255 about 3 years
I'm playing around with the Angel List (AL) API and want to pull all jobs in San Francisco. Since I couldn't find an active Python wrapper for the API (if I make any headway, I think I'd like to make my own), I'm using the requests library.
The AL API's results are paginated, and I can't figure out how to move beyond the first page of the results.
Here is my code:
import requests

r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
r_sanfran.keys()        # returns [u'per_page', u'last_page', u'total', u'jobs', u'page']
r_sanfran['last_page']  # returns 16
r_sanfran['page']       # returns 1
I tried adding arguments to requests.get, but that didn't work. I also tried something really dumb: changing the value of the 'page' key, as if that was magically going to paginate for me, e.g.
r_sanfran['page'] = 2
I'm guessing it's something relatively simple, but I can't seem to figure it out so any help would be awesome.
Thanks as always.
Angel List API documentation if it's helpful.
-
Jon Clements almost 11 years
I'm guessing that should be range(2, num_pages + 1), since the first page is 1 and 16 is the total number of pages, so you will want that included in the range... (and you might want to use requests.get('http://...blah...?', params={'page': page}) to avoid string interpolation)
-
crock1255 almost 11 years
Ah. I didn't see that I could pass a params argument in requests.get
-
crock1255 almost 11 years
Ah. Okay. I see what is going on. I was thinking the pagination was coming from angel.co/1...angel.co/2, etc. Thanks for the help!
-
API_sheriff_orlie over 8 years
missing closing '}' after params={
-
alecxe over 8 years
@API_sheriff_orlie yup, fixed, a huge thank you from a perfectionist to a perfectionist :)
-
API_sheriff_orlie over 8 years
@alecxe high-5 back to you ;-)
-
jangeador over 5 years
This is outstanding. I was able to use this with a minor modification to navigate a DRF API using LimitOffsetPagination
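The comment does not say what the modification was, so the following is only a guess at one way to do it. Django REST framework's LimitOffsetPagination wraps responses in an envelope with 'count', 'next', 'previous' and 'results', where 'next' is the URL of the following page or None on the last one. A generator can simply follow that link; iter_drf_results is a hypothetical name, and session is any object with a requests-style get() method.

```python
def iter_drf_results(session, url):
    """Yield items from a DRF limit/offset-paginated endpoint.

    Follows the 'next' link in each response until it is None.
    """
    while url is not None:
        body = session.get(url).json()
        yield from body["results"]
        url = body["next"]
```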
-
Arjan about 2 years
Not better, but when one also needs to provide the page size as a parameter, say params={"page": 1, "page_size": 25}, then one can define the full params once, and assign directly into params["page"], like: for params["page"] in range(2, ...) along with session.get(url, params=params) 😎
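Spelled out, that trick might look like the sketch below. The "page_size" parameter name comes from the comment and may differ for a given API; iter_pages_with_size is a made-up name, and session is any object with a requests-style get() method. Using params["page"] as the loop target is valid Python: each iteration assigns the next value into the dict before the loop body runs.

```python
def iter_pages_with_size(session, url, page_size=25):
    # Define the full parameter set once...
    params = {"page": 1, "page_size": page_size}
    first_page = session.get(url, params=params).json()
    yield first_page
    # ...then let the for loop assign straight into params["page"].
    for params["page"] in range(2, first_page["last_page"] + 1):
        yield session.get(url, params=params).json()
```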