Asynchronous Requests with Python requests
Solution 1
Note
The answer below is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace `requests` with `grequests` below and it should work.

I've left this answer as-is to reflect the original question, which was about using requests < v0.13.0.
To do multiple tasks with `async.map` asynchronously you have to:

- Define a function for what you want to do with each object (your task)
- Add that function as an event hook in your request
- Call `async.map` on a list of all the requests / actions
Example (note: per the comments below, newer versions expect the hook to accept extra keyword arguments, so `do_something` takes `**kwargs`):

```python
from requests import async
# If using requests > v0.13.0, use
# from grequests import async

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response, **kwargs):
    print(response.url)

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do
    #
    # Note the lack of parentheses following do_something; this is
    # because the response will be used as the first argument automatically
    action_item = async.get(u, hooks={'response': do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)
```
Solution 2
`async` is now an independent module: grequests.

See here: https://github.com/kennethreitz/grequests

And there: Ideal method for sending multiple HTTP requests over Python?

Installation:

```shell
$ pip install grequests
```
Usage:

Build a stack:

```python
import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)
```

Send the stack:

```python
grequests.map(rs)
```

The result looks like:

```
[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]
```
grequests doesn't seem to set a limit on concurrent requests, i.e. when multiple requests are sent to the same server. (As noted in the comments, you can pass a pool size, e.g. `grequests.map(rs, size=20)`, to cap it at 20 concurrent grabs.)
Solution 3
I tested both requests-futures and grequests. grequests is faster, but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own solution and simply wrapped requests in a ThreadPoolExecutor; it was almost as fast as grequests, but without external dependencies.
```python
import concurrent.futures

import requests

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

# counters must be initialized before the loop
resp_ok = 0
resp_err = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1
```
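The same ThreadPoolExecutor pattern can be seen end-to-end without touching the network by swapping `requests.get` for a stand-in (`fake_load` below is a hypothetical placeholder for the real request, just to make the flow runnable):

```python
import concurrent.futures

def fake_load(url, timeout):
    # hypothetical stand-in for requests.get(url, timeout=timeout):
    # returns a fake "body" for the given URL
    return "body of {0}".format(url)

urls = ["url1", "url2", "url3"]
results = {}

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    # map each submitted future back to its URL
    future_to_url = {executor.submit(fake_load, url, 10): url for url in urls}
    # collect results as the workers finish, in completion order
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        results[url] = future.result()

print(sorted(results))  # -> ['url1', 'url2', 'url3']
```

Because `as_completed` yields futures in completion order, a dict keyed by URL is a simple way to keep track of which response belongs to which request.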
Solution 4
Unfortunately, as far as I know, the requests library is not equipped for performing asynchronous requests. You can wrap `async/await` syntax around `requests`, but that will make the underlying requests no less synchronous. If you want true async requests, you must use other tooling that provides it. One such solution is `aiohttp` (Python 3.5.3+). It works well in my experience using it with the Python 3.7 `async/await` syntax. Below I write three implementations of performing n web requests using:

- Purely synchronous requests (`sync_requests_get_all`) using the Python `requests` library
- Synchronous requests (`async_requests_get_all`) using the Python `requests` library wrapped in Python 3.7 `async/await` syntax and `asyncio`
- A truly asynchronous implementation (`async_aiohttp_get_all`) with the Python `aiohttp` library wrapped in Python 3.7 `async/await` syntax and `asyncio`
"""
Tested in Python 3.5.10
"""
import time
import asyncio
import requests
import aiohttp
from asgiref import sync
def timed(func):
"""
records approximate durations of function calls
"""
def wrapper(*args, **kwargs):
start = time.time()
print('{name:<30} started'.format(name=func.__name__))
result = func(*args, **kwargs)
duration = "{name:<30} finished in {elapsed:.2f} seconds".format(
name=func.__name__, elapsed=time.time() - start
)
print(duration)
timed.durations.append(duration)
return result
return wrapper
timed.durations = []
@timed
def sync_requests_get_all(urls):
"""
performs synchronous get requests
"""
# use session to reduce network overhead
session = requests.Session()
return [session.get(url).json() for url in urls]
@timed
def async_requests_get_all(urls):
"""
asynchronous wrapper around synchronous requests
"""
session = requests.Session()
# wrap requests.get into an async function
def get(url):
return session.get(url).json()
async_get = sync.sync_to_async(get)
async def get_all(urls):
return await asyncio.gather(*[
async_get(url) for url in urls
])
# call get_all as a sync function to be used in a sync context
return sync.async_to_sync(get_all)(urls)
@timed
def async_aiohttp_get_all(urls):
"""
performs asynchronous get requests
"""
async def get_all(urls):
async with aiohttp.ClientSession() as session:
async def fetch(url):
async with session.get(url) as response:
return await response.json()
return await asyncio.gather(*[
fetch(url) for url in urls
])
# call get_all as a sync function to be used in a sync context
return sync.async_to_sync(get_all)(urls)
if __name__ == '__main__':
# this endpoint takes ~3 seconds to respond,
# so a purely synchronous implementation should take
# little more than 30 seconds and a purely asynchronous
# implementation should take little more than 3 seconds.
urls = ['https://postman-echo.com/delay/3']*10
async_aiohttp_get_all(urls)
async_requests_get_all(urls)
sync_requests_get_all(urls)
print('----------------------')
[print(duration) for duration in timed.durations]
On my machine, this is the output:

```
async_aiohttp_get_all          started
async_aiohttp_get_all          finished in 3.20 seconds
async_requests_get_all         started
async_requests_get_all         finished in 30.61 seconds
sync_requests_get_all          started
sync_requests_get_all          finished in 30.59 seconds
----------------------
async_aiohttp_get_all          finished in 3.20 seconds
async_requests_get_all         finished in 30.61 seconds
sync_requests_get_all          finished in 30.59 seconds
```
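The ~3-second vs. ~30-second gap comes entirely from `asyncio.gather` overlapping the waits. The effect can be reproduced offline by replacing the slow endpoint with `asyncio.sleep` (a sketch, with the 3-second delay shrunk to 0.2 s so it runs quickly):

```python
import asyncio
import time

async def fake_fetch(url):
    # stand-in for an HTTP call to a slow endpoint
    await asyncio.sleep(0.2)
    return url

async def get_all(urls):
    # all ten "requests" wait concurrently, so the total time
    # is roughly one delay (~0.2 s), not ten delays (~2 s)
    return await asyncio.gather(*[fake_fetch(u) for u in urls])

urls = ['https://postman-echo.com/delay/3'] * 10
start = time.time()
results = asyncio.run(get_all(urls))
elapsed = time.time() - start
print(len(results), elapsed < 1.0)  # -> 10 True
```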
Solution 5
Maybe requests-futures is another choice.

```python
from requests_futures.sessions import FuturesSession

session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second request is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)
```

It is also recommended in the official documentation. If you don't want to involve gevent, it's a good option.
trbck

Updated on August 01, 2022

Comments
- trbck almost 2 years: I tried the sample provided within the documentation of the requests library for Python. With `async.map(rs)` I get the response codes, but I want to get the content of each page requested. This, for example, does not work: `out = async.map(rs); print out[0].content`
- Mariusz Jamro over 12 years: Maybe the responses you're getting have an empty body?
- Chewie over 12 years: Works for me. Please post the full error you're getting.
- trbck over 12 years: There is no error. It just runs forever with the provided test URLs.
- trbck over 12 years: It obviously appears when I use URLs over https; http is working just fine.
- OrangeDog almost 6 years: Looks like `requests-threads` exists now.
- guettli almost 3 years: Most answers are outdated. In the year 2021 the current bandwagon-effect winner is: docs.aiohttp.org/en/stable
- synthesizerpatel over 11 years: With regards to the limitation on concurrent requests - you can specify a pool size when running map()/imap(), i.e. `grequests.map(rs, size=20)` to have 20 concurrent grabs.
- outforawhile about 11 years: Nice idea to have left your comment: due to compatibility issues between the latest requests and grequests (lack of the max_retries option in requests 1.1.0), I had to downgrade requests to retrieve async, and I found that the asynchronous functionality was moved with versions 0.13+ (pypi.python.org/pypi/requests)
- Jeff about 11 years: Thanks for pointing that out! Will edit my answer to reflect this info.
- Nick about 11 years: This is "normal" requests in threads. It's not a bad example, but it's off-topic.
- Pedro almost 10 years: Dumb question: What is the speed increase of using grequests as opposed to simply requests? What limits are there regarding requests? e.g. would putting 3500 requests in async.map be OK?
- saarp almost 10 years: As of now this is not python3-capable (gevent fails to build v2.6 on py3.4).
- Allan Ruin over 9 years: I don't quite understand the async part. If I write `results = grequests.map(rs)`, the code after this line blocks, so how can I see the async effect?
- Allan Ruin over 9 years: `from grequests import async` does not work.. and this definition of do_something works for me: `def do_something(response, **kwargs):` - I found it at stackoverflow.com/questions/15594015/…
- Monkey Boson about 9 years: @YSY Feel free to post an issue: github.com/ctheiss/simple-requests/issues; I literally use this library thousands of times a day.
- YSY about 9 years: Boston, how do you handle 404/500 errors? What about https URLs? Would appreciate a snippet that supports thousands of URLs. Can you please paste an example? Thanks.
- Monkey Boson about 9 years: @YSY By default 404/500 errors raise an exception. This behaviour can be overridden (see pythonhosted.org/simple-requests/…). HTTPS URLs are tricky due to the reliance on gevent, which currently has an outstanding bug on this (github.com/gevent/gevent/issues/477). There is a shim in the ticket you can run, but it will still throw warnings for SNI servers (but it will work). As for a snippet, I'm afraid all my usages are at my company and closed. But I assure you we execute thousands of requests over tens of jobs.
- bryanph almost 9 years: If the async.map call still blocks, then how is this asynchronous? Besides the requests themselves being sent asynchronously, the retrieval is still synchronous?
- Jose Cherian over 8 years: One of the easiest solutions. The number of concurrent requests can be increased by defining the max_workers parameter.
- Slow Harry over 8 years: What type of exception is possible here?
- Hodza over 8 years: `requests.exceptions.Timeout`
- Slow Harry over 8 years: Thanks, but I can't understand why you create a timeout for the request?
- Hodza over 8 years: In my company we handle thousands of URLs per second, and some sites can block a thread for too long. So timeouts increase our overall speed. Also we use a max request size. stackoverflow.com/questions/23514256/…
- Slow Harry over 8 years: Thanks a lot for the open answer. Could you also explain: is there motivation to handle one request with multiple threads, or is it impractical?
- Hodza over 8 years: Sorry, I don't understand your question. Use only a single URL in multiple threads? Only one case: DDoS attacks ))
- user3125280 about 8 years: The downvote was accidental and now it's locked! Sorry, this was actually a very useful answer for me.
- Martin Thoma about 8 years: Replacing `from requests import async` with `import grequests as async` worked for me.
- Ja8zyjits almost 8 years: I second @AllanRuin; maybe Jeff should add that line to the answer, that shall help future users.
- user almost 7 years: @user3125280, I've edited the answer, feel free to recast your vote.
- GrayedFox almost 7 years: Thanks a bunch for this!
- Macumbaomuerte almost 7 years: Using grequests on python3 on Windows without the `**kwargs` @AllanRuin mentioned won't work. Thanks for pointing it out; I was gonna smash my computer before I found this.
- user1717828 over 6 years: It'd be nice to see an example of this scaled so we're not using one variable name per item to loop over.
- Bunyk over 6 years: Oh, thank you very much for "The asynchronous functionality was moved to grequests after this question was written". I was looking for it everywhere in the docs and was not able to find it. :)
- Corneliu Maftuleac over 6 years: Having one thread per request is a hell of a waste of resources! It is not possible to do, for example, 500 requests simultaneously; it will kill your CPU. This should never be considered a good solution.
- Dreampuf about 6 years: @CorneliuMaftuleac Good point. Regarding the thread usage, you definitely need to care about it, and the library provides an option to enable a thread pool or process pool: `ThreadPoolExecutor(max_workers=10)`
- Corneliu Maftuleac about 6 years: @Dreampuf A process pool, I believe, is even worse?
- Dreampuf about 6 years: @CorneliuMaftuleac I have to say it depends on your case. It has always been a trade-off between I/O bound and CPU bound.
- Jon about 6 years: This feels like it was copied and pasted from the documentation page. As @user1717828 said, it'd be more beneficial to see a scaled-up example.
- OrangeDog almost 6 years: `grequests` now recommends `requests-threads` or `requests-futures`
- nagylzs about 5 years: I don't understand why this answer got so many upvotes. The OP question was about async requests. ThreadPoolExecutor runs threads. Yes, you can make requests in multiple threads, but that will never be an async program, so how could it be an answer to the original question?
- Hodza about 5 years: Actually, the question was about how to load URLs in parallel. And yes, a thread pool executor is not the best option; it is better to use async IO, but it works well in Python. And I don't understand why threads couldn't be used for async? What if you need to run a CPU-bound task asynchronously?
- nurettin almost 5 years: Confirmed, works great. On the project page it says this work has been overtaken by the following project: github.com/encode/httpx
- Isaac Philip over 4 years: The library looks sleek with respect to interaction. Is it Python3+ usable? Sorry, could not see any mention.
- Coder Nr 23 about 4 years: If I understand correctly, this will block the event loop while doing the GRPC and HTTP call? So if these calls take seconds to complete, your entire event loop will block for seconds? To avoid this, you need to use GRPC or HTTP libraries that are `async`. Then you can, for example, do `await response = requests.get(URL)`. No?
- Monkey Boson about 4 years: @Jethro Absolutely right, the library would need a total re-write since the underlying technologies are quite different in Python 3. For right now, the library is "complete" but only works for Python 2.
- DragonBobZ almost 4 years: Unfortunately, when trying this out, I found that making a wrapper around `requests` is barely faster (and in some cases slower) than just calling a list of URLs synchronously. E.g., requesting an endpoint that takes 3 seconds to respond 10 times using the strategy above takes about 30 seconds. If you want true `async` performance, you need to use something like `aiohttp`.
- rlat almost 4 years: And `respx` for mocking/testing :)
- Amir Hossein Baghernezad almost 4 years: Twisted is old-fashioned. Use HTTPX instead.
- arshbot almost 4 years: @DragonBobZ In my case I saw a ~40% reduction in time. The main benefit was being able to perform necessary chores while waiting for the next call. In my dataset I was making hundreds of calls, so scale could also be a factor.
- arshbot almost 4 years: @CoderNr23 Someone can correct me on this, but iirc even with that syntax, executing the tasks is fundamentally synchronous - the order in which tasks run is what's made async. In Python, you're merely packaging a synchronous execution tree which will be executed with some starter like `run_until_complete` - this is unless you use the threading module, which delegates async to the OS layer. Read up on the GIL problem in Python for more info.
- DragonBobZ almost 4 years: @arshbot Yes, if your chores are asynchronous, then you will see speed-ups, despite waiting on synchronous calls to `requests.get`. But the question is how to perform asynchronous requests with the Python `requests` library. This answer does not do that, so my criticism stands.
- iedmrc over 3 years: I prefer using the event loop rather than thread pools.
- iedmrc over 3 years: I think this should be bumped. Using async event loops seems enough to fire asynchronous requests. No need to install external dependencies.
- AJ. over 3 years: Hi @Uri, I am getting the below error when trying the code you mentioned in this answer: `await asyncio.gather(*map(get_async, urls)) ^ SyntaxError: invalid syntax` Please guide.
- Uri over 3 years: Note that you need an async context to use `await`.
- rtaft over 3 years: This works, but it generates a new thread for each request, which seems to defeat the purpose of using asyncio.
- vaskrneup over 3 years: @rtaft Thank you for the suggestion, I have corrected my words.
- Sergio Chumacero about 3 years: @iedmrc Sadly, this is not the case. For a task to be non-blocking, it has to be implemented using the newer async tools in Python, and this is not the case with the requests library. If you just stick requests tasks in an async event loop, those would still be blocking. That being said, you can (as suggested in other responses) use things like gevent or threads with requests, but certainly not asyncio.
- theberzi almost 3 years: On the GitHub repo, the author of grequests recommends using requests-threads or requests-futures instead.
- guettli almost 3 years: "asnyc" - is this a typo, or on purpose?
- DragonBobZ almost 3 years: Definitely a typo.
- wescpy over 2 years: Your `async_aiohttp_get_all()` is a nice solution. I came up with something similar, but had an extra `async def fetch_all(urls): return await asyncio.gather(*[fetch(url) for url in urls])` outside of it, which had my solution creating separate `aiohttp.ClientSession()` instances for each URL, whereas by embedding a local function you're able to reuse the same session... much more Pythonic IMO. Can you remind me of the benefit of using `sync.async_to_sync()` with the existence of `get_all()` vs. `asyncio.run()` without `get_all()`?
- Tuhin Mitra over 2 years: Awesomely done; definitely async_aiohttp working better than all!