Python 3 urllib Vs requests performance


First of all, to reproduce the problem, I had to add the following line to your onStringSend function:

request.get_data()

Otherwise, I was getting “connection reset by peer” errors because the server’s receive buffer kept filling up.

Now, the immediate reason for this problem is that Response.content (which is accessed implicitly when stream=False) reads the response data in chunks of only 10240 bytes and then joins them:

self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()

Therefore, the easiest way to solve the problem is to use stream=True, thus telling Requests that you will be reading the data at your own pace:

response_data = s.post(url=url, data=data, stream=True, verify=False).raw.read()

With this change, the performance of the Requests version becomes more or less the same as that of the urllib version.
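As a minimal sketch of that change, the call can be wrapped in a small helper. The name post_and_read_raw is hypothetical, and the session argument is assumed to be a requests.Session (or anything with a compatible post method). Note that raw.read() returns the bytes as they arrived on the wire, without the content decoding that Response.content applies; for the application/octet-stream body in the question that should make no difference.

```python
def post_and_read_raw(session, url, data):
    """POST data and return the response body read directly from the raw stream.

    stream=True stops Requests from downloading the body up front, so
    Response.content (and its small-chunk loop) is never used.
    """
    response = session.post(url=url, data=data, stream=True)
    # raw is the underlying urllib3 response; read() with no argument
    # reads the whole remaining body in one call.
    return response.raw.read()


# Usage (assumes the Flask server from the question is running locally):
#   import os
#   import requests
#   body = post_and_read_raw(requests.Session(),
#                            'http://127.0.0.1:8080/onStringSend',
#                            os.urandom(int(1e7)))
```

The verify=False argument from the original call is left out here because the URL in the question is plain http, where certificate verification does not apply.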

Please also see the “Raw Response Content” section in the Requests docs for useful advice.

Now, the interesting question remains: why is Response.content iterating in such small chunks? After talking to Cory Benfield, a core developer of Requests, it looks like there may be no particular reason. I filed issue #3186 in Requests to look further into this.
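To put the chunk size in perspective, here is a stdlib-only sketch that mimics the read-in-chunks-then-join pattern on an in-memory stream. It is not Requests' actual code: iter_chunks and read_chunked are hypothetical helpers, and io.BytesIO stands in for the network response, so it only illustrates how many separate reads a body of the question's size needs.

```python
import io

CONTENT_CHUNK_SIZE = 10 * 1024  # 10240 bytes, the chunk size quoted above


def iter_chunks(stream, chunk_size):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def read_chunked(payload, chunk_size=CONTENT_CHUNK_SIZE):
    """Read payload in fixed-size chunks, join them, and count the reads."""
    chunks = list(iter_chunks(io.BytesIO(payload), chunk_size))
    return b"".join(chunks), len(chunks)


payload = bytes(10000000)  # same size as the 1e7-byte body in the question
body, reads = read_chunked(payload)
print(reads)  # 977 reads for one 10 MB response
```

A single read() call on the raw stream replaces all of those iterations, which is consistent with the speed-up seen with stream=True.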

Author: user1470957

Updated on June 04, 2022

Comments

  • user1470957
    user1470957 almost 2 years

    I'm using Python 3.5 and comparing the performance of the urllib module against the requests module. I wrote two clients in Python: the first uses urllib and the second uses requests. Both generate binary data and send it to a Flask-based server, which returns binary data to the client. Sending the data from the client to the server takes about the same time with both modules, but receiving the data back from the server is more than twice as fast with urllib as with requests. I'm working on localhost.
    My question is: why?
    What am I doing wrong with the requests module that makes it slower?

    This is the server code:

    from flask import Flask, request
    app = Flask(__name__)
    from timeit import default_timer as timer
    import os
    
    @app.route('/onStringSend', methods=['GET', 'POST'])
    def onStringSend():
        return data
    
    if __name__ == '__main__':
        data_size = int(1e7)
        data = os.urandom(data_size)    
        app.run(host="0.0.0.0", port=8080)
    

    This is the client code based on urllib:

    import urllib.request as urllib2
    import urllib.parse
    from timeit import default_timer as timer
    import os
    
    data_size = int(1e7)
    num_of_runs = 20
    url = 'http://127.0.0.1:8080/onStringSend'
    
    def send_binary_data():
        data = os.urandom(data_size)
        headers = {'User-Agent': 'Mozilla/5.0 (compatible; Chrome/22.0.1229.94;  Windows NT)', 'Content-Length': '%d' % len(data), 'Content-Type':  'application/octet-stream'}
        req = urllib2.Request(url, data, headers)
        round_trip_time_msec = [0] * num_of_runs
        for i in range(0,num_of_runs):
            t1 = timer()
            resp = urllib.request.urlopen(req)
            response_data = resp.read()
            t2 = timer()
            round_trip_time_msec[i] = (t2 - t1) * 1000
    
        t_max = max(round_trip_time_msec)
        t_min = min(round_trip_time_msec)
        t_average = sum(round_trip_time_msec)/len(round_trip_time_msec)
    
        print('max round trip time [msec]: ', t_max)
        print('min round trip time [msec]: ', t_min)
        print('average round trip time [msec]: ', t_average)
    
    
    send_binary_data()
    

    This is the client code based on requests:

    import requests
    import os
    from timeit import default_timer as timer
    
    
    url = 'http://127.0.0.1:8080/onStringSend'
    data_size = int(1e7)
    num_of_runs = 20
    
    
    def send_binary_data():
        data = os.urandom(data_size)
        s = requests.Session()
        s.headers['User-Agent'] = 'Mozilla/5.0 (compatible; Chrome/22.0.1229.94;Windows NT)'
        s.headers['Content-Type'] = 'application/octet-stream'
        s.headers['Content-Length'] = '%d' % len(data)
    
        round_trip_time_msec = [0] * num_of_runs
        for i in range(0,num_of_runs):
            t1 = timer()
            response_data = s.post(url=url, data=data, stream=False, verify=False)
            t2 = timer()
            round_trip_time_msec[i] = (t2 - t1) * 1000
    
        t_max = max(round_trip_time_msec)
        t_min = min(round_trip_time_msec)
        t_average = sum(round_trip_time_msec)/len(round_trip_time_msec)
    
        print('max round trip time [msec]: ', t_max)
        print('min round trip time [msec]: ', t_min)
        print('average round trip time [msec]: ', t_average)
    
    send_binary_data()
    

    Thanks very much.

  • jdhao
    jdhao almost 4 years
    The link to the talk is broken now.