Make an http POST request to upload a file using Python urllib/urllib2

Solution 1

After some digging around, I found a post that solved my problem: the multipart encoder has to be set up properly.

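# Requires the third-party poster package (Python 2 only).
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib2

# Install poster's streaming HTTP handlers as the default urllib2 opener.
register_openers()

# Open in binary mode so the bytes are sent unchanged.
with open("style.css", 'rb') as f:
    # multipart_encode returns a body generator and the matching headers.
    datagen, headers = multipart_encode({"file": f})
    request = urllib2.Request("http://jigsaw.w3.org/css-validator/validator",
                              datagen, headers)
    response = urllib2.urlopen(request)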
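
For readers who cannot install poster, here is a minimal sketch of what a multipart encoder produces, written against the Python 3 standard library. The function name encode_multipart and its argument shapes are hypothetical and not part of poster or urllib; the sketch builds the whole body in memory, so it only suits small files.

```python
import mimetypes
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body (hypothetical helper, not a library API).

    fields: dict mapping form field name -> str value
    files:  dict mapping form field name -> (filename, bytes content)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex  # random separator unlikely to occur in the data
    delimiter = ("--" + boundary).encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"))
        lines.append(b"")  # blank line separates part headers from part body
        lines.append(value.encode("utf-8"))
    for name, (filename, content) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        lines.append(delimiter)
        lines.append(
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode("utf-8"))
        lines.append(("Content-Type: " + ctype).encode("ascii"))
        lines.append(b"")
        lines.append(content)
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    body = b"\r\n".join(lines)
    return body, "multipart/form-data; boundary=" + boundary


# Possible usage (not executed here because it needs network access):
#   import urllib.request
#   with open("style.css", "rb") as f:
#       body, content_type = encode_multipart(
#           {"output": "json"}, {"file": ("style.css", f.read())})
#   request = urllib.request.Request(
#       "http://jigsaw.w3.org/css-validator/validator",
#       data=body, headers={"Content-Type": content_type})
#   response = urllib.request.urlopen(request)
```

The output field mirrors the -F output=json argument of the curl command in the question.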

Solution 2

Personally I think you should consider the requests library to post files.

import requests

url = 'http://jigsaw.w3.org/css-validator/validator'
# Open in binary mode and close the file once the upload finishes.
with open('style.css', 'rb') as f:
    response = requests.post(url, files={'file': f})

Uploading files using urllib2 is not impossible but quite a complicated task: http://pymotw.com/2/urllib2/#uploading-files

Solution 3

Well, there are multiple ways to do it. As mentioned above, you can send the file in "multipart/form-data". However, the target service may not be expecting this type, in which case you may try some more approaches.

Pass the file object

urllib2 can accept a file object as data. When you pass this type, the library reads the file as a binary stream and sends it out. However, it will not set the proper Content-Type header. Moreover, if the Content-Length header is missing, it will call len() on the object, which file objects do not support. You must therefore provide both the Content-Type and the Content-Length headers for this method to work:

import os
import urllib2

filename = '/var/tmp/myfile.zip'
headers = {
    'Content-Type': 'application/zip',
    'Content-Length': os.stat(filename).st_size,
}
request = urllib2.Request('http://localhost', open(filename, 'rb'),
                          headers=headers)
response = urllib2.urlopen(request)
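
For Python 3, where urllib2 became urllib.request, the same idea might look like the sketch below. The helper name build_upload_request is hypothetical; it only constructs the request, so the caller still has to send it with urlopen and keep the file open until then.

```python
import os
import urllib.request


def build_upload_request(url, fileobj, content_type="application/octet-stream"):
    """Build a POST request that streams an open binary file (hypothetical helper)."""
    # Take the size from the open file descriptor so both headers are explicit.
    size = os.fstat(fileobj.fileno()).st_size
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(size),
    }
    return urllib.request.Request(url, data=fileobj, headers=headers,
                                  method="POST")


# Possible usage (not executed here because it needs a server):
#   with open("/var/tmp/myfile.zip", "rb") as f:
#       request = build_upload_request("http://localhost", f, "application/zip")
#       with urllib.request.urlopen(request) as response:
#           print(response.status)
```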

Wrap the file object

To avoid computing the length yourself, you can create a simple wrapper object. With a small change you can adapt it to read the content from a string if the file is already loaded in memory.

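import os

class BinaryFileObject:
  """Simple wrapper for a binary file for urllib2."""

  def __init__(self, filename):
    # Record the size up front so len() works on the wrapper.
    self.__size = int(os.stat(filename).st_size)
    self.__f = open(filename, 'rb')

  def read(self, blocksize):
    # Called repeatedly while the request body is being sent.
    return self.__f.read(blocksize)

  def __len__(self):
    # Used to fill in Content-Length when the header is missing.
    return self.__size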
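
The in-memory adaptation mentioned above could be sketched as follows. The class name BinaryStringObject is hypothetical. It only illustrates the wrapper shape (read plus __len__) that the text describes for urllib2; Python 3's urllib.request may handle the length of file-like bodies differently, so treat that as an open assumption.

```python
import io


class BinaryStringObject(object):
    """Wrap in-memory bytes so they expose read() and len() (hypothetical)."""

    def __init__(self, content):
        self._size = len(content)
        self._f = io.BytesIO(content)

    def read(self, blocksize=-1):
        # Return up to blocksize bytes; an empty result signals the end.
        return self._f.read(blocksize)

    def __len__(self):
        return self._size
```

An instance can then be passed as the request data in place of the open file.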

Encode the content as base64

Another way is to encode the data with base64.b64encode and send a Content-Transfer-Encoding: base64 header. However, this method requires support on the server side. Depending on the implementation, the service may accept the file and store it incorrectly, or return HTTP 400. For example, the GitHub API won't return an error, but the uploaded file will be corrupted.
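
A minimal sketch of preparing such a payload is shown below. The helper name encode_base64_payload is hypothetical, and the header name used here (the MIME Content-Transfer-Encoding header) is an assumption about what a given server expects.

```python
import base64


def encode_base64_payload(raw):
    """Return (encoded_body, headers) for a base64 upload (hypothetical helper)."""
    encoded = base64.b64encode(raw)
    headers = {
        "Content-Type": "application/octet-stream",
        # Assumption: the server looks at this header to decode the body.
        "Content-Transfer-Encoding": "base64",
        "Content-Length": str(len(encoded)),
    }
    return encoded, headers


# Every 3 input bytes become 4 output characters, so the body grows by about a third.
payload, payload_headers = encode_base64_payload(b"\x00" * 300)
print(len(payload))  # 400
```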

Author by

Ying Xiong

Updated on July 09, 2022

Comments

  • Ying Xiong
    Ying Xiong almost 2 years

    I would like to make a POST request to upload a file to a web service (and get response) using Python. For example, I can do the following POST request with curl:

    curl -F "file=@style.css" -F output=json http://jigsaw.w3.org/css-validator/validator
    

    How can I make the same request with python urllib/urllib2? The closest I got so far is the following:

    with open("style.css", 'r') as f:
        content = f.read()
    post_data = {"file": content, "output": "json"}
    request = urllib2.Request("http://jigsaw.w3.org/css-validator/validator", \
                              data=urllib.urlencode(post_data))
    response = urllib2.urlopen(request)
    

    I got an HTTP Error 500 from the code above. But since my curl command succeeds, there must be something wrong with my Python request?

    I am quite new to this topic and my question may have very simple answers or mistakes.

  • Ying Xiong
    Ying Xiong over 9 years
    Thanks, @Wolph. I just tried the requests library, but still got an HTTP 500 error. So my question should probably be rephrased as: what are the differences between the request we made in Python and the one made by curl? Thanks.
  • Wolph
    Wolph over 9 years
    Well, you have the output=json in your curl request, that's not in the Python request so that's probably the difference. Glad you have it working now though :)
  • Vladius
    Vladius about 9 years
    Don't forget to close the style.css file?
  • nandhp
    nandhp about 9 years
    @Vladius The file will be closed automatically because it is used as a context manager. See documentation on the with statement.
  • Omar
    Omar over 7 years
    I'm really new to Python. I ran the above, seemingly successfully. What should I expect now? Where can I verify that it works?
  • Santhosh
    Santhosh about 6 years
    I have a raw .jpg file in the form of an ndarray variable. How can I POST this in a similar way?
  • Wolph
    Wolph about 6 years
    @Santhosh using ndarray.tobytes() is probably easiest, but you can also use a fh = StringIO(); ndarray.tofile(fh) and use the fh as a file object
  • Santhosh
    Santhosh about 6 years
    @Wolph In case I have just the image in the form of ndarray without any format like .jpg or .png how would I upload it, then?
  • Wolph
    Wolph about 6 years
    @Santhosh the raw image should be in .bmp (bitmap) format. That quickly gets bandwidth intensive though, using a temporary .png file might be better
  • real4x
    real4x almost 6 years
    The author has clearly identified that the file needs to be submitted via urllib/urllib2. The requests library doesn't exist in Python 2.7. And installing it externally is not an option in many cases.
  • Wolph
    Wolph almost 6 years
    @real4x: the author is not the only person reading this question and its answers. For most people, using requests is the better option, which is why I gave this answer and an example of how to use it. Additionally, I linked to the complicated code needed to make it work with urllib2.
  • Íhor Mé
    Íhor Mé almost 4 years
    One other thing to note is base64 means +33.3% traffic. Especially if you're using some cloud hosting, it's going to cost quite a bit.
  • Viet Than
    Viet Than about 2 years
    poster is for py2, poster3 is supposed to be working for py3 but development has stopped since 2018 and it's not working anymore. see discussion here