Create and parse multipart HTTP requests in Python


Solution 1

After a bit of exploration, the answer to this question has become clear. The short answer is that although the Content-Disposition header is optional in a MIME-encoded message, web.py requires it on each MIME part in order to parse the HTTP request correctly.

Contrary to other comments on this question, the difference between HTTP and email is irrelevant: both are simply transport mechanisms for the MIME message and nothing more. multipart/related (not multipart/form-data) messages are common in content-exchanging web services, which is the use case here. The code snippets provided are accurate, though, and led me to a slightly briefer solution to the problem.

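import httplib
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart

# open an HTTP connection
h1 = httplib.HTTPConnection('localhost:8080')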

# create a mime multipart message of type multipart/related
msg = MIMEMultipart("related")

# create a MIME part containing a zip file, with a Content-Disposition header
# on the part
with open('file.zip', 'rb') as fp:
    payload = fp.read()
base = MIMEBase("application", "zip")
base['Content-Disposition'] = 'file; name="package"; filename="file.zip"'
base.set_payload(payload)
encoders.encode_base64(base)
msg.attach(base)

# The awkward part: walk the serialised message line by line. Every line before
# the first blank line is read as an HTTP header; everything after it is
# collected as the request body.
header_mode = True
headers = {}
body = []
for line in msg.as_string().splitlines(True):
    if header_mode and line == "\n":
        # the blank separator line ends the headers and is not part of the body
        header_mode = False
        continue
    if header_mode:
        (key, value) = line.split(":", 1)
        headers[key.strip()] = value.strip()
    else:
        body.append(line)
body = "".join(body)

# do the request, with the separated headers and body
h1.request("POST", "http://localhost:8080/server", body, headers)

This is picked up perfectly well by web.py, so it's clear that email.mime.multipart is suitable for creating MIME messages to be transported by HTTP, with the exception of its header handling.
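The header/body split can also be done without walking the message line by line. The following is only a sketch under a few assumptions: it targets Python 3 (where httplib became http.client), the function names build_related_message and split_headers_and_body are made up for illustration, and it relies on the standard email package to unfold any header lines that were wrapped. It cuts the serialised message at the first blank line and hands the header block to email.parser.HeaderParser.

```python
import re
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.parser import HeaderParser


def build_related_message(payload, filename="file.zip"):
    """Wrap raw bytes in a multipart/related message holding one zip part.

    Hypothetical helper: the part name "package" mirrors the answer above.
    """
    msg = MIMEMultipart("related")
    part = MIMEBase("application", "zip")
    part.add_header("Content-Disposition", "file",
                    name="package", filename=filename)
    part.set_payload(payload)
    encoders.encode_base64(part)
    msg.attach(part)
    return msg


def split_headers_and_body(serialised):
    """Split a serialised MIME message at the first blank line.

    Returns (headers, body). The blank separator line belongs to neither.
    Duplicate header names are collapsed to the last value, which is fine
    for Content-Type and MIME-Version but is an assumption in general.
    """
    match = re.search(r"\r?\n\r?\n", serialised)
    if match is None:
        raise ValueError("no blank line between headers and body")
    header_block = serialised[:match.start()]
    body = serialised[match.end():]
    parsed = HeaderParser().parsestr(header_block)
    # Collapse folded (wrapped) header values onto a single line.
    headers = {key: " ".join(value.split()) for key, value in parsed.items()}
    return headers, body
```

The resulting headers dict and body string could then be passed to the connection's request() call in place of the values produced by the loop above (in Python 3 the body would need encoding to bytes first).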

My other overall concern is scalability. Neither this solution nor the others proposed here scale well, as they read the entire contents of a file into a variable before bundling it up in the MIME message. A better solution would be one which could serialise on demand as the content is piped out over the HTTP connection. It's not urgent for me to fix that, but I'll come back here with a solution if I get to it.
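One possible shape for such an on-demand serialiser is a generator that emits the multipart body piece by piece, so that only one chunk of the file is in memory at a time. This is a rough sketch rather than a tested solution against web.py: it assumes Python 3, the name iter_multipart_related and its parameters are hypothetical, the boundary is supplied by the caller, and the filename is assumed not to contain quotes. The chunk size is kept a multiple of 57 bytes so that each chunk base64-encodes to whole 76-character lines and the pieces concatenate into valid base64.

```python
import base64


def iter_multipart_related(fileobj, boundary, filename="file.zip",
                           chunk_size=57 * 1024):
    """Yield the bytes of a one-part multipart/related body in pieces."""
    if chunk_size % 57:
        raise ValueError("chunk_size must be a multiple of 57")
    # Part headers, followed by the blank line that starts the part body.
    yield (
        "--{0}\r\n"
        "Content-Type: application/zip\r\n"
        'Content-Disposition: file; name="package"; filename="{1}"\r\n'
        "Content-Transfer-Encoding: base64\r\n"
        "\r\n"
    ).format(boundary, filename).encode("ascii")
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # encodebytes wraps its output at 76 characters per line;
        # switch its "\n" line endings to CRLF to match the headers.
        yield base64.encodebytes(chunk).replace(b"\n", b"\r\n")
    # Closing delimiter for the multipart body.
    yield "--{0}--\r\n".format(boundary).encode("ascii")
```

In Python 3, http.client's request() accepts an iterable of bytes as the body, so a generator like this could be passed directly, together with a Content-Type header of multipart/related carrying the same boundary. Whether the server copes with the resulting transfer encoding would need checking.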

Solution 2

I used this package by Will Holcomb, http://pypi.python.org/pypi/MultipartPostHandler/0.1.0, to make multipart requests with urllib2. It may help you out.

Author: Richard J

Founder at Cottage Labs

Updated on July 26, 2022

Comments

  • Richard J
    Richard J over 1 year

    I'm trying to write some Python code which can create multipart MIME HTTP requests in the client, and then appropriately interpret them on the server. I have, I think, partially succeeded on the client end with this:

    from email.mime.multipart import MIMEMultipart, MIMEBase
    import httplib
    h1 = httplib.HTTPConnection('localhost:8080')
    msg = MIMEMultipart()
    fp = open('myfile.zip', 'rb')
    base = MIMEBase("application", "octet-stream")
    base.set_payload(fp.read())
    msg.attach(base)
    h1.request("POST", "http://localhost:8080/server", msg.as_string())
    

    The only problem with this is that the email library also includes the Content-Type and MIME-Version headers, and I'm not sure how they're going to be related to the HTTP headers included by httplib:

    Content-Type: multipart/mixed; boundary="===============2050792481=="
    MIME-Version: 1.0
    
    --===============2050792481==
    Content-Type: application/octet-stream
    MIME-Version: 1.0
    

    This may be the reason that when this request is received by my web.py application, I just get an error message. The web.py POST handler:

    class MultipartServer:
        def POST(self, collection):
            print web.input()
    

    Throws this error:

    Traceback (most recent call last):
      File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 242, in process
        return self.handle()
      File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 233, in handle
        return self._delegate(fn, self.fvars, args)
      File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 415, in _delegate
        return handle_class(cls)
      File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 390, in handle_class
        return tocall(*args)
      File "/home/richard/Development/server/webservice.py", line 31, in POST
        print web.input()
      File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/webapi.py", line 279, in input
        return storify(out, *requireds, **defaults)
      File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 150, in storify
        value = getvalue(value)
      File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 139, in getvalue
        return unicodify(x)
      File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 130, in unicodify
        if _unicode and isinstance(s, str): return safeunicode(s)
      File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 326, in safeunicode
        return obj.decode(encoding)
      File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 137-138: invalid data
    

    My line of code is the frame about halfway down the traceback:

      File "/home/richard/Development/server/webservice.py", line 31, in POST
        print web.input()
    

    It's coming along, but I'm not sure where to go from here. Is this a problem with my client code, or a limitation of web.py (perhaps it just can't support multipart requests)? Any hints or suggestions of alternative code libraries would be gratefully received.

    EDIT

    The error above was caused by the data not being automatically base64 encoded. Adding

    encoders.encode_base64(base)
    

    gets rid of this error, and now the problem is clear. The HTTP request isn't being interpreted correctly on the server, presumably because the email library is including what should be the HTTP headers in the body instead:

    <Storage {'Content-Type: multipart/mixed': u'', 
              ' boundary': u'"===============1342637378=="\n'
              'MIME-Version: 1.0\n\n--===============1342637378==\n'
              'Content-Type: application/octet-stream\n'
              'MIME-Version: 1.0\n' 
              'Content-Transfer-Encoding: base64\n'
              '\n0fINCs PBk1jAAAAAAAAA.... etc
    

    So something is not right there.

    Thanks

    Richard

  • Richard J
    Richard J over 13 years
    Great, thanks, I'll take a look at that. Certainly has the right sort of name :) Cheers, R
  • Richard J
    Richard J over 13 years
    thanks - as you can see, I'm still working on this; I haven't figured out where multipart/mixed is being set yet. Likewise I haven't yet employed the Content-Disposition header as I'm still working on getting it into an HTTP request in the first place. My question is about how to construct such a request in the first place. Cheers, R.
  • Martin v. Löwis
    Martin v. Löwis over 13 years
    See the recipe. Forget about email.mime - HTTP is not email.
  • Richard J
    Richard J over 13 years
    Hi Martin; FYI I've demonstrated that the difference between HTTP and email is irrelevant here - they are simply transports, and mime is the same in either case. See my alternative answer. Thanks for the pointers. R
  • tc.
    tc. about 12 years
    1. I think the preferred way to set the header is something like base.add_header('Content-Disposition','file',name='package',...). 2. It's better to search for \n\n (and also \r\n\r\n, with e.g. re.search('\r?\n\r?\n',...)) so you don't have to split and join the body. 3. Header lines can be folded. 4. Technically the \n that terminates the headers is not part of the body, though this isn't harmful. 5. I'm not entirely sure that the RFC 5322 and RFC 2616 grammars are 100% compatible (in particular WRT "characters" vs octets).
  • fiatjaf
    fiatjaf about 9 years
    That's wrong, multipart/mixed is used in HTTP: docs.couchdb.org/en/latest/replication/…
  • Lorenzo Persichetti
    Lorenzo Persichetti over 4 years
    even if rare, multipart/mixed is used in python. See Google APIs: developers.google.com/drive/api/v3/performance#details