parse multipart/form-data, received from requests post

Solution 1

If you're receiving a multipart/form-data response, you can parse it using the requests-toolbelt library like so:

$ pip install requests-toolbelt

After installing it, decode the response into its parts:

import requests
from requests_toolbelt.multipart import decoder

testEnrollResponse = requests.post(...)
multipart_data = decoder.MultipartDecoder.from_response(testEnrollResponse)

for part in multipart_data.parts:
    print(part.content)  # Alternatively, part.text if you want unicode
    print(part.headers)
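
To tell the parts apart by field name, you can read the name out of each part's Content-Disposition header. The helper below is a minimal sketch that uses only the standard library; `disposition_params` is a hypothetical name, not part of requests-toolbelt, and it assumes the header value arrives as bytes or str.

```python
from email.message import Message


def disposition_params(header_value):
    """Return the parameters of a Content-Disposition header value as a dict."""
    if isinstance(header_value, bytes):
        header_value = header_value.decode("utf-8")
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_params() yields (key, value) pairs; the first pair is the
    # disposition type itself ("form-data"), so it is skipped here.
    params = msg.get_params(header="content-disposition", failobj=[])
    return dict(params[1:])


params = disposition_params(b'form-data; name="scoreAndStatus"')
print(params["name"])
```

Inside the loop above you would probably call it as disposition_params(part.headers[b'Content-Disposition']); this assumes the decoder exposes header names and values as bytes, so check that against the version you have installed.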

Solution 2

A code sample for a Flask server, using the multipart library (https://github.com/defnull/multipart):

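from io import BytesIO

import flask
import multipart as mp

app = flask.Flask(__name__)

@app.route('/', methods=["GET", "POST"])
def index():
    ...
    elif flask.request.method == "POST":
        # get_data() returns the raw body as bytes; request.data is empty for form content types
        data = flask.request.get_data()
        # The first line of the body is "--<boundary>"; drop the two leading dashes
        boundary = data.split(b"\r")[0][2:].decode("latin-1")
        p = mp.MultipartParser(BytesIO(data), boundary, charset="latin-1")
        # .value is the first part decoded to text; latin-1 round-trips every byte
        blob = p.parts()[0].value
        with open("file.bin", "wb") as f:
            f.write(blob.encode("latin-1"))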

Solution 3

A working example of parsing multipart data follows. You can try it out at the interactive Python prompt.

import email

msg = email.message_from_string('''\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="    XXXX"

--    XXXX
Content-Type: text/plain


--    XXXX
Content-Type: text/plain

--    XXXX--
''')

msg.is_multipart()

Once you know it's working on your system, you can build your own email message out of the POST data and parse it the same way. If you have the raw POST body as a string, the rest of the necessary information can be found in the request headers. The code below is indented for readability only; the lines inside the triple-quoted string must stay flush left, because any leading whitespace would become part of the message.

    epost_data = '''\
MIME-Version: 1.0
Content-Type: %s

%s''' % (self.headers['content-type'], post_data)

    # Parse the message built above (headers plus body), not the bare POST body
    msg = email.message_from_string(epost_data)

    if msg.is_multipart():
        for part in msg.get_payload():
            name = part.get_param('name', header='content-disposition')
            filename = part.get_param('filename', header='content-disposition')
            # print('name %s' % name)  # "always" there
            # print('filename %s' % filename)  # only there for files...
            payload = part.get_payload(decode=True)
            print(payload[:100])  # output the first 100 bytes

The first %s will be replaced with the content type, and the second with post_data. You can then write the payload to a file, etc.
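
If the body arrives as bytes, as it usually does in Python 3, the same idea can be sketched with email.message_from_bytes. This is a minimal sketch rather than a complete parser: `parse_multipart` is a hypothetical helper name, and the boundary and field names in the sample body are made up for illustration.

```python
import email


def parse_multipart(content_type, body):
    """Map each form field name to a (filename, payload bytes) tuple."""
    # Prepend the minimal headers the email parser needs to find the boundary.
    prefix = "MIME-Version: 1.0\r\nContent-Type: %s\r\n\r\n" % content_type
    msg = email.message_from_bytes(prefix.encode("latin-1") + body)
    fields = {}
    if msg.is_multipart():
        for part in msg.get_payload():
            name = part.get_param("name", header="content-disposition")
            filename = part.get_param("filename", header="content-disposition")
            fields[name] = (filename, part.get_payload(decode=True))
    return fields


# Made-up sample body with one text field and one binary file field.
sample_body = (
    b'--XXXX\r\n'
    b'Content-Disposition: form-data; name="note"\r\n'
    b'\r\n'
    b'hello\r\n'
    b'--XXXX\r\n'
    b'Content-Disposition: form-data; name="blob"; filename="a.bin"\r\n'
    b'Content-Type: application/octet-stream\r\n'
    b'\r\n'
    b'\x00\x01\x02\r\n'
    b'--XXXX--\r\n'
)
fields = parse_multipart("multipart/form-data; boundary=XXXX", sample_body)
for field_name, (file_name, payload) in fields.items():
    print(field_name, file_name, len(payload))
```

With a requests response, content_type would come from response.headers['Content-Type'] and body from response.content.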

Consider the security implications of saving a file. Do not trust the posted file name: it could start with ../../ (for example ../../filename.sh), so if you write to /my-folder/../../filename.sh, an attacker could place a malicious file outside the directory where you intend to store uploads. Validating that the file is of an allowed type before trusting its contents is also recommended. You do not want to let attackers overwrite arbitrary files on your system.
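
One way to guard against path traversal is to keep only the last component of the posted name and then confirm that the resolved destination is still inside the upload directory. The sketch below assumes a hypothetical helper name, `safe_destination`, and a made-up upload directory; it does not validate the file type.

```python
import os


def safe_destination(upload_dir, posted_name):
    """Return a path inside upload_dir for posted_name, or raise ValueError."""
    # Treat backslashes as separators too, then keep only the final component.
    name = os.path.basename(posted_name.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError("unusable file name: %r" % posted_name)
    root = os.path.realpath(upload_dir)
    dest = os.path.realpath(os.path.join(root, name))
    # Resolving symlinks and comparing against the root catches escapes
    # that survive the basename step.
    if os.path.commonpath([root, dest]) != root:
        raise ValueError("destination escapes the upload directory")
    return dest


print(safe_destination("/tmp/uploads", "../../filename.sh"))
```

The traversal prefix is discarded, so the file lands in the upload directory under the name filename.sh instead of two levels above it.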


Author: user3131037

Updated on July 09, 2022

Comments

  • user3131037
    user3131037 almost 2 years

    I am writing a web service client using the requests library. The response I get is multipart/form-data that contains a file and JSON text. I have no idea how to parse it. Is there a proper library for parsing the multipart/form-data format in Python, or should I write a parser on my own?

    my code:

    data = {
      "prototypeModel" :('prototypeModel', open(prototypeModel, 'rb'), 'application/octet-stream', {'Expires': '0'}),
      "mfcc_1" : ('mfcc', open(mfcc_1, 'rb'), 'application/octet-stream', {'Expires': '0'}),
      "mfcc_2" : ('mfcc', open(mfcc_2, 'rb'), 'application/octet-stream', {'Expires': '0'}),
      "mfcc_3" : ('mfcc', open(mfcc_3, 'rb'), 'application/octet-stream', {'Expires': '0'}),
    }
    
    print( '---------------------- start enroll ----------------------')
    testEnrollResponse = requests.post(server+sessionID, files = data, json = declaredParameters)
    

    b'\r\n--c00750d1-8ce4-4d29-8390-b50bf02a92cc\r\nContent-Disposition: form-data; name="playbackHash"\r\nContent-Type: application/octet-stream\r\n\r\n\x16\x00\x00\x00\x00\x00\x00\x00serialization::archive\n\x00\x04\x08\x04 .... x00\x00R\x94\x9bp\x8c\x00\r\n--c00750d1-8ce4-4d29-8390-b50bf02a92cc\r\nContent-Disposition: form-data; name="usersMFCC"\r\nContent-Type: application/octet-stream\r\n\r\n\x16\x00\x00\x00\x00\x00\x00\x00serialization::archive\n\x00\x04\x08\x04\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\xf8\x16\x00\x00\x00\x00\x00\x00u\xbd\xb4/\xda1\xea\xbf\x0f\xed\xa2<\xc9\xf8\xe7\xbf?\xd5\xf06u\xe7\xf0\xbf\xd4\x8d\xd4\xa1F\xbe\x03@\x85X!\x19\xd8A\x06@\x8co\xf7\r .....
    x80\xd9\x95Yxn\xd0?\r\n--c00750d1-8ce4-4d29-8390-b50bf02a92cc\r\nContent-Disposition: form-data; name="scoreAndStatus"\r\nContent-Type: application/json; charset=utf-8\r\n\r\n{"lexLikelihood":1.544479046897232,"overallScore":-nan,"playbackLikelihood":-inf,"status":{"errorCode":0,"errorMessage":""}}\r\n--c00750d1-8ce4-4d29-8390-b50bf02a92cc--\r\n'

    I replaced more binary data with " ..... "

    • miles82
      miles82 over 8 years
      Show us the response you get.
  • uphill
    uphill over 5 years
    It's about a requests client, not a Flask server.
  • Alexander Lubyagin
    Alexander Lubyagin over 5 years
    Yes, you can adapt my code for a web service client. My code does not use the python-requests module, and it works.
  • uphill
    uphill over 5 years
    But it's not related to the question.
  • uphill
    uphill over 5 years
    It has correct information, but it's not answering the question in any way.
  • Rahul
    Rahul over 2 years
    This has useful information; in fact, this post showed up when I was trying to find something similar for the server side. Even the question is bi-directional: it says it receives a request, which could be from a client.
  • étale-cohomology
    étale-cohomology almost 2 years
    This does seem useful, but the multipart library doesn't have a tob function.
  • Alexander Lubyagin
    Alexander Lubyagin almost 2 years
    See the commit github.com/defnull/multipart/commit/… which renamed tob to to_bytes and fixed a SyntaxError typo.