Does my code prevent directory traversal?

10,236

Solution 1

Your code does not prevent directory traversal. You can guard against this with the os.path module.

>>> import os.path
>>> os.curdir
'.'
>>> startdir = os.path.abspath(os.curdir)
>>> startdir
'/home/jterrace'

startdir is now an absolute path where you don't want to allow the path to go outside of. Now let's say we get a filename from the user and they give us the malicious /etc/passwd.

>>> filename = "/etc/passwd"
>>> requested_path = os.path.relpath(filename, startdir)
>>> requested_path
'../../etc/passwd'
>>> requested_path = os.path.abspath(requested_path)
>>> requested_path
'/etc/passwd'

We have now converted their path into an absolute path relative to our starting path. Since this wasn't in the starting path, it doesn't have the prefix of our starting path.

>>> os.path.commonprefix([requested_path, startdir])
'/'

You can check for this in your code. If the commonprefix function returns a path that doesn't start with startdir, then the path is invalid and you should not return the contents.


The above can be wrapped to a static method like so:

import os 

def is_directory_traversal(file_name):
    current_directory = os.path.abspath(os.curdir)
    requested_path = os.path.relpath(file_name, start=current_directory)
    requested_path = os.path.abspath(requested_path)
    common_prefix = os.path.commonprefix([requested_path, current_directory])
    return common_prefix != current_directory

Solution 2

Use only the base name of the user inputed file:

file_name = request.path_params["file"]
file_name = os.path.basename(file_name)
file = open(os.path.join("/path", file_name), "rb")

os.path.basename strips ../ from the path:

>>> os.path.basename('../../filename')
'filename'

Solution 3

There's a much simpler solution here:

relative_path = os.path.relpath(path, start=self.test_directory)
has_dir_traversal = relative_path.startswith(os.pardir)

relpath takes care of normalising path for us. And if the relative path starts with .., then you don't allow it.

Share:
10,236

Related videos on Youtube

deamon
Author by

deamon

Updated on June 04, 2022

Comments

  • deamon
    deamon almost 2 years

    Is the following code snippet from a Python WSGI app safe from directory traversal? It reads a file name passed as parameter and returns the named file.

    file_name = request.path_params["file"]
    file = open(file_name, "rb")
    mime_type = mimetypes.guess_type(file_name)[0]
    start_response(status.OK, [('Content-Type', mime_type)])
    return file
    

    I mounted the app under http://localhost:8000/file/{file} and sent requests with the URLs http://localhost:8000/file/../alarm.gif and http://localhost:8000/file/%2e%2e%2falarm.gif. But none of my attempts delivered the (existing) file. So is my code already safe from directory traversal?

    New approach

    It seems like the following code prevents directory traversal:

    file_name = request.path_params["file"]
    absolute_path = os.path.join(self.base_directory, file_name)
    normalized_path = os.path.normpath(absolute_path)
    
    # security check to prevent directory traversal
    if not normalized_path.startswith(self.base_directory):
        raise IOError()
    
    file = open(normalized_path, "rb")
    mime_type = mimetypes.guess_type(normalized_path)[0]
    start_response(status.OK, [('Content-Type', mime_type)])
    return file
    
    • Katriel
      Katriel
      What happens if you give it an absolute path?
  • Graham Dumpleton
    Graham Dumpleton almost 13 years
    Just don't rely on doing things relative to the current working directory as that can be anything in a web application. Should always base it off an absolute path as starting point, be that hard wired or calculated from __file__.
  • jterrace
    jterrace almost 13 years
    @graham-dumpleton This has nothing to do with relative or absolute. A path can always break out unless you do this type of sanity check.
  • Graham Dumpleton
    Graham Dumpleton almost 13 years
    You don't seem to quite get what I am referring to. You gave in your example 'startdir = os.path.abspath(os.curdir)'. That will set 'startdir' to the current working directory. In a Python web application there are no guarantees what the current working directory will be. So, it is a poor example to use because people will blindly cut and paste the code without understanding that they should be anchoring it at an absolute path which has meaning for their application rather than relying on whatever os.getcwd() is going to return when os.path.abspath(os.curdir) is called.
  • deamon
    deamon almost 13 years
    This does not prevent directory traversal since file_name can contain ../! But your code was helpful anyway.
  • deamon
    deamon almost 13 years
    Sorry, your solution seems to work too. But it would be limited to a single directory level (without sub directories). I'll correct my down vote if you modify your answer slightly so that I can vote again).
  • evandrix
    evandrix about 9 years
    You can test this code snippet out against a directory traversal attack fuzzer like dotdotpwn @ github.com/wireghoul/dotdotpwn, eg. './dotdotpwn.pl -m http-url -u "google.com/?q=TRAVERSAL" -O -k "root:"'
  • Clodoaldo Neto
    Clodoaldo Neto about 9 years
    @daemon os.path.basename strips ../ from the path. Check the update answer.
  • Flimm
    Flimm about 8 years
    os.path.basename does strip ../, but it also strips all path related information. os.path.basename("a/b/c") returns "c"
  • Flimm
    Flimm about 8 years
    Note this depends on startdir ending with a forward slash, which it doesn't in this case! This code will incorrectly accept the input ../jterrace-attack/foobar.
  • Clodoaldo Neto
    Clodoaldo Neto about 8 years
    @Flimm: Yes, that is intended.