PHP: How to sanitize uploaded filenames?


Solution 1

You probably also store some information about each file in a database. If so, you can use its primary key (ID) as the filename on your server and keep the original filename in the database. This gives you more flexibility, because you can change the metadata without renaming the actual file.
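A minimal sketch of this idea, assuming an existing PDO connection and a `files` table with an auto-increment `id` and an `original_name` column (the table, columns and the helper name `storagePathForId()` are all hypothetical). The helper only builds the on-disk path from the ID; the commented usage shows where the original name and the upload would go:

```php
<?php
// Hypothetical helper: the file is stored on disk under its database ID,
// while the user-supplied name lives only in the database row.
function storagePathForId(int $id, string $uploadDir): string
{
    // The ID is an integer, so the resulting name cannot contain
    // path separators, dots or any other characters chosen by the user.
    return rtrim($uploadDir, '/') . '/' . $id;
}

// Usage, assuming $pdo is a PDO connection and the table
// files(id INTEGER PRIMARY KEY, original_name TEXT) exists:
//
// $stmt = $pdo->prepare('INSERT INTO files (original_name) VALUES (?)');
// $stmt->execute([$_FILES['upload']['name']]);
// $id = (int) $pdo->lastInsertId();
// move_uploaded_file($_FILES['upload']['tmp_name'], storagePathForId($id, '/var/uploads'));
```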

Solution 2

I would just run a simple regex that replaces any non-alphanumeric characters with an underscore (or removes them altogether). Make sure you preserve the extension, of course.
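A minimal sketch of that regex approach. `sanitizeFilename()` is a hypothetical helper name, and the fallback base name `file` for inputs such as `.htaccess` (an extension with no base name) is an assumption added here so the result can never be a hidden dotfile:

```php
<?php
// Hypothetical helper: replace every character that is not an ASCII
// letter or digit with an underscore, keeping the extension.
function sanitizeFilename(string $original): string
{
    // basename() drops any directory part the client may have sent.
    $original  = basename($original);
    $name      = pathinfo($original, PATHINFO_FILENAME);
    $extension = pathinfo($original, PATHINFO_EXTENSION);

    // Pass '' instead of '_' to remove the characters altogether.
    $name      = preg_replace('/[^A-Za-z0-9]/', '_', $name);
    $extension = preg_replace('/[^A-Za-z0-9]/', '', $extension);

    // Assumption: names like ".htaccess" get a placeholder base name.
    if ($name === '') {
        $name = 'file';
    }

    return $extension === '' ? $name : $name . '.' . $extension;
}
```

For example, `my report (v2).pdf` becomes `my_report__v2_.pdf`, and `archive.tar.gz` becomes `archive_tar.gz`, since only the last dot is treated as the extension separator.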

If you want to go a bit further, you could use MIME-type detection (the old mime_magic extension, since replaced by the Fileinfo extension) to make sure the file really is in the format its extension claims.
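With the Fileinfo extension, such a check might look like the sketch below. The extension-to-MIME map and the name `contentMatchesExtension()` are assumptions; adjust them to the formats you accept. Because it inspects the file's content, run it on the uploaded temporary file (`$_FILES[...]['tmp_name']`), not on the client-supplied name:

```php
<?php
// Hypothetical helper: compare the MIME type detected from the file's
// content with the type expected for its extension.
function contentMatchesExtension(string $path, string $extension): bool
{
    // Assumed whitelist; unknown extensions are rejected.
    $allowed = [
        'png'  => 'image/png',
        'jpg'  => 'image/jpeg',
        'jpeg' => 'image/jpeg',
        'pdf'  => 'application/pdf',
        'txt'  => 'text/plain',
    ];

    $extension = strtolower($extension);
    if (!isset($allowed[$extension])) {
        return false;
    }

    // finfo reads the file's leading bytes (magic numbers), not its name.
    $finfo    = new finfo(FILEINFO_MIME_TYPE);
    $detected = $finfo->file($path);

    return $detected === $allowed[$extension];
}
```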

EDIT: To avoid filename collisions in a directory, you could append an MD5 hash of the user's IP address plus the current time to the filename.
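A sketch of that suffix, assuming the base name and extension have already been sanitized. `uniqueUploadName()` and its `$counter` parameter are hypothetical: the counter adds extra entropy when several files are handled in one request, and `microtime()` (a string with microsecond resolution) is used instead of `time()` so two uploads in the same second still differ. This makes collisions unlikely, not impossible:

```php
<?php
// Hypothetical helper: append md5(IP + current time + counter) to a
// sanitized base name, then re-attach the extension (without dot).
function uniqueUploadName(string $baseName, string $extension, string $ip, int $counter = 0): string
{
    // microtime() returns e.g. "0.65432100 1700000000" (usec and sec).
    $suffix = md5($ip . microtime() . $counter);

    return $baseName . '_' . $suffix . ($extension === '' ? '' : '.' . $extension);
}

// Usage in an upload handler:
// $name = uniqueUploadName('report', 'pdf', $_SERVER['REMOTE_ADDR'] ?? '', $i);
```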

Solution 3

To avoid filename collisions, just check whether the given or generated filename already exists:

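$uploadDir     = '/path/to/uploads'; // target directory (placeholder)
$fileExtension = '.pdf';             // note the leading dot
do {
   // Generate a candidate filename, e.g.:
   $filename = md5(uniqid()) . $fileExtension;
} while (file_exists($uploadDir . '/' . $filename));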

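This guarantees that the filename was not in use at the moment of the check. It is not 100% race-free, though: another request could create the same file between the check and the moment you write it. Using md5 (or any other hash algorithm) gives you a name made only of hexadecimal characters, so it is safe and easy to handle.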
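To close that gap, the check and the creation can be done in one step: `fopen()` with mode `'x'` creates the file only if it does not exist and fails otherwise. A sketch under these assumptions: `createUniqueFile()` is a hypothetical name, `$extension` includes the leading dot as in the loop above, and `bin2hex(random_bytes(16))` is swapped in for `md5(uniqid())` (both give 32 hex characters). `move_uploaded_file()` then overwrites the reserved empty file:

```php
<?php
// Hypothetical helper: atomically reserve a random filename in $dir.
function createUniqueFile(string $dir, string $extension, int $maxAttempts = 10): string
{
    for ($i = 0; $i < $maxAttempts; $i++) {
        $path = $dir . '/' . bin2hex(random_bytes(16)) . $extension;

        // Mode 'x' fails if the file already exists; @ hides the warning.
        $handle = @fopen($path, 'x');
        if ($handle !== false) {
            fclose($handle);
            return $path; // empty file now reserved for this upload
        }
    }
    throw new RuntimeException('Could not create a unique file in ' . $dir);
}

// Usage (form field name 'upload' is an assumption):
// $target = createUniqueFile('/var/uploads', '.pdf');
// move_uploaded_file($_FILES['upload']['tmp_name'], $target);
```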

Solution 4

Hi, this function also removes all the dots from the base name, then builds the clean filename with the extension appended.

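function sanitize_upload_file($data)
{
    // Split at the last dot; a name without a dot has no extension.
    $indexOFF  = strrpos($data, '.');
    $nameFile  = $indexOFF === false ? $data : substr($data, 0, $indexOFF);
    $extension = $indexOFF === false ? '' : substr($data, $indexOFF + 1);
    // Keep only letters, digits, _ - ~ , ; [ ] ( ); this also drops every
    // remaining dot, slash and whitespace character from the name.
    $clean     = preg_replace('/[^\w\-~,;\[\]\(\)]/', '', $nameFile);
    // Sanitize the extension too, so it cannot carry unsafe characters.
    $extension = preg_replace('/[^A-Za-z0-9]/', '', $extension);
    return $extension === '' ? $clean : $clean . '.' . $extension;
}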
Author: frooyo

Updated on July 25, 2022

Comments

  • frooyo
    frooyo almost 2 years

    I have a PHP application.

    I allow users to upload files to my web application.

    Question: What's the best way for me to sanitize the file names of the uploaded documents $_FILES["filename"]["tmp_name"] in PHP?

    UPDATE:

    Can I take an MD5 of the uploaded filename and use that as the newly assigned filename? If so, how do I do that in PHP?

  • frooyo
    frooyo over 13 years
    What happens if they upload 2 documents at the same time?
  • frooyo
    frooyo over 13 years
    Do you mean: hash('md5', $_FILES["filename"]["tmp_name"] ) ?
  • Sam Day
    Sam Day over 13 years
    Add some additional entropy by using a counter for each file you process:
  • jduren
    jduren over 13 years
    Yes, the first argument you pass is the type of hash you would like to create and the second argument is the string you want to create the hash from. PHP Hash - php.net/manual/en/function.hash.php
  • Sam Day
    Sam Day over 13 years
    If $i is incremented for each file you process: $filename = $sanitizedFileName . md5($_SERVER["REMOTE_ADDR"] . time() . $i) . $extension;
  • jduren
    jduren over 13 years
    I would also suggest adopting Sam Day's practice of adding the current time to the filename before hashing; that would create an even more unique filename.
  • Wrikken
    Wrikken over 13 years
    If you're going the 'just have a (pseudo)random filename'-route, using the tempnam()-function will automatically solve race-conditions.
  • Ben Dunlap
    Ben Dunlap over 13 years
    You'd definitely need to add something before hashing. Hashing the filename alone won't prevent name collisions, because the same filename will always produce the same hash.
  • frooyo
    frooyo over 13 years
    @jduren, please update your answer to be: hash('md5', $_FILES["filename"]["tmp_name"] ). If you update the answer, I'll mark it as "accepted".
  • jduren
    jduren over 13 years
    @Wrikken tempnam() actually creates a new file altogether, doesn't it? Generating a unique hashed string and then just renaming the file would be better.
  • frooyo
    frooyo over 13 years
    No, that's the temporary filename PHP uses when a user uploads a file to the server.
  • jduren
    jduren over 13 years
    @user401839 No, Wrikken mentioned the tempnam() function in his comment (4th comment down on my answer). I am wondering whether, in terms of performance, using tempnam(), which creates a new file, is better than just renaming the file to a hash of the original filename with the time or some other unique data appended.
  • Wrikken
    Wrikken over 13 years
    @jduren: the performance difference is negligible; avoiding race conditions and filename collisions isn't, and before you've got a solid implementation of those yourself, tempnam() would already have done the job.
  • frooyo
    frooyo over 13 years
    This makes no sense. You're taking the MD5 of the uniqid function. Why?
  • Crozin
    Crozin over 13 years
    That's only an example. You can use the original filename, etc. The point is that you should check in a loop whether the generated filename is already in use, and if so, regenerate it.
  • jduren
    jduren over 13 years
    @Wrikken any chance you can provide a usage example of how you would use tempnam() to deal with this question, for instance? I read a little of the PHP manual page for the tempnam() function and just don't see how it is useful in this case. As far as I can tell, it creates a temporary file with the option to set a prefix. But it creates it with the .tmp extension and then returns the filename? Creating an entire new file just to get a unique string seems a bit overkill, but I may be interpreting it incorrectly. Example?
  • Wrikken
    Wrikken over 13 years
    (1) You actually use that file as the new file, the target for the upload (the fact that tempnam() suggests the file is temporary doesn't matter; don't let that throw you off). (2) No single upload ever overwrites an existing one, even if some_hash(filename) is equal. (3) No two simultaneous uploads will ever claim the same filename, even though they are simultaneous and result in the same hash. And yes, the chances of (2) or (3) occurring are slim; however, given enough visitors and time, it will happen someday, and then you'll be glad you coded it so that it's no problem.
  • Andras Nemeth
    Andras Nemeth about 11 years
    Taking md5() of uniqid() makes no sense. Taking md5() of the original filename (i.e. a constant value) in a loop makes even less sense. :-)
  • hijarian
    hijarian over 10 years
    Of course taking md5 of uniqid does make sense! uniqid is a very long string containing dashes; md5 converts it to a sequence of alphanumeric characters, which is a lot more pleasing to see. Even taking md5 of time() . $filename makes a lot of sense: you're generating alphanumeric filenames guaranteed to be unique!
  • hijarian
    hijarian over 10 years
    @Crozin Sorry for necroposting, but why md5(uniqid()) rather than md5(time())? I understand it may be a matter of taste, but still?
  • Crozin
    Crozin over 10 years
    time() has 1-second resolution, while uniqid() (which uses microtime()) has 1-millisecond resolution. This means that if you used time(), then in case of a collision your loop would keep generating the same name for up to a second, which would be pointless.
  • wyz1
    wyz1 about 5 years
    Just by reading through this thread I've become smarter. I tend not to pay attention to finer details such as the time difference between uniqid() and time(). Next time, instead of just going with the highest-rated answer, I'll read through all the answers and the comments that follow.
  • wyz1
    wyz1 about 5 years
    What happens when the user needs to upload multiple files?
  • wyz1
    wyz1 about 5 years
    How much testing has this function been through? Can I simply copy and paste it into my code?
  • Thibault Witzig
    Thibault Witzig about 5 years
    @wyz1 You may have misunderstood Crozin's comment (or I have misunderstood yours :p). The difference between time() and uniqid() is not a matter of performance but a matter of precision. With time(), you have duplicates if two files are uploaded in the same second. With uniqid(), you have duplicates only if two files are uploaded in the same 1/1000000th of a second. Besides, uniqid() has an optional parameter to add more entropy to the result and make it even safer.
  • xorinzor
    xorinzor almost 2 years
    @wyz1 you should never just copy-paste code if you don't understand what it does
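The tempnam() approach Wrikken describes in the comments could be sketched as follows. `reserveUploadTarget()`, the `up_` prefix and the form field name `upload` are hypothetical. tempnam() atomically creates an empty, uniquely named file, and `move_uploaded_file()` then overwrites it. Because tempnam() falls back to the system temporary directory when it cannot create the file in the requested one, the sketch rejects a file that ends up elsewhere. The name carries no extension you choose, so this pairs naturally with keeping the original name in a database:

```php
<?php
// Hypothetical helper: reserve a unique target file via tempnam().
function reserveUploadTarget(string $uploadDir, string $prefix = 'up_'): string
{
    $path = tempnam($uploadDir, $prefix);
    if ($path === false) {
        throw new RuntimeException('Could not create a unique file');
    }

    // Reject a file silently created in the system temp dir instead.
    if (realpath(dirname($path)) !== realpath($uploadDir)) {
        unlink($path);
        throw new RuntimeException('Upload directory is not usable');
    }

    return $path;
}

// Usage in an upload handler:
// $target = reserveUploadTarget('/var/uploads');
// if (!move_uploaded_file($_FILES['upload']['tmp_name'], $target)) {
//     unlink($target); // release the reserved name on failure
// }
```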