PHP: How to sanitize uploaded filenames?
Solution 1
I bet that you also store some information about the file in the database. If this is correct, then you can use the primary key (ID) as a filename on your server and preserve the original filename in the database. This gives you greater flexibility, because you can manipulate the metadata without renaming the actual file.
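A minimal sketch of this approach, assuming the row holding the original filename has already been inserted and its numeric ID read back. The helper name `storagePathForId` and the `/var/uploads` directory are illustrative assumptions, not part of the answer.

```php
<?php
// Hypothetical helper: build the on-disk path from the database ID alone,
// so no user-supplied text ever reaches the filesystem.
function storagePathForId(int $id, string $baseDir = '/var/uploads'): string
{
    if ($id <= 0) {
        throw new InvalidArgumentException('ID must be a positive integer');
    }
    return rtrim($baseDir, '/') . '/' . $id;
}

// Assumed flow: INSERT a row with the original name, read back the new ID
// (for example via PDO::lastInsertId()), then move the upload there:
// move_uploaded_file($_FILES['filename']['tmp_name'], storagePathForId($id));
```

The original name stays in the database and can be sent back to the user on download, while the stored file never needs renaming.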
Solution 2
I would just run a simple regex that replaces any non-alphanumeric characters with an underscore (or just removes these characters altogether). Make sure you preserve the extension, of course.
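One way such a regex could look, as a sketch rather than a vetted implementation. The function name `sanitizeFilename` and the fallback name `file` are assumptions made for the example.

```php
<?php
// Illustrative helper: replace every character that is not an ASCII letter
// or digit with an underscore, keeping the extension.
function sanitizeFilename(string $original): string
{
    // basename() drops any directory components a client may have sent.
    $info = pathinfo(basename($original));
    $name = preg_replace('/[^A-Za-z0-9]/', '_', $info['filename']);
    // Strip unsafe characters from the extension too, if there is one.
    $ext = isset($info['extension'])
        ? preg_replace('/[^A-Za-z0-9]/', '', $info['extension'])
        : '';
    if ($name === '') {
        $name = 'file'; // assumed fallback for names such as ".htaccess"
    }
    return $ext === '' ? $name : $name . '.' . $ext;
}
```

For example, `sanitizeFilename('my report (final).PDF')` returns `my_report__final_.PDF`.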
If you want to go a bit further, you could use the Fileinfo extension (the successor to the older mime_magic extension) to verify that the file's content really is the format its extension claims.
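A sketch of that check using the `finfo` class from the Fileinfo extension. The helper name `contentMatchesExtension` and the small allow-list are assumptions for illustration; a real application would list the types it actually accepts.

```php
<?php
// Hypothetical helper: true when the MIME type detected from the file content
// is the one expected for the given extension.
function contentMatchesExtension(string $path, string $extension, array $allowed): bool
{
    $extension = strtolower($extension);
    if (!isset($allowed[$extension])) {
        return false; // extension is not on the allow-list at all
    }
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    return $finfo->file($path) === $allowed[$extension];
}

// Example allow-list, not an exhaustive one.
$allowed = [
    'txt' => 'text/plain',
    'png' => 'image/png',
    'pdf' => 'application/pdf',
];
```

In an upload handler the path would be `$_FILES["filename"]["tmp_name"]`, checked before the file is moved to its final location.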
EDIT: To avoid filename collisions in a directory, you could append an MD5 of the user's IP + current time to the filename.
Solution 3
To avoid filename collisions, just check whether the given or generated filename already exists:
do {
    // Generate a filename, e.g.:
    $filename = md5(uniqid()) . $fileExtension;
} while (file_exists($filename));
This guarantees that the filename was not in use at the moment of the check. Using md5 (or any other hash algorithm) also means the filename consists only of hexadecimal characters, which makes it safe and easy to handle.
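Between the `file_exists()` check and the actual write, another request could still claim the same name. A possible variation, sketched here under assumptions (the helper name `reserveUniqueFilename`, the retry limit, and the use of `random_bytes()` instead of `md5(uniqid())` are not from the answer), creates the file with `fopen()` mode `'x'`, which fails if the file already exists.

```php
<?php
// Hypothetical helper: reserve a unique, random filename inside $dir.
// fopen() mode 'x' creates the file and fails if it already exists, so the
// existence check and the creation happen in a single step.
function reserveUniqueFilename(string $dir, string $extension): string
{
    for ($attempt = 0; $attempt < 10; $attempt++) {
        $filename = bin2hex(random_bytes(16)) . $extension;
        $handle = @fopen($dir . '/' . $filename, 'x');
        if ($handle !== false) {
            fclose($handle);
            return $filename;
        }
    }
    throw new RuntimeException('Could not reserve a unique filename');
}
```

The reserved, empty file can then be used as the destination of `move_uploaded_file()`, which overwrites an existing destination.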
Solution 4
Ciao, this function also removes all the dots from the name and then builds the clean string with the extension re-attached.
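function sanitaze_upload_file($data)
{
    // Split at the last dot; a name without a dot has no extension.
    $dotIndex  = strrpos($data, '.');
    $nameFile  = $dotIndex === false ? $data : substr($data, 0, $dotIndex);
    $extension = $dotIndex === false ? '' : substr($data, $dotIndex);
    // Keep only word characters, whitespace and a few safe symbols;
    // this also drops any remaining dots in the name.
    $clean = preg_replace('/[^\w\s\-~,;\[\]()]/', '', $nameFile);
    // Remove spaces, then re-attach the extension.
    return str_replace(' ', '', $clean) . $extension;
}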
frooyo
Updated on July 25, 2022
Comments
-
frooyo almost 2 years
I have a PHP application.
I allow users to upload files to my web application.
Question: What's the best way for me to sanitize the file names of the uploaded documents $_FILES["filename"]["tmp_name"] in PHP?
UPDATE:
Can I take an MD5 of the uploaded filename and use that as the newly assigned filename? If so, how do I do that in PHP?
-
frooyo over 13 years
What happens if they upload 2 documents at the same time?
-
frooyo over 13 years
Do you mean: hash('md5', $_FILES["filename"]["tmp_name"])?
-
Sam Day over 13 years
Add some additional entropy, use a counter for each file you process:
-
jduren over 13 years
Yes, the first argument you pass is the type of hash you would like to create and the second argument is the string you want to create the hash from. PHP Hash - php.net/manual/en/function.hash.php
-
Sam Day over 13 years
If $i is incremented for each file you process:
$filename = $sanitizedFileName . md5($_SERVER["REMOTE_ADDR"] . time() . $i) . $extension;
-
jduren over 13 years
I would also suggest adopting Sam Day's practice, where you add the current time to the filename before hashing; that makes a repeated filename much less likely.
-
Wrikken over 13 years
If you're going the 'just have a (pseudo)random filename' route, using the tempnam() function will automatically solve race conditions.
-
Ben Dunlap over 13 years
You'd definitely need to add something before hashing. Hashing the filename alone won't prevent name collisions, because the same filename will always produce the same hash.
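A short illustration of that point. The filename `report.pdf` is made up, and `uniqid('', true)` is just one possible per-upload value to mix in, chosen here as an assumption.

```php
<?php
// Two different users upload a file with the same name.
$first  = md5('report.pdf');
$second = md5('report.pdf');
// $first === $second, so the second upload would overwrite the first.

// Mixing in a per-upload value before hashing changes the result.
$firstUnique  = md5('report.pdf' . uniqid('', true));
$secondUnique = md5('report.pdf' . uniqid('', true));
```

Even then the result is only very unlikely to repeat, not guaranteed unique, so a check at write time is still worthwhile.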
-
frooyo over 13 years
@jduren, please update your answer to be: hash('md5', $_FILES["filename"]["tmp_name"]). If you update the answer, I'll mark it as "accepted".
-
jduren over 13 years
@Wrikken tempnam() actually creates a new file altogether, doesn't it? Using a hashed unique string and then just renaming the file would be more ideal.
-
frooyo over 13 years
No, that's the tmp filename PHP uses when a user uploads a file to the server.
-
jduren over 13 years
@user401839 No, Wrikken mentioned a tempnam() function in his comment (4th comment down on my answer), and I am wondering whether, in terms of performance, using tempnam(), which creates a new file altogether, is better than just renaming the file with a hash of the original filename with the time or some other unique data appended.
-
Wrikken over 13 years
@jduren: performance difference is negligible, avoiding race conditions & filename collisions isn't, and before you've got a solid implementation of those, tempnam() should be already done.
-
frooyo over 13 years
This makes no sense. You're taking the MD5 of the uniqid function. Why?
-
Crozin over 13 years
That's only an example. You can use the original filename, etc. The point is that you should check in a loop whether the generated filename is already in use. If so, regenerate the filename.
-
jduren over 13 years
@Wrikken any chance you can provide a usage example where you would use tempnam() to deal with this question, for instance? I read a little of the PHP manual page for the tempnam() function and just don't see how tempnam() is useful in this instance. As far as I can tell it creates a temporary file with the option to set a prefix. But it creates it with the .tmp extension and then returns the filename? Creating an entire new file just to get a unique string seems a bit overkill, but I may be interpreting it incorrectly. Example?
-
Wrikken over 13 years
(1) You will actually use the file as the new file, the target for the upload (that the file isn't temporary, as tempnam will have you think, doesn't matter, and don't let that throw you off). (2) No single upload ever overwrites an existing one, even if some_hash(filename) is equal. (3) No 2 simultaneous uploads will ever claim the same filename, even though they are simultaneous and result in the same hash. And yes, the chances for (2) or (3) to occur are slim; however, given a combination of enough visitors & time, it will happen someday, and then you're glad you coded to make it no problem.
-
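A sketch of the tempnam() usage described in the comment above, under assumptions: the helper name `claimUploadTarget`, the `up_` prefix and the `/var/uploads` directory are invented for the example. Note that tempnam() creates the file with restrictive permissions and falls back to the system temporary directory if the given directory does not exist.

```php
<?php
// Hypothetical helper: let tempnam() create and claim a unique, empty file in
// the upload directory, then return its path as the upload destination.
function claimUploadTarget(string $uploadDir, string $prefix = 'up_'): string
{
    $target = tempnam($uploadDir, $prefix);
    if ($target === false) {
        throw new RuntimeException('Could not create a target file');
    }
    return $target;
}

// Assumed usage inside an upload handler:
// $target = claimUploadTarget('/var/uploads');
// move_uploaded_file($_FILES['filename']['tmp_name'], $target);
```

Because the file already exists once tempnam() returns, no other request can be handed the same name.
-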
Andras Nemeth about 11 years
Taking md5() of uniqid() makes no sense. Taking md5() of the original filename (i.e. a constant value) in a loop makes even less sense. :-)
-
hijarian over 10 years
Of course taking md5 of uniqid makes sense! uniqid is a very long string containing dashes; md5 converts it to a sequence of alphanumeric characters, which is a lot more pleasing to see. Even taking md5 of time() . $filename makes a lot of sense: you're generating alphanumeric filenames guaranteed to be unique!
-
hijarian over 10 years
@Crozin Sorry for necroposting, but why md5(uniqid()) rather than md5(time())? I understand it's maybe a matter of taste, but anyway?
-
Crozin over 10 years
time() has 1-second resolution, while uniqid() (which uses microtime()) has microsecond resolution. This means that if you used time(), then in case of a collision your loop would keep generating the same name for up to 1 second, which would be pointless.
-
wyz1 about 5 years
Just reading through this thread I've become smarter. I tend not to pay attention to such finer details as the time performance difference between uniqid() and time(). Next time, instead of just going with the highest-rated answer, I'll read through all the answers and the following comments.
-
wyz1 about 5 years
What happens when the user needs to upload multiple files?
-
wyz1 about 5 years
How many tests has this function been through? Can I simply copy and paste this into my code?
-
Thibault Witzig about 5 years
@wyz1 You may have misunderstood Crozin's comment (or I have misunderstood yours :p). The difference between time() and uniqid() is not a matter of performance but a matter of precision. With time(), you have duplicates if two files are uploaded in the same second. With uniqid(), you have duplicates only if two files are uploaded in the same 1/1000000th of a second. Besides, uniqid() has an optional parameter to add more entropy to the result and make it even safer.
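A small sketch of the uniqid() variants mentioned above. The `user42_` prefix is a made-up value for illustration.

```php
<?php
// uniqid() is derived from the current time in microseconds.
$plain = uniqid(); // 13 hexadecimal characters

// Passing true as the second argument appends extra entropy to the value.
$withEntropy = uniqid('', true);

// A prefix (here a hypothetical per-user value) can be supplied as well.
$prefixed = uniqid('user42_');
```

None of these values are cryptographically random, so they reduce collisions but should not be relied on to make filenames unguessable.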
-
xorinzor almost 2 years
@wyz1 you should never just copy-paste code if you don't understand what it does