How can I sanitize a string for use as a filename?
Solution 1
You can use the PathGetCharType function, the PathCleanupSpec function, or the following trick:
// Requires Windows (MoveFile, GetFileAttributes) and SysUtils (LastDelimiter) in the uses clause.
function IsValidFilePath(const FileName: String): Boolean;
var
  S: String;
  I: Integer;
begin
  Result := False;
  S := FileName;
  repeat
    I := LastDelimiter('\/', S);
    MoveFile(nil, PChar(S));
    if (GetLastError = ERROR_ALREADY_EXISTS) or
       (
         (GetFileAttributes(PChar(Copy(S, I + 1, MaxInt))) = INVALID_FILE_ATTRIBUTES)
         and
         (GetLastError = ERROR_INVALID_NAME)
       ) then
      Exit;
    if I > 0 then
      S := Copy(S, 1, I - 1);
  until I = 0;
  Result := True;
end;
This code splits the string into its path components and uses MoveFile to verify each one. MoveFile will fail for invalid characters or reserved file names (like 'COM1') and return success or ERROR_ALREADY_EXISTS for a valid file name.
PathCleanupSpec is in the Jedi Windows API under Win32API/JwaShlObj.pas.
Solution 2
Regarding the question whether there is any API function to sanitize a file name (or even check its validity): there seems to be none. Quoting from the comment on the PathSearchAndQualify() function:
There does not appear to be any Windows API that will validate a path entered by the user; this is left as an ad hoc exercise for each application.
So you can only consult the rules for file name validity from File Names, Paths, and Namespaces (Windows):
- Use almost any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
  - The following reserved characters are not allowed: < > : " / \ | ? *
  - Characters whose integer representations are in the range from zero through 31 are not allowed.
  - Any other character that the target file system does not allow.
- Do not use the following reserved device names for the name of a file: CON, PRN, AUX, NUL, COM1..COM9, LPT1..LPT9. Also avoid these names followed immediately by an extension; for example, NUL.txt is not recommended.
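The rules above can be checked as plain string tests, without touching the file system. The following is a minimal sketch, not an RTL or WinAPI routine: IsReservedFileNameChar, IsReservedDeviceName and IsValidWindowsFileName are hypothetical names, the device-name comparison assumes names are case-insensitive, and the third rule ("any other character that the target file system does not allow") is not covered.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // Free Pascal only; Delphi skips this line
uses
  SysUtils;

// Hypothetical helper: True for the reserved characters < > : " / \ | ? *
// and for control characters with codes 0 through 31.
function IsReservedFileNameChar(C: Char): Boolean;
begin
  case C of
    '<', '>', ':', '"', '/', '\', '|', '?', '*', #0..#31:
      Result := True;
  else
    Result := False;
  end;
end;

// Hypothetical helper: True if the part of Name before the first '.' is a
// reserved device name, so both 'NUL' and 'nul.txt' are rejected.
function IsReservedDeviceName(const Name: string): Boolean;
const
  Fixed: array[0..3] of string = ('CON', 'PRN', 'AUX', 'NUL');
var
  Base: string;
  DotPos, I: Integer;
begin
  DotPos := Pos('.', Name);
  if DotPos > 0 then
    Base := UpperCase(Copy(Name, 1, DotPos - 1))
  else
    Base := UpperCase(Name);
  for I := Low(Fixed) to High(Fixed) do
    if Base = Fixed[I] then
      Exit(True);
  // COM1..COM9 and LPT1..LPT9
  Result := (Length(Base) = 4) and
    ((Copy(Base, 1, 3) = 'COM') or (Copy(Base, 1, 3) = 'LPT')) and
    (Base[4] >= '1') and (Base[4] <= '9');
end;

// Hypothetical check combining both rules; it knows nothing about
// restrictions specific to the target file system.
function IsValidWindowsFileName(const Name: string): Boolean;
var
  I: Integer;
begin
  Result := False;
  if Name = '' then
    Exit;
  for I := 1 to Length(Name) do
    if IsReservedFileNameChar(Name[I]) then
      Exit;
  Result := not IsReservedDeviceName(Name);
end;
```

With these definitions 'nul.txt' and 'a/b.txt' are rejected, while 'COM10.txt' and 'CONSOLE.log' pass, because only the exact names listed (optionally followed by an extension) are reserved.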
If you know that your program will only ever write to NTFS file systems, you can probably be sure that there are no other characters that the file system does not allow, so you would only have to check that the file name is not too long (use the MAX_PATH constant, which limits the full path including the directory) after all invalid chars have been removed (or replaced by underscores, for example).
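Replacing invalid characters with underscores and then enforcing a length limit could look like the sketch below. SanitizeFileName is a hypothetical helper that reuses the reserved-character rules quoted above; the caller is assumed to choose MaxLen, because MAX_PATH bounds the whole path (directory included), not just the file name.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // Free Pascal only; Delphi skips this line

// Hypothetical helper: replaces each reserved character (< > : " / \ | ? *
// and codes 0..31) with ReplaceWith, then cuts the result to MaxLen chars.
function SanitizeFileName(const Name: string; MaxLen: Integer;
  ReplaceWith: Char = '_'): string;
var
  I: Integer;
begin
  Result := Name;
  for I := 1 to Length(Result) do
    case Result[I] of
      '<', '>', ':', '"', '/', '\', '|', '?', '*', #0..#31:
        Result[I] := ReplaceWith;
    end;
  if Length(Result) > MaxLen then
    SetLength(Result, MaxLen);
end;
```

In Unicode versions of Delphi, Length and SetLength count UTF-16 code units, so truncating in the middle of a surrogate pair would leave half a character behind; a production version would need to guard against that.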
A program should also make sure that sanitizing has not led to file name conflicts, so that it does not silently overwrite other files that ended up with the same name.
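One way to avoid such silent overwrites is to append a counter when a sanitized name is already taken. This is a minimal sketch: MakeUniqueFileName and IsNameTaken are hypothetical names, the caller is assumed to keep the names already written in the current batch, and files that existed on disk beforehand would still need a separate FileExists check.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // Free Pascal only; Delphi skips this line
uses
  SysUtils;

// Hypothetical helper: True if S matches an entry of Taken, ignoring ASCII
// letter case (Windows file names are case-insensitive).
function IsNameTaken(const S: string; const Taken: array of string): Boolean;
var
  I: Integer;
begin
  Result := False;
  for I := Low(Taken) to High(Taken) do
    if SameText(S, Taken[I]) then
      Exit(True);
end;

// Hypothetical helper: returns Name if it is free, otherwise inserts
// " (2)", " (3)", ... before the extension until the name is not taken.
function MakeUniqueFileName(const Name: string;
  const Taken: array of string): string;
var
  Base, Ext: string;
  N: Integer;
begin
  Ext := ExtractFileExt(Name);
  Base := Copy(Name, 1, Length(Name) - Length(Ext));
  Result := Name;
  N := 1;
  while IsNameTaken(Result, Taken) do
  begin
    Inc(N);
    Result := Format('%s (%d)%s', [Base, N, Ext]);
  end;
end;
```

In a batch loop the names written so far could be kept in a dynamic array (array of string), which can be passed directly as the open-array argument.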
Solution 3
{
  CleanFileName
  ---------------------------------------------------------------------------
  Given an input string, strip any chars that would result
  in an invalid file name. This should only be passed the
  file name, not the entire path, because the slashes will be
  stripped. The function ensures that the resulting string
  does not have multiple spaces together and does not start
  or end with a space. If the entire string is removed, the
  result would not be a valid file name, so an error is raised.
  Requires SysUtils (Trim, Exception) and StrUtils (ReplaceStr).
}
function CleanFileName(const InputString: string): string;
var
  i: integer;
  ResultWithSpaces: string;
begin
  ResultWithSpaces := InputString;
  for i := 1 to Length(ResultWithSpaces) do
  begin
    // These chars are invalid in file names (control chars #0..#31 included).
    // Spaces are handled too, so runs of whitespace collapse into one space.
    case ResultWithSpaces[i] of
      '/', '\', ':', '*', '?', '"', '<', '>', '|', ' ', #0..#31:
        // Use a * to indicate a duplicate space so we can remove
        // them at the end.
        {$WARNINGS OFF} // W1047 Unsafe code 'String index to var param'
        if (i > 1) and
          ((ResultWithSpaces[i - 1] = ' ') or (ResultWithSpaces[i - 1] = '*')) then
          ResultWithSpaces[i] := '*'
        else
          ResultWithSpaces[i] := ' ';
        {$WARNINGS ON}
    end;
  end;
  // A * indicates duplicate spaces. Remove them.
  Result := ReplaceStr(ResultWithSpaces, '*', '');
  // Also trim any leading or trailing spaces.
  Result := Trim(Result);
  if Result = '' then
    raise Exception.Create('Resulting FileName was empty. Input string was: '
      + InputString);
end;
Solution 4
For anyone else reading this and wanting to use PathCleanupSpec, I wrote this test routine, which seems to work; there is a definite lack of examples on the 'net. You need to add ShlObj to the uses clause (I'm not sure when PathCleanupSpec was added, but I tested this in Delphi 2010). You will also need to check for XP SP2 or higher.
// Requires ShlObj (PathCleanupSpec) in the uses clause.
procedure TMainForm.btnTestClick(Sender: TObject);
var
  Path: array [0..MAX_PATH - 1] of WideChar;
  FileName: array [0..MAX_PATH - 1] of WideChar;
  ReturnValue: Integer;
  DebugString: string;
begin
  StringToWideChar('a*dodgy%\filename.$&^abc', FileName, MAX_PATH);
  StringToWideChar('C:\', Path, MAX_PATH);
  // PathCleanupSpec cleans FileName in place and returns a set of PCS_* flags.
  ReturnValue := PathCleanupSpec(Path, FileName);
  DebugString := 'Cleaned up filename: ' + FileName + sLineBreak;
  if (ReturnValue and $80000000) = $80000000 then // PCS_FATAL
    DebugString := DebugString + 'Fatal result. The cleaned path is not a valid file name' + sLineBreak;
  if (ReturnValue and $00000001) = $00000001 then // PCS_REPLACEDCHAR
    DebugString := DebugString + 'Replaced one or more invalid characters' + sLineBreak;
  if (ReturnValue and $00000002) = $00000002 then // PCS_REMOVEDCHAR
    DebugString := DebugString + 'Removed one or more invalid characters' + sLineBreak;
  if (ReturnValue and $00000004) = $00000004 then // PCS_TRUNCATED
    DebugString := DebugString + 'The returned path is truncated' + sLineBreak;
  if (ReturnValue and $00000008) = $00000008 then // PCS_PATHTOOLONG
    DebugString := DebugString + 'The input path specified at pszDir is too long to allow the formation of a valid file name from pszSpec' + sLineBreak;
  ShowMessage(DebugString);
end;
Solution 5
// For all platforms (Windows/Unix); requires System.IOUtils (TPath) in the uses clause.
function ReplaceInvalidFileNameChars(const aFileName: string; const aReplaceWith: Char = '_'): string;
var
  i: integer;
begin
  Result := aFileName;
  for i := Low(Result) to High(Result) do
  begin
    if not TPath.IsValidFileNameChar(Result[i]) then
      Result[i] := aReplaceWith;
  end;
end;
Mason Wheeler
Updated on August 08, 2021
Comments
-
Mason Wheeler over 2 years
I've got a routine that converts a file into a different format and saves it. The original datafiles were numbered, but my routine gives the output a filename based on an internal name found in the original.
I tried to batch-run it on a whole directory, and it worked fine until I hit one file whose internal name had a slash in it. Oops! And if it does that here, it could easily do it on other files. Is there an RTL (or WinAPI) routine somewhere that will sanitize a string and remove invalid symbols so it's safe to use as a filename?
-
Mason Wheeler almost 15 years: Nope. When you consider that recent versions of Windows support full Unicode filenames, and something like Ä£̆Ώۑ≥♣.txt is valid, you definitely want a blacklist for an operation like this, not a whitelist.
-
cjs almost 15 years: Not in the way I interpreted the question. You're not looking to see if an arbitrary string is a valid filename; you're looking to guarantee a valid filename from a transformation of an arbitrary string. These are (perhaps subtly) different. For example, if you could translate any string into a unique 8-digit number, that might bear no obvious relation to the original string, but still guarantees you can save the darn thing to disk.
-
cjs almost 15 years: Do you also want the filename to look as much like the original string as possible?
-
Mason Wheeler almost 15 years: Yes, that's exactly what I want.
-
Mason Wheeler almost 15 years: Thanks! PathCleanupSpec looks like exactly what I'm looking for.
-
sleske almost 14 years: +1 Sanitizing must always use whitelists. Otherwise you're vulnerable as soon as new inputs become possible, or inputs become dangerous which were OK before (code changes in the interpreting code).
-
Remy Lebeau over 5 years: Using nil as the first parameter of MoveFile() is undocumented behavior. Also, do not check GetLastError() unless MoveFile() returns FALSE first, which this code is not checking for.
-
Reversed Engineer about 5 years: Has anyone got PathCleanupSpec to work from Delphi? I always get an empty string returned. I started with this narkive.com/l2Um5mzw:3.379.300 and have tried for several hours now to get it to work.
-
Reversed Engineer about 5 years: Thank you!! I wasn't able to get PathCleanupSpec to work until your answer.
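The comments above suggest two alternatives to a blacklist: a whitelist of allowed characters (sleske) and a name derived deterministically from the string (cjs). A minimal sketch of both follows; WhitelistFileName and HashedFileName are hypothetical names, and the 32-bit FNV-1a hash over the UTF-8 bytes is a choice made for this sketch, not something from the thread. A character whitelist alone still lets reserved device names such as CON through, and two different strings can hash to the same 8 digits, so the "unique" number is only unique with high probability.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // Free Pascal only; Delphi skips this line
uses
  SysUtils;

// Hypothetical whitelist sanitizer: keeps ASCII letters, digits, space,
// '-', '_' and '.', and replaces every other character with ReplaceWith.
// It does NOT reject reserved device names such as CON or NUL.
function WhitelistFileName(const Name: string; ReplaceWith: Char = '_'): string;
var
  I: Integer;
begin
  Result := Name;
  for I := 1 to Length(Result) do
    case Result[I] of
      'A'..'Z', 'a'..'z', '0'..'9', ' ', '-', '_', '.':
        ; // allowed: keep the character
    else
      Result[I] := ReplaceWith;
    end;
end;

// Hypothetical "string to number" name: 32-bit FNV-1a over the UTF-8 bytes
// of S, written as 8 hex digits plus Ext. The same S always yields the same
// name, but two different strings can collide.
function HashedFileName(const S, Ext: string): string;
const
  FnvOffsetBasis: UInt64 = 2166136261;
  FnvPrime: UInt64 = 16777619;
var
  U: UTF8String;
  Hash: UInt64;
  I: Integer;
begin
  U := UTF8Encode(S);
  Hash := FnvOffsetBasis;
  for I := 1 to Length(U) do
  begin
    Hash := Hash xor Ord(U[I]);
    // keep only the low 32 bits, as the 32-bit FNV-1a definition requires
    Hash := (Hash * FnvPrime) and $FFFFFFFF;
  end;
  Result := IntToHex(Int64(Hash), 8) + Ext;
end;
```

The two can also be combined, for example WhitelistFileName(InternalName) + ' ' + HashedFileName(InternalName, '.dat'), which keeps the name readable while making it unlikely that two different internal names map to the same file.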