What characters are forbidden in Windows and Linux directory names?
Solution 1
A “comprehensive guide” to forbidden filename characters is not going to work on Windows, because Windows reserves filenames as well as characters. Yes, characters like *, " and ? (and others) are forbidden, but there are an infinite number of names composed only of valid characters that are also forbidden. For example, spaces and dots are valid filename characters, but names composed only of those characters are forbidden.
By default, Windows does not distinguish between upper-case and lower-case characters, so you cannot create a folder named A if one named a already exists. Worse, seemingly allowed names like PRN and CON, and many others, are reserved and not allowed. Windows also has several length restrictions; a filename valid in one folder may become invalid if moved to another folder. The rules for naming files and folders are in the Microsoft docs.
You cannot, in general, use user-generated text to create Windows directory names. If you want to allow users to name anything they want, you have to create safe names like A, AB, A2, and so on, store the user-generated names and their path equivalents in an application data file, and perform path mapping in your application.
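A minimal sketch of that mapping idea follows. The class name `NameMapper`, the `d0001`-style generated names, and the JSON data file are assumptions made purely for illustration; they are not part of the original answer.

```python
import json
from pathlib import Path


class NameMapper:
    """Maps arbitrary user-facing names to generated, filesystem-safe names.

    Hypothetical helper: the class, the "d0001" naming scheme, and the JSON
    storage format are assumptions made for illustration.
    """

    def __init__(self, map_file):
        self.map_file = Path(map_file)
        # Load an existing mapping if one was saved earlier.
        if self.map_file.exists():
            self.mapping = json.loads(self.map_file.read_text(encoding="utf-8"))
        else:
            self.mapping = {}

    def safe_name_for(self, user_name):
        """Return the safe on-disk name for user_name, creating one if needed."""
        if user_name not in self.mapping:
            # Generated names use only ASCII letters and digits.
            self.mapping[user_name] = f"d{len(self.mapping) + 1:04d}"
            self.map_file.write_text(json.dumps(self.mapping), encoding="utf-8")
        return self.mapping[user_name]
```

The application would show the user-generated name everywhere in its interface and use the generated name only when touching the file system.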
If you absolutely must allow user-generated folder names, the only way to tell if they are invalid is to catch exceptions and assume the name is invalid. Even that is fraught with peril, as the exceptions thrown for denied access, offline drives, and out of drive space overlap with those that can be thrown for invalid names. You are opening up one huge can of hurt.
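As a rough sketch of the "try it and catch the exception" approach, assuming Python and a hypothetical helper named `try_create_dir`:

```python
import os


def try_create_dir(parent, name):
    """Attempt to create parent/name and report (created, reason).

    Hypothetical helper for illustration only.
    """
    try:
        os.mkdir(os.path.join(parent, name))
        return True, None
    except (OSError, ValueError) as exc:
        # OSError covers invalid names, but also existing directories,
        # denied access, and full or offline drives. ValueError is what
        # Python raises for a name containing an embedded NUL character.
        return False, f"{type(exc).__name__}: {exc}"
```

Note that a failure here does not prove the name was invalid: an already existing directory fails in the same `except` branch, which is exactly the overlap described above.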
Solution 2
The forbidden printable ASCII characters are:
- Linux/Unix:
  - / (forward slash)
- Windows:
  - < (less than)
  - > (greater than)
  - : (colon; sometimes works, but is actually used for NTFS Alternate Data Streams)
  - " (double quote)
  - / (forward slash)
  - \ (backslash)
  - | (vertical bar or pipe)
  - ? (question mark)
  - * (asterisk)
Non-printable characters
If your data comes from a source that would permit non-printable characters, then there is more to check for.
- Linux/Unix:
  - 0 (NUL byte)
- Windows:
  - 0-31 (ASCII control characters)
Note: While it is legal under Linux/Unix file systems to create files with control characters in the filename, such files can be a nightmare for users to deal with.
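The character lists above can be turned into a simple check. This is only a sketch: the names `WINDOWS_FORBIDDEN`, `UNIX_FORBIDDEN`, and `forbidden_chars` are made up for the example, and it looks at a single name component, not a full path.

```python
# Characters from the lists above; control characters are ASCII codes 0-31.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*') | {chr(c) for c in range(32)}
UNIX_FORBIDDEN = {"/", "\0"}


def forbidden_chars(name, windows=True):
    """Return the forbidden characters found in a single name component."""
    forbidden = WINDOWS_FORBIDDEN if windows else UNIX_FORBIDDEN
    return sorted(set(name) & forbidden)
```

An empty result only means that no forbidden character was found; it says nothing about reserved names or trailing dots and spaces.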
Reserved file names
The following filenames are reserved:
- Windows:
  - CON, PRN, AUX, NUL
  - COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
  - LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9
  (both on their own and with arbitrary file extensions, e.g. LPT1.txt).
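A check for the reserved names listed above might look like the sketch below. The function name `is_reserved_windows_name` is hypothetical. The comparison ignores case, and only the part of the name before the first dot is considered, so that names such as LPT1.txt are caught as well.

```python
RESERVED_WINDOWS_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_windows_name(name):
    """Return True if name is a reserved device name, with or without extension."""
    # Only the part before the first dot matters; the comparison ignores case.
    base = name.split(".", 1)[0]
    return base.upper() in RESERVED_WINDOWS_NAMES
```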
Other rules
- Windows: Filenames cannot end in a space or dot.
- macOS: You didn't ask for it, but just in case: colon (:) and forward slash (/) are not permitted, depending on context (e.g. Finder supports slashes, the terminal supports colons). (More details)
Solution 3
Under Linux and other Unix-related systems, there are only two characters that cannot appear in the name of a file or directory, and those are NUL '\0' and slash '/'. The slash, of course, can appear in a pathname, separating directory components.
Rumour¹ has it that Steve Bourne (of 'shell' fame) had a directory containing 254 files, one for every single letter (character code) that can appear in a file name (excluding / and '\0'; the name . was the current directory, of course). It was used to test the Bourne shell and routinely wrought havoc on unwary programs such as backup programs.
Other people have covered the rules for Windows filenames, with links to Microsoft and Wikipedia on the topic.
Note that MacOS X has a case-insensitive file system. Current versions of it appear to allow colon : in file names, though historically that was not always the case:
$ echo a:b > a:b
$ ls -l a:b
-rw-r--r-- 1 jonathanleffler staff 4 Nov 12 07:38 a:b
$
POSIX defines a Portable Filename Character Set consisting of:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 . _ -
Sticking with names formed solely from those characters avoids most of the problems, though Windows still adds some complications.
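Checking a name against the Portable Filename Character Set takes only a small regular expression. The function name `is_portable_name` is made up for this sketch.

```python
import re

# Letters, digits, period, underscore, and hyphen, as listed above.
PORTABLE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_portable_name(name):
    """Return True if name uses only the POSIX Portable Filename Character Set."""
    return PORTABLE_NAME.fullmatch(name) is not None
```

A name such as PRN passes this check, which is one of the Windows complications that a character-level test cannot catch.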
¹ It was Kernighan & Pike in ['The Practice of Programming'](http://www.cs.princeton.edu/~bwk/tpop.webpage/) who said as much in Chapter 6, Testing, §6.5 Stress Tests:
When Steve Bourne was writing his Unix shell (which came to be known as the Bourne shell), he made a directory of 254 files with one-character names, one for each byte value except '\0' and slash, the two characters that cannot appear in Unix file names. He used that directory for all manner of tests of pattern-matching and tokenization. (The test directory was of course created by a program.) For years afterwards, that directory was the bane of file-tree-walking programs; it tested them to destruction.
Note that the directory must have contained entries . and .., so it was arguably 253 files (and 2 directories), or 255 name entries, rather than 254 files. This doesn't affect the effectiveness of the anecdote, or the careful testing it describes.
TPOP was previously at http://plan9.bell-labs.com/cm/cs/tpop and http://cm.bell-labs.com/cm/cs/tpop but both are now (2021-11-12) broken. See also Wikipedia on TPOP.
Solution 4
Instead of creating a blacklist of characters, you could use a whitelist. All things considered, the range of characters that make sense in a file or directory name context is quite short, and unless you have some very specific naming requirements your users will not hold it against your application if they cannot use the whole ASCII table.
It does not solve the problem of reserved names in the target file system, but with a whitelist it is easier to mitigate the risks at the source.
In that spirit, this is a range of characters that can be considered safe:
- Letters (a-z A-Z) - Unicode characters as well, if needed
- Digits (0-9)
- Underscore (_)
- Hyphen (-)
- Space
- Dot (.)
And any additional safe characters you wish to allow. Beyond this, you just have to enforce some additional rules regarding spaces and dots. This is usually sufficient:
- Name must contain at least one letter or number (to avoid only dots/spaces)
- Name must start with a letter or number (to avoid leading dots/spaces)
- Name may not end with a dot or space (simply trim those if present, like Explorer does)
This already allows quite complex and nonsensical names. For example, these names would be possible with these rules, and be valid file names in Windows/Linux:
A...........ext
B -.- .ext
In essence, even with so few whitelisted characters you should still decide what actually makes sense, and validate/adjust the name accordingly. In one of my applications, I used the same rules as above but stripped any duplicate dots and spaces.
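A rough sketch of such a whitelist-based clean-up is shown below. The function name `sanitize_name` is hypothetical, and the sketch makes two assumptions: it handles ASCII letters only, and it implements the "start" rule by stripping leading dots and spaces, so a leading underscore or hyphen is left alone. It does not deal with reserved names.

```python
import re


def sanitize_name(name):
    """Apply the whitelist rules above; return None if nothing usable remains.

    Hypothetical helper. Uses ASCII letters only; extend the character
    class if Unicode letters are needed.
    """
    # Drop every character that is not on the whitelist.
    cleaned = re.sub(r"[^A-Za-z0-9_\- .]", "", name)
    # Collapse runs of dots or spaces to a single character.
    cleaned = re.sub(r"\.{2,}", ".", cleaned)
    cleaned = re.sub(r" {2,}", " ", cleaned)
    # Remove leading and trailing dots and spaces.
    cleaned = cleaned.strip(" .")
    # Require at least one letter or digit.
    if not re.search(r"[A-Za-z0-9]", cleaned):
        return None
    return cleaned
```

With these rules, A...........ext becomes A.ext, while B -.- .ext is returned unchanged because it contains no repeated dots or spaces.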
Solution 5
The easy way to get Windows to tell you the answer is to attempt to rename a file via Explorer and type in a slash, /, for the new name. Windows will pop up a message box telling you the list of illegal characters.
A filename cannot contain any of the following characters:
\ / : * ? " < > |
Microsoft Docs - Naming Files, Paths, and Namespaces - Naming Conventions
Jeff
Updated on January 11, 2022
Comments
-
Jeff over 2 years
I know that / is illegal in Linux, and the following are illegal in Windows (I think): * . " / \ [ ] : ; | ,
What else am I missing?
I need a comprehensive guide, however, and one that takes into account double-byte characters. Linking to outside resources is fine with me.
I need to first create a directory on the filesystem using a name that may contain forbidden characters, so I plan to replace those characters with underscores. I then need to write this directory and its contents to a zip file (using Java), so any additional advice concerning the names of zip directories would be appreciated.
-
Adrian McCarthy over 14 yearsThe key phrase from the MSDN link is "[and a]ny other character that the target file system does not allow". There may be different filesystems on Windows. Some might allow Unicode, others might not. In general, the only safe way to validate a name is to try it on the target device.
-
j_kubik over 11 years254 files? And what about utf8?
-
Jonathan Leffler over 11 yearsThe 254 files were all single-character file names, one per character that was permitted in a filename. UTF-8 wasn't even a gleam in the eye back when Steve Bourne wrote the Bourne shell. UTF-8 imposes rules about the valid sequences of bytes (and disallows bytes 0xC0, 0xC1, 0xF5-0xFF altogether). Otherwise, it isn't much different — at the level of detail I'm discussing.
-
Dan Pritts over 10 yearsThe on-disk directory separator for MacOS HFS+ filesystems is actually a ':' rather than a '/'. The OS usually (probably always) does the right thing when you are working with *nix APIs. But don't expect this to happen reliably if you are moving to the OSX world, e.g. with applescript. It looks like maybe Cocoa APIs use the / and hide the : from you too, but I am pretty sure the old Carbon APIs don't.
-
Christopher Oezbek over 8 years
Others have said that already and it is not constructive. When I came here looking for an answer, I wanted the list that I had to gather elsewhere: which characters to filter out from user input when making a good attempt at a valid filename. The question of whether characters become invalid in combination could also use some elaboration.
-
Borodin over 8 years
There are some guidelines, and “there are an infinite number of names composed only of valid characters that are forbidden” isn't constructive. Likewise “Windows does not distinguish between upper-case and lower-case characters” is a foolish exception: the OP is asking about syntax and not semantics, and no right-minded person would say that a file name like A.txt was invalid because a.TXT may exist.
-
Borodin over 8 yearsThe idea that you shouldn't permit user access to file structure addresses is sound but very poorly phrased. Users should be able to examine and manipulate the entities that the application exposes to them. While those entities may be dynamically-named abstracts of multiple databases, there is nothing wrong with asking the user for the name of a file. The securities on an application should prevent users from making mistakes and from exceeding their authority; they should not prevent them from doing what they need to do
-
Borodin over 8 years
I regularly use Perl, and my habit is to use strings quoted as q< ... > because neither < nor > is valid within a Windows file path. I suspect that the restrictions are archaic and intended to avoid characters that are significant in a DOS environment, or at least within a Windows command shell.
-
AntonPiatek about 8 years
COPY CON PRN means read from keyboard input, or possibly stdin, and copy it to the printer device. Not sure it is still valid on modern Windows, but it certainly was for a long time. In the old days you could use it to type text and have a dot-matrix printer simply output it.
-
pkh almost 8 yearsAnd what about my non-english-speaking users, who would all be screwed by this?
-
AeonOfTime almost 8 years@pkh: As I mentioned in my post, you would include any needed unicode characters in your whitelist. Ranges of characters can usually be specified quite easily, especially if you use regular expressions for example.
-
tahoar over 7 yearsWe use a whitelist approach, but don't forget on Windows you have to manage reserved, case-independent strings, like device names (prn, lpt1, con) and . and ..
-
Aleksandr Dubinsky over 7 yearsAlso newlines and other control characters
-
PypeBros over 7 years
Would you mind commenting on having @ in the list?
-
Alcaro over 7 yearsNewlines are not banned on Linux. I'd argue they should be, though... and if NUL is banned on Linux, then it's banned on Windows, it fills the same purpose.
-
Nigel Alderton over 7 yearsThe question was which characters are illegal. Most of the characters in your list are legal.
-
Jim Michaels over 7 years
In DOS, - (hyphen) is not allowed. I think command.com converts it to _ or ignores it, depending on the kind of DOS.
-
Jim Michaels over 7 years
In DOS and Windows, drive letters are suffixed with a colon (:). NTFS does not allow it as part of a filename specifically, only as part of a file path. Example: C:\ABCD\GHI.TXT. On Linux, that path, once mounted, would look like /mnt/c/ABCD/GHI.TXT. On Linux you can double-quote a file path that contains spaces in order to create it, remove it, etc.; the same goes for Windows. DOS depends on the lfndos driver.
-
Admin about 7 yearsLinux does not disallow Windows illegal characters?
-
firegurafiku about 7 years@Soaku: of course, not, since the world isn't revolving around Microsoft. Why add unnecessary restrictions when there're only two characters which are absolutely necessary to forbid?
-
LogicDaemon almost 7 years@firegurafiku actually there are no characters "absolutely necessary" to forbid. It's always compromise. Even '\0'.
-
firegurafiku almost 7 years
@LogicDaemon, I can agree that '\0' isn't "absolutely necessary" to forbid (it would be allowed if C had better string handling), but at least the slash character should be forbidden unless we're going to represent file paths as XML nodes or something.
-
LogicDaemon almost 7 years@firegurafiku "/" is just convention – dirnames are stored separately from each other anyway, so '/' can appear in names with no problem (if permitted). If used in a dir/filename within a path, it has to be screened, but that's case with many other characters too. Dealing with '\0' will involve separate storage of string length everywhere, that's actually harder.
-
ThorSummoner almost 7 years
For literals in Bash, the best way I've found to declare literals without interpolation is $'myvalueis', e.g. $ echo 'hi' > $'2>&1', then cat 2\>\&1 prints "hi".
-
Jim Balter over 6 yearsThe question isn't about shells.
-
Mike over 6 yearsYour answer @FCastro is correct from the technical point of view. However from the UX perspective it's a nightmare - the user is forced to play the "type something and I'll tell you if you succeed" game again and again. I'd rather see a message (warning style) telling the user that they have entered an illegal character which will later be converted.
-
Casey over 6 years"You cannot, in general, use user-generated text to create Windows directory names." <-- If you want to do this you can just have a character whitelist and it'll largely work, if you can ignore the already-exists issue.
-
ashleedawg about 6 years
the letter b? lol, I assume that's the b from lank spaces
... well that still leaves a few... I renamed a picture(),-.;[]^_~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ.jpg
but had to change it back because it looked angry... -
Eryk Sun almost 6 years"CONIN$" and "CONOUT$" are also reserved. Unlike "CON" they allow accessing the console input and screen buffer with read-write access. Prior to Windows 8, only the base filenames are reserved. Starting with Windows 8 the underlying console IPC was redesigned to use a device driver, so these two names are now handled generically as DOS devices, the same as "NUL", etc. This means they can be used in local-device paths such as "\\.\CONIN$" and "\\?\CONOUT$" and also that the API pretends the names 'exist' in every existing directory. For example, "C:\Temp\CONOUT$" references console output.
-
Eryk Sun almost 6 yearsNote that the reserved DOS device names and the rule about filenames ending in a dot or spaces are applied by the runtime library when converting a DOS path to a native NT path. If a path starts with the "\\?\" local-device prefix, this normalization step gets skipped, except to replace "\\?\" with NT's "\??\" device prefix. This prefix instructs the object manager to search the logon-session and global DOS device directories for a symbolic link to a native NT device, which is usually a device object in the "\Device" directory.
-
Eryk Sun almost 6 yearsOTOH, the reserved characters are not simply a function of the DOS namespace. They're reserved at a low level in the kernel and file system. The "\" character is NT's path separator and is reserved by the object manager. Everything else is allowed in object names, which includes DOS device names such as "C:". The other reserved characters, including ASCII control characters, are due to the kernel's file-system runtime library, which is used by Microsoft's file systems. These characters are reserved in primary filenames, not in stream names.
-
Eryk Sun almost 6 years
The *?<>" characters are reserved as wildcard characters. This is due to a peculiar design decision to have file systems implement filtering a directory listing at a low level in their implementation of the NtQueryDirectoryFile system call. In POSIX systems this is implemented at the application level.
-
Mad Physicist almost 6 years
You can name a file with a forward slash on most Linux distros just fine. There may be problems retrieving it though. It's not forbidden, it's just stupid. You can create a file outside the shell (which would automatically parse the / as a path separator), e.g. with a C program or Python script.
-
Jim Balter over 5 years"You can name a file with a forward slash on most Linux distros just fine." -- No, you can't. '/' is always treated as a directory separator, by the kernel, not just the shell. There's no way to get around this with a C program or Python script or any other way.
-
Jim Balter over 5 yearsChristopher Oezbek provided such a black list in 2015.
-
laurent over 5 yearsThat observation "You cannot, in general, use user-generated text to create Windows directory names" is a bit ludicrous to be honest. There are plenty of cases where you want to allow users to name their files and folders, so just saying "don't do it" is not helpful.
-
Martin Bonner supports Monica over 5 years
You've missed the Windows restriction: must not end in dot or space.
-
AeonOfTime over 5 yearsThanks @MartinBonner, I added that info. I tried it in Windows Explorer and the command line, it simply trims the trailing spaces or dot - still, there's no guarantee the programming language one uses will always safely do that for you - not to mention creating files that suddenly do not match the name you used in your application.
-
Lutz Prechelt about 5 years
Fun fact: Using Cygwin you can readily create lpt1 and lpt1.txt. Then try deleting them in Windows Explorer: You can't. Or in cmd.exe: You can't. Cygwin can, though. It appears to be a 1980s restriction that is held up artificially.
-
LarsH almost 5 years
@mikerodent \p{L} is a good start and is available in some regexp engines. But it wouldn't allow à if it occurs in decomposed form: the accent isn't a letter. See regular-expressions.info/unicode.html
-
LarsH almost 5 years
"you would include any needed unicode characters in your whitelist. Ranges of characters can usually be specified quite easily" - To do this for arbitrary (not known ahead of time) languages would be non-trivial. In some regexp engines you can use categories, like \p{L}\p{M}* (regular-expressions.info/unicode.html) to whitelist any letters together with their diacritics. But it wouldn't include the equivalent of digits, period, hyphen, underscore, etc. in non-Roman scripts.
-
LarsH almost 5 years"All things considered, the range of characters that make sense in a file or directory name context is quite short." Maybe for some use cases. I'm working on a project now involving media files in 20 languages, and the filenames need to reflect the title of the media item because end users will be finding the content that way. Many of the names use punctuation. Any restriction on filename characters carries a price, so in this case we have to minimize restrictions. In this use case, the range of characters that don't make sense in a filename is far shorter and simpler than those that do.
-
AeonOfTime almost 5 years
@LarsH, if you are working with 20 languages, I would not expect you to be able to use one catch-all regex. Personally, I would probably try to create a base file name generator, with the possibility to extend this with specific rules for those languages that need additional or different rules. This way you have a catch-all, and can handle language specifics as well.
-
LarsH almost 5 yearsA reality for many programs these days is that you don't know who the customers will be, or what languages they will use. For example if you're publishing to the general public in an app store or Windows or Apple store. You could make your software English-only (or European-only) by default, which is a common approach ... and a frustrating one for speakers of other languages searching for software for their needs. It can also be an avoidable loss of revenue for the developer. It doesn't take that much more effort to design programs to be largely script-agnostic.
-
Andreas detests censorship almost 5 years
@DanPritts I created a custom font/colour scheme in Xcode's preferences, naming it with a / in the name. That caused some issues, as it created a new directory with the scheme in it.
-
JBentley about 4 years@JimBalter Unless I've misunderstood, it's not constructive because "infinite number of names composed only of valid characters that are forbidden" is rather meaningless if the rules for filenames are well-defined and themselves not infinite. Nothing in this answer justified describing the possibilities as infinite in a way that is helpful or useful to the reader. E.g. contrast the following: (1) In Linux, "/" is not allowed. (2) No comprehensive guide for Linux is possible because there are an infinite number of disallowed names e.g. "/", "//", "///", "a/a", "b/b", etc.
-
Jonathan Leffler about 4 years
Note that if a directory has a colon in its name, you cannot add the directory to a Unix PATH variable because colon is used as the separator (semicolon on Windows). So, programs in such a directory must either be run with a pathname that specifies where they are (which could be relative or absolute), or you must be in the directory and have dot (., the current directory) in PATH, which is widely regarded as unsafe.
-
atimholt about 4 yearsI'd say that any good code will say what it means. In this case, a whitelist feels a lot like a sort of “cargo cult” solution that will break in the case of millions of “unknown unknowns”. You're not disallowing impossible values, you're disallowing values that you're too afraid to test.
-
Robin Davies almost 4 years
For those who don't speak PowerShell, $FileNameInvalidChars is 0x00 through 0x1F, and : " < > | * ? \ /
-
Robin Davies almost 4 yearsSupplementary fun fact: you can programmatically create files with "*" and "?" in the name on windows. So technically, not illegal; just a very very bad idea. (The solution to deleting a file named "lpt1", by the way, is "ren lpt? lptx". Deleting a file named *.* might be more challenging).
-
iiminov almost 4 years
I have three questions: 1. Why did you initialise StringBuilder with an initial capacity value? 2. Why did you add 12 to the length of the filename? 3. Was 12 chosen arbitrarily or was there some thought behind this number?
-
Eryk Sun almost 4 years
@RobinDavies, you cannot create filenames with "*" and "?" in their names on a proper Windows filesystem, unless you're hacking the underlying filesystem data structures. These are reserved wildcard characters in every filesystem except for the named-pipe filesystem. Any filesystem driver that doesn't reserve them -- as well as the other wildcards <, >, and " -- is fundamentally broken. It will not function properly with FindFirstFileW, which depends on the filesystem to support DOS wildcard matching (requires all 5 wildcards) in the NtQueryDirectoryFile system call.
-
Cadoiz over 3 years
@LarsH You can also try to allow as much as possible using Unicode, as suggested here: stackoverflow.com/a/61448658/4575793 Actually, almost everything is allowed, so maybe a whitelist is not the best approach.
-
peterh over 3 years
It is not a nightmare to deal with non-printable characters in filenames from shell scripts, although it is harder than we would like.
-
DDR over 3 yearsI've made a program to apply these changes at github.com/DDR0/fuseblk-filename-fixer. Let me know if there's any characters (or patterns) I've missed!
-
Cadoiz over 3 yearsPossible duplicate to stackoverflow.com/a/32565700/4575793
-
Zsolti over 3 yearsI remember that it used to be like that. I just tried it in Windows 10 and that message box is not showing up anymore, but a sound is being played instead.
-
AlainD about 3 years
On MacOS, the only forbidden printable ASCII character is the colon (:). Using the Windows superset of forbidden characters is sensible because it covers Linux and MacOS too.
-
Dark Star1 about 3 years
I just confirmed @AlainD's comment. The only character I wasn't allowed to use in my file's name is the colon. However, the reason for investigating was that I received a file from a Windows user with a colon in its name.
-
Charlie Rix about 3 yearsSorry for the delay, I just noticed this question 1) Initializing stringbuilder with a length is a bit of a micro optimization. I don't remember exactly, but it starts with a small buffer and doubles each time the buffer size is exceeded. 2) Adding a bit extra guarantees that the length isn't off by one. 3) The world would be better off if we use dozenal instead of decimal. 12 is the dozenal equivalent of adding 10 (I just needed to pad the length by a small arbitrary amount).
-
Cadoiz about 3 yearsI took the freedom to improve your formatting for better readability. I also explained the same base idea above and now incorporated some of your suggestions, if that's okay. Thank you! stackoverflow.com/a/61448658/4575793
-
Jeyekomon almost 3 years"Name must start with a letter or number" So the name cannot start with an underscore? I'm confused.
-
AeonOfTime almost 3 years@Jeyekomon: As I mentioned, it should start with a letter or number "to avoid leading dots/spaces". An underscore is acceptable, as is a hyphen or other alphanumerical characters.
-
Jeyekomon almost 3 years@AeonOfTime Ah, in that case I would recommend rewording the line to simple "Name must not start with a dot or space".
-
Cees Timmerman over 2 yearsOk, so which characters are illegal?
-
Cees Timmerman over 2 yearsDuplicate of stackoverflow.com/a/44750843/819417
-
dialer over 2 yearsPlease note that the assumption that Windows file names are case-insensitive is incorrect. To make matters worse, case sensitivity rules on Windows can now be set per-directory.
-
Cadoiz over 2 years
I took the liberty of adding a screenshot. Unfortunately, your link was dead. I updated it to an archive link, but it only works moderately well.
-
Cadoiz over 2 years
(" < > | are invalid for both paths and files)
-
Cadoiz over 2 years
There is a difference between being disallowed as a path character (" < > |) and being forbidden as a file name character (: * ? \ / plus the path characters).
-
Vopel over 2 years
Filenames are technically able to end in a space in Windows, but File Explorer is not able to properly interact with them. They can only be accessed using UNC paths. To see for yourself, you can do echo; > "\\?\%CD%\test " in a command prompt. You'll notice you can't delete or open the file in Explorer. Use del "\\?\%CD%\test " to get rid of it or ren "\\?\%CD%\test " "test" to rename it. Not really useful information, but it's a handy thing to know about if you ever run into a file with a trailing space.
-
i30817 over 2 yearsIt would be great if 'someone' at the unicode consortium reserved a range just for 'idiotic OSes that abuse illegal characters' whose font mapping would map to the 'illegal characters glyphs' but be different. Even replacements for the ? have different width and characteristics, leading me to want to replace ! too and be annoyed when even then the height is not consistent with '.' (for instance).