What is the "directory order" of files in a directory (used by `ls -U`)?
Solution 1
It depends on the filesystem. For some filesystems (ext3 among them), a directory is actually a file with a well-known format, with the 'd' bit set in its permissions or mode. In that case, the history of which filenames, and of what lengths, have been created and deleted can matter: the kernel fills in the first entry in the directory file that has enough room to hold the new file's name. See the section titled "Physical Description" in http://e2fsprogs.sourceforge.net/ext2intro.html for more detail.
For some other filesystems, Reiserfs among them, a directory is just a set of entries in a B+ tree that is not visible in the filesystem, so a plain ls of a directory in a Reiserfs filesystem is in lexical order.
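The first-fit behaviour described above can be illustrated with a toy model. This is only a sketch: the slot list, the helper names (add_entry, remove_entry, listing) and the idea that a slot's size equals the length of the first name stored in it are assumptions made for illustration, not the real ext2/ext3 on-disk format.

```python
def add_entry(slots, name):
    """Put name into the first free slot big enough for it, else append a slot."""
    for slot in slots:
        if slot["name"] is None and slot["size"] >= len(name):
            slot["name"] = name
            return
    slots.append({"size": len(name), "name": name})


def remove_entry(slots, name):
    """Free the slot holding name, leaving a reusable hole in place."""
    for slot in slots:
        if slot["name"] == name:
            slot["name"] = None
            return


def listing(slots):
    """Names in slot order, which is this model's 'directory order'."""
    return [slot["name"] for slot in slots if slot["name"] is not None]


def demo():
    slots = []
    for name in ["alpha", "b", "gamma"]:
        add_entry(slots, name)
    remove_entry(slots, "alpha")     # leaves a 5-byte hole at the front
    add_entry(slots, "new")          # 3 bytes: fits into the hole
    add_entry(slots, "longername")   # fits nowhere: appended at the end
    return listing(slots)


if __name__ == "__main__":
    print(demo())
```

In this model "new" is listed before "b" and "gamma" even though it was created after them, which is the kind of effect that makes creation order and directory order diverge.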
Solution 2
Indeed, there is no specific order to expect. It's up to the OS and file system implementation to order the entries the way it likes. One goal of this option is to get the fastest listing possible, which can be a significant factor with very large directories.
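As a rough illustration, the same distinction is visible from Python: os.scandir yields entries in whatever order the OS returns them (comparable to ls -U), and any sorting is an extra step on top. The helper names list_unsorted and list_sorted are made up for this sketch, and sorted() compares code points, which is not necessarily the locale-aware collation that ls uses by default.

```python
import os
import tempfile


def list_unsorted(path):
    """Names in the order the OS hands them back; no ordering is guaranteed."""
    with os.scandir(path) as entries:
        return [entry.name for entry in entries]


def list_sorted(path):
    """Same names, explicitly sorted after the directory has been read."""
    return sorted(list_unsorted(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ["2", "1", "3"]:
            open(os.path.join(d, name), "w").close()
        print(list_unsorted(d))  # order depends on the filesystem
        print(list_sorted(d))
```

Code that needs a stable order should sort explicitly rather than rely on what the unsorted listing happens to return.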
Solution 3
It is the order in which the entries are stored internally in the filesystem. This will vary from filesystem to filesystem. For instance, the entries may be stored in some kind of balanced tree, such as a Red-Black Tree. There may be further optimizations for directories with a small number of entries, or to deal efficiently with additions and removals.
Stefan
Updated on September 18, 2022
Comments
-
Stefan over 1 year
According to the man page for ls, `ls -U` means: do not sort; list entries in directory order.
What does "directory order" mean, and how is it determined?
The following test (executed on an ext3 file system), shows that it is not the order in which the files were created:
root@sv1010vm0007:/tmp# mkdir test
root@sv1010vm0007:/tmp# touch test/2
root@sv1010vm0007:/tmp# touch test/1
root@sv1010vm0007:/tmp# touch test/3
root@sv1010vm0007:/tmp# ls -U test
2 3 1
-
LawrenceC almost 13 years
There are scheduling/caching algorithms in the kernel and filesystem drivers that influence when exactly data is written to disk. This is done to increase performance. Because of this optimization you can't really tell exactly when writes will happen. Also, old inodes in filesystems might be reused, so new files can appear in directory slots where old files were. So the order of creation isn't necessarily "directory order."
-
Alen Milakovic almost 13 years
@Bruce: So, what are the contents of this "directory" file exactly?
-
Admin almost 13 years
Traditionally, something very much like struct dirent, which I find defined in /usr/include/bits/dirent.h on a RHEL box and on a Slackware 11.0 box. Both of those machines refuse to open() a directory directly. I know that I used to do things like "cat . > dot.as.file" to convert a directory to a regular file; the last time I did it for sure was on Solaris 8, I think. Basically, struct dirent contains an inode (a number), a record length, a name length, and a string, which, as I recall, may or may not be ASCII NUL-terminated.
-
Alen Milakovic almost 13 years
@Bruce: Ok. And this C struct manages to get all the information about the files and subdirectories it contains into those fields?
-
Admin almost 13 years
A directory in filesystems like ext2, or BSD FFS or the original Unix filesystem, just contained a list of inode numbers and corresponding file names. Invoking "ls" without arguments just earned you a list of file names. If you did "ls -l", "ls" itself would look up every file name by doing a stat(2) system call on the file name, and from the struct stat, get permissions, size, "file type", etc. So, no, a directory file doesn't have all the information, just a list of names.
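The two-step pattern described in this comment (read names from the directory, then stat each name for its metadata) might look roughly like this in Python. The function name long_listing and the tuple layout are choices made for this sketch, not how ls is actually implemented.

```python
import os
import stat
import tempfile


def long_listing(path):
    """Read names from the directory, then stat each one for its metadata."""
    rows = []
    for name in os.listdir(path):                    # step 1: names only
        st = os.lstat(os.path.join(path, name))      # step 2: one stat per name
        rows.append((stat.filemode(st.st_mode), st.st_size, name))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "hello.txt"), "w") as f:
            f.write("hello")
        for mode, size, name in long_listing(d):
            print(mode, size, name)
```

The permissions and size come from the per-file stat call, not from the directory itself, which is why a long listing of a very large directory costs noticeably more than a plain one.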
-
Gilles 'SO- stop being evil' almost 13 years
@Bruce: ext3/ext4 (with the dir_index option) also uses a B-tree variant to store directories. Several unices, including (at least some versions of) Solaris and *BSD, have opendir(3) call open(2), but on Linux, as far as I can remember, open(2) has refused to open directories.
-
Alen Milakovic almost 13 years
@Bruce: Thanks for the explanation. One often forgets how much detail goes into these low-level implementations.
-
jlliagre almost 13 years
@ultrasawblade: Not sure why you wrote that as a comment to my own reply instead of a reply by itself or whatever.
-
Villemoes over 11 years
@Gilles: It is not a problem to open(2) a directory, so long as one only uses O_RDONLY. This can be useful to get an fd which can be passed to the *at function family (openat, fstatat etc.). But read(2) on that fd will fail with EISDIR.
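A small sketch of that behaviour, assuming Linux and a Python build where os.stat accepts dir_fd (which maps onto the fstatat-style lookup). The helper name probe_directory is made up for this example.

```python
import errno
import os
import tempfile


def probe_directory(path, child):
    """Open a directory read-only, stat a child through the fd, then try read()."""
    fd = os.open(path, os.O_RDONLY)              # opening the directory succeeds
    try:
        size = os.stat(child, dir_fd=fd).st_size  # lookup relative to the fd
        try:
            os.read(fd, 16)                       # reading the directory itself
            read_errno = None
        except OSError as exc:
            read_errno = exc.errno
        return size, read_errno
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "f"), "w") as f:
            f.write("abc")
        size, err = probe_directory(d, "f")
        print(size, errno.errorcode.get(err))
```

The open and the relative stat both work, while the read on the directory fd is the call that is rejected.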
-
Martin Dorey about 11 years
ext2.sourceforge.net/2005-ols/paper-html/node3.html explains that the dir_index feature hashes the file name and a file system-specific secret. dumpe2fs includes dir_index in its "Filesystem features" line if the feature is enabled.
-
G-Man Says 'Reinstate Monica' almost 9 years
@ultrasawblade: Strictly speaking, "old inodes [being] reused" has nothing to do with reuse of old directory slots. This can happen whenever old directory entries are unlinked; if they are all hard links, this needn't result in inodes being deallocated.
-
flygge over 3 years
The best way to think about "directory order" is really "the most efficient order in which to list the files". This order will differ depending on a wide variety of factors and should not be relied upon without specific guarantees from the filesystem you are using.