Why does the `ls` command sort files like this?
It depends on the collation order of your locale:
>LANG=en_IE.UTF-8 ls -1 foo*
foopa.png
foo.png
fooqa.png
>LANG=C ls -1 foo*
foo.png
foopa.png
fooqa.png
You can also use the LC_COLLATE variable instead of LANG, and use the POSIX locale instead of the C one.
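If several of these variables are set, POSIX gives them a fixed precedence. A non-empty `LC_ALL` overrides `LC_COLLATE`, which in turn overrides `LANG`. This means `LC_COLLATE=C` has no effect in a shell that already exports `LC_ALL`. The helper below, `collation_locale`, is hypothetical and not a standard command. It is a minimal sketch that prints which value would govern collation under that rule. It assumes the fallback is `C` when nothing is set, which is glibc's behaviour; POSIX leaves that default to the implementation.

```shell
# Hypothetical helper (not a standard utility): print the locale name that
# governs collation (the LC_COLLATE category) under POSIX precedence rules.
# Empty values count as unset, so each ${VAR:-...} falls through to the next.
collation_locale() {
  # LC_ALL beats LC_COLLATE, which beats LANG; assume "C" if none is set.
  printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}
```

For example, `LC_ALL= LANG=en_IE.UTF-8 LC_COLLATE=C collation_locale` prints `C`, but `LC_ALL=en_IE.UTF-8 LC_COLLATE=C collation_locale` prints `en_IE.UTF-8`. If you need an override that works regardless of the environment, use `LC_ALL=C ls -1 foo*`.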
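C collation compares raw byte values (ASCII order), and the dot (0x2E) sorts before every letter, so foo.png comes first. Other collation orders (such as English) may treat spaces and punctuation such as dots as separators. They then either compare the "words" separately or simply ignore the separators. The latter appears to be the case here: foopa.png is compared as foopapng and foo.png as foopng, and a sorts before n.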
Note that the non-UTF-8 locale also sorts in plain ASCII order:
>LANG=en_IE ls -1 foo*
foo.png
foopa.png
fooqa.png
After some more digging, it appears that ignoring punctuation is a common feature of Unicode-aware locales such as the *.UTF-8 ones.
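The effect can be reproduced with a deliberately simplified model, sketched below. It is not glibc's real collation algorithm, which uses ISO 14651-based tables with several comparison levels. The model strips every non-alphanumeric character and folds case to build a primary key, compares those keys byte by byte, and breaks ties on the full name. The function name `punct_blind_sort` is hypothetical.

```shell
# Simplified model of a punctuation-ignoring collation. NOT glibc's actual
# algorithm; it only mimics the first-level behaviour seen in the transcripts.
# Reads one file name per line on stdin; names containing tabs or newlines
# are not supported.
punct_blind_sort() {
  local tab name key
  tab=$(printf '\t')
  while IFS= read -r name; do
    # Primary key: keep only letters and digits, then fold to lowercase.
    key=$(printf '%s' "$name" | LC_ALL=C tr -cd '[:alnum:]' | LC_ALL=C tr '[:upper:]' '[:lower:]')
    printf '%s\t%s\n' "$key" "$name"
  done |
    # Compare keys byte-wise; break ties on the original name.
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2 |
    # Drop the key column, leaving only the original names.
    cut -f2-
}
```

Feeding it `foo.png foopa.png fooqa.png` (one per line) yields `foopa.png`, `foo.png`, `fooqa.png`, which is the en_IE.UTF-8 order above. The same model explains the .gif observation in the question. Once the dot is ignored, foo.gif is compared as "foogif", so it lands after "fooga..." and before "fooh...".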
mooncat39
Updated on September 18, 2022
Comments
-
mooncat39 over 1 year
As I was trying to reverse engineer the `ls` command, I came upon an interesting behavior. When I make 3 files, `foo.png`, `foopa.png`, and `fooqa.png`, `ls` sorts them as `foopa.png`, `foo.png`, and `fooqa.png`. I also tried it with the .gif extension, and it seems to happen when p and q are replaced by the first letter of the extension and the next letter in the alphabet; so in the case of .gif it would be g and h (`fooga.gif`, then `foo.gif`, then `fooha.gif`). Why does it order the output this way?
-
Mokubai over 4 years
That's quite interesting. Is there any explanation for this ordering, or a way to configure it beyond (or with more granularity than) the LANG variable?
-
xenoid over 4 years
Actually, you can use LC_COLLATE instead of LANG. See also this
-
chrylis -cautiouslyoptimistic- over 4 years
Yikes, that was a terrible collation decision.
-
xenoid over 4 years
@chrylis Not necessarily. UTF-8 file names are meant to be used in local languages and to abide by their sorting rules. For instance, in French "de Gaulle" and "Degaulle" are sorted next to each other (the space doesn't count, and in other names neither do the apostrophe or the dash), and we would expect files named after them to be sorted the same way. The problem here is that the dot has its own meaning in file names, and the expected sort is closer to alphabetic (but IMHO alphabetic isn't perfect for file names either). The extension sort (`-X` in `ls`) is a step in the right direction.
-
Roman Odaisky over 4 years
`ls -v` is much more of a step in the right direction.
-
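The two `ls` options mentioned in these comments can be compared side by side. This sketch assumes GNU coreutils `ls`, where `-X` sorts alphabetically by extension and `-v` does a natural sort of numbers within names; both are GNU extensions, not POSIX options. `ls_demo` is a hypothetical helper that lists a throwaway directory with whatever options you pass. It forces `LC_ALL=C` so that only the option changes the order.

```shell
# Hypothetical demo; assumes GNU coreutils ls (-X and -v are GNU extensions).
ls_demo() {
  local dir
  dir=$(mktemp -d) || return 1
  (
    cd "$dir" || exit 1
    touch img10.png img2.png notes.txt photo.gif
    # Byte-order collation, so only the ls sort option affects the result;
    # paste joins the one-per-line output into a single line.
    LC_ALL=C ls "$@" | paste -sd ' ' -
  )
  rm -rf -- "$dir"
}
```

With no options, `ls_demo` prints `img10.png img2.png notes.txt photo.gif`. `ls_demo -X` groups by extension and prints `photo.gif img10.png img2.png notes.txt`. `ls_demo -v` prints `img2.png img10.png notes.txt photo.gif`.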
ubfan1 over 4 years
Don't forget the rather broken implementation of character ranges (en_US.UTF-8, in bash): `touch a A b B c C`, then `ls [a-c]`. Not one person in a thousand will guess that output, even though it's perfectly understandable how it was done!
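In `ls [a-c]`, the range is expanded by the shell before `ls` runs, so prefixing `LC_ALL=C` to the `ls` command does not change what the pattern matches. Bash 4.3 and later have `shopt -s globasciiranges`, which makes bracket ranges compare characters in C-locale (ASCII) order. Recent bash releases enable it by default, which may be why the surprising result does not reproduce everywhere. The sketch below assumes bash and a case-sensitive file system, so that `a` and `A` are distinct files. `ascii_range_demo` is a hypothetical name.

```shell
# Hypothetical demo; requires bash 4.3+ (globasciiranges) and a
# case-sensitive file system so that "a" and "A" are distinct files.
ascii_range_demo() {
  local dir
  dir=$(mktemp -d) || return 1
  (
    cd "$dir" || exit 1
    touch a A b B c C
    # The shell, not ls, expands [a-c]; with this option the range uses
    # ASCII order, so only the lowercase a, b and c fall inside it.
    shopt -s globasciiranges
    echo [a-c]
  )
  rm -rf -- "$dir"
}
```

Under these assumptions, `ascii_range_demo` prints `a b c`.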