Why does the `ls` command sort files like this?

It depends on the collation order of your locale:

>LANG=en_IE.UTF-8 ls -1 foo*
foopa.png
foo.png
fooqa.png

>LANG=C ls -1 foo*
foo.png
foopa.png
fooqa.png

You can also set the LC_COLLATE variable instead of LANG (it affects only the sort order), and the POSIX locale can be used in place of the C one. Keep in mind that LC_ALL, if set, overrides both.
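
As a minimal sketch of the LC_COLLATE variant, the snippet below recreates the three files in a scratch directory and lists them with POSIX collation. The variable names `demo_dir` and `posix_order` are only illustrative, and clearing LC_ALL for the command is an assumption about your environment (it matters only if LC_ALL happens to be set).

```shell
# Work in a scratch directory so the demo does not touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch foo.png foopa.png fooqa.png

# LC_ALL, if set and non-empty, overrides LC_COLLATE, so it is emptied
# for this one command only.
posix_order=$(LC_ALL= LC_COLLATE=POSIX ls -1 foo*)
printf '%s\n' "$posix_order"
# foo.png
# foopa.png
# fooqa.png
```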

C collation order compares raw byte values (ASCII order), so "." sorts before any letter and all uppercase letters sort before lowercase ones. Other collation orders (such as English) may treat spaces and special characters such as dots as separators and either compare the "words" separately or just ignore these separators (which appears to be the case here).
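
To see why the C locale puts foo.png first, it may help to look at the byte values involved. This is only a sketch: it uses `sort` as a stand-in for `ls` (both should follow the locale's collation rules), and the variable names are made up for the example.

```shell
# Byte values of the two characters that decide the comparison
# between "foo.png" and "foopa.png" (they first differ at index 3).
dot_hex=$(printf '.' | od -An -tx1 | tr -d ' \n')
p_hex=$(printf 'p' | od -An -tx1 | tr -d ' \n')
echo "dot=0x$dot_hex p=0x$p_hex"
# dot=0x2e p=0x70

# In the C locale the comparison is byte by byte, and 0x2e < 0x70.
c_sorted=$(printf '%s\n' foopa.png foo.png fooqa.png | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
# foo.png
# foopa.png
# fooqa.png
```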

Note that the non-UTF-8 locale sorts in ASCII order, too:

>LANG=en_IE ls -1 foo*
foo.png
foopa.png
fooqa.png

After some more digging, it appears that ignoring punctuation is a common feature of Unicode-aware locales such as the *.UTF-8 ones.
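
The "ignore punctuation" idea can be approximated by hand: strip punctuation from each name, sort on what is left using plain byte order, and print the original names. This is a rough model, not what the locale actually does internally (real collation tables are more elaborate), and `approx_order` is just a name chosen for the example.

```shell
# Build "key<TAB>name" pairs where the key is the name without punctuation,
# sort on the pairs in byte order, then keep only the original name.
approx_order=$(
  for f in foo.png foopa.png fooqa.png; do
    key=$(printf '%s' "$f" | tr -d '[:punct:]')
    printf '%s\t%s\n' "$key" "$f"
  done | LC_ALL=C sort | cut -f2
)
printf '%s\n' "$approx_order"
# foopa.png
# foo.png
# fooqa.png
```

The keys are foopapng, foopng and fooqapng, and "foopa..." sorts before "foopn...", which reproduces the order seen with en_IE.UTF-8.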

Author: mooncat39

Updated on September 18, 2022

Comments

  • mooncat39
    mooncat39 over 1 year

    As I was trying to reverse engineer the ls command, I came upon an interesting behavior. When I make 3 files, foo.png, foopa.png, and fooqa.png, ls sorts them as foopa.png, foo.png, and fooqa.png. I also tried it with the .gif extension, and it seems to happen when p and q are replaced by the first letter of the extension and the next letter of the alphabet; for .gif that would be g and h (fooga.gif, then foo.gif, then fooha.gif).

    Why does it order the output this way?

  • Mokubai
    Mokubai over 4 years
    That's quite interesting, is there any explanation for this ordering or how to configure it beyond (or with more granularity than) the LANG variable?
  • xenoid
    xenoid over 4 years
    Actually you can use LC_COLLATE instead of LANG. See also this
  • chrylis -cautiouslyoptimistic-
    chrylis -cautiouslyoptimistic- over 4 years
    Yikes, that was a terrible collation decision.
  • xenoid
    xenoid over 4 years
    @chrylis Not necessarily. UTF-8 file names are meant to be used in local languages and to abide by their sorting rules. For instance, in French, "de Gaulle" and "Degaulle" are sorted next to each other (the space doesn't count, and in other names neither does the apostrophe or the dash), and we would expect files named after them to be sorted the same way. The problem here is that the dot has its own meaning in file names and that the expected sort is closer to alphabetic (but IMHO alphabetic isn't perfect for file names either). The extension sort (-X) in `ls` is a step in the right direction.
  • Roman Odaisky
    Roman Odaisky over 4 years
    ls -v is much more of a step in the right direction
  • ubfan1
    ubfan1 over 4 years
    Don't forget the rather broken implementation of character ranges (en_US.UTF-8, in bash). Run touch a A b B c C, then ls [a-c]. Not one person in a thousand will guess that output, even though it's perfectly understandable how it was done!
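
The character-range point from the last comment can be tried out along these lines. This is a sketch that assumes a case-sensitive file system; `range_dir` and `c_matches` are illustrative names. Only the C-locale result is shown, because the result under en_US.UTF-8 varies with the bash version and its `globasciiranges` option.

```shell
range_dir=$(mktemp -d)
cd "$range_dir" || exit 1
touch a A b B c C

# Globs are expanded by the shell before ls runs, so the locale has to be
# set for the shell itself; a subshell keeps the change local.
c_matches=$(export LC_ALL=C; echo [a-c])
echo "$c_matches"
# a b c
```

Replacing C with en_US.UTF-8 in the subshell (if that locale is installed) is the experiment the comment describes; on some setups the range then matches uppercase names as well.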