Sorting numerically in a comma-delimited file with Unix

16,218

Solution 1

This is certainly a dirty workaround, but I figured out a way to do this thanks to @slhck's tip about locales. If a better answer comes along that would be more helpful to others, I'll certainly accept it since this pretty much only works for my specific problem.

I set the locale to Spanish (Bolivian) so that the commas were treated like decimal points, then standard numeric sorting did the trick.

$ export LC_NUMERIC="es_BO.utf8"

$ cat test.csv
100,00869184
6492,8361
1234,31
200,04071

$ sort -n test.csv
100,00869184
200,04071
1234,31
6492,8361

Solution 2

GNU's sort does this by default:

$ cat test
100,00869184
6492,8361
1234,31
200,04071

$ gsort -nt',' < test
100,00869184
200,04071
1234,31
6492,8361

Version:

$ gsort --version
sort (GNU coreutils) 8.19

There's a caveat though: If your sorting does not work as expected, then your locale is probably set to something different than C. Why is this? locale defines sorting and interpretation of letters, numbers, decimal characters et cetera.

To check this, just enter locale in a Terminal. Is LC_NUMERIC set to en_US.UTF-8, maybe? This would explain the wrong sort order. Set it back to C:

export LC_NUMERIC=C

Then, try your sort command again. If you want to set your global locale to C, do this with:

export LC_ALL=C

Solution 3

Try adding the -g option which is suppose to perform numeric sorting.

Try:

sort -t',' -g <whatever>
Share:
16,218

Related videos on Youtube

dpatchery
Author by

dpatchery

Updated on September 18, 2022

Comments

  • dpatchery
    dpatchery over 1 year

    I've got a comma-separated file that looks like this:

    100,00869184
    6492,8361
    1234,31
    200,04071
    

    I want to use sort to sort this file numerically by the first column only.

    Desired Result:

    100,00869184
    200,04071
    1234,31
    6492,8361
    

    How do I achieve this using sort? It seems like the commas are being treated like thousands separators instead of delimiters even when I call them out as such.

    Both sort -t',' -n and sort -t',' -nk1' give me this:

    1234,31
    200,04071
    6492,8361
    100,00869184
    

    Sorting by the default (no parameters) or using sort -t',' gives me this:

    100,00869184
    1234,31
    200,04071
    6492,8361
    

    And sorting as a number sort -n gives me this:

    1234,31
    200,04071
    6492,8361
    100,00869184
    

    How can I use sort to achieve my desired result?

    Edited to add: This is for a one-time operation to create a sorted list of about 7 million lines, so workarounds or other unorthodox methods are perfectly acceptable.

  • dpatchery
    dpatchery almost 12 years
    Isn't -n numeric sorting? -g gives me an illegal option.
  • slhck
    slhck almost 12 years
    -g is the general-numeric-sort option and should actually be available in any recent version of sort. @dpatchery
  • dpatchery
    dpatchery almost 12 years
    This is at my place of work so I almost definitely do not have a recent version :)
  • dpatchery
    dpatchery almost 12 years
    I don't have access to GNU in my environment. Is it something I could easily get then remove when I am finished? HMU in chat if someone would like to help me do this... I'm quite the UNIX newbie.
  • slhck
    slhck almost 12 years
    I'm pretty sure it's just a locale issue. But what's sort --version for you, actually?
  • dpatchery
    dpatchery almost 12 years
    sort --version gives me an illegal argument. -- commands haven't worked for me in the past either. I checked the man page and there's no version called out explicitly, but it does list "HP-UX 11i Version 2: August 2003" if that helps at all. My LC_NUMERIC is set to "C".
  • slhck
    slhck almost 12 years
    German locale for example would use , as a decimal separator. I've never used HP-UX though.
  • slhck
    slhck almost 12 years
    Ah see, I would have suggested using a German locale or similar. Can't think of anything else right now without being able to test it or what tools you have available, since this is a pretty rare Unix version.
  • dpatchery
    dpatchery almost 12 years
    @slhck That seems to be the crux of most of the problems I get stuck on in UNIX :) Thanks for your help in getting me to a solution, regardless.