Sorting numerically in a comma-delimited file with Unix
Solution 1
This is certainly a dirty workaround, but I figured out a way to do this thanks to @slhck's tip about locales. If a better answer comes along that would be more helpful to others, I'll certainly accept it since this pretty much only works for my specific problem.
I set the locale to Spanish (Bolivian) so that the commas were treated like decimal points, then standard numeric sorting did the trick.
$ export LC_NUMERIC="es_BO.utf8"
$ cat test.csv
100,00869184
6492,8361
1234,31
200,04071
$ sort -n test.csv
100,00869184
200,04071
1234,31
6492,8361
Solution 2
GNU's sort
does this by default:
$ cat test
100,00869184
6492,8361
1234,31
200,04071
$ gsort -nt',' < test
100,00869184
200,04071
1234,31
6492,8361
Version:
$ gsort --version
sort (GNU coreutils) 8.19
There's a caveat though: If your sorting does not work as expected, then your locale
is probably set to something different than C
. Why is this? locale
defines sorting and interpretation of letters, numbers, decimal characters et cetera.
To check this, just enter locale
in a Terminal. Is LC_NUMERIC
set to en_US.UTF-8
, maybe? This would explain the wrong sort order. Set it back to C
:
export LC_NUMERIC=C
Then, try your sort
command again. If you want to set your global locale
to C
, do this with:
export LC_ALL=C
Solution 3
Try adding the -g
option which is suppose to perform numeric sorting.
Try:
sort -t',' -g <whatever>
Related videos on Youtube
dpatchery
Updated on September 18, 2022Comments
-
dpatchery over 1 year
I've got a comma-separated file that looks like this:
100,00869184 6492,8361 1234,31 200,04071
I want to use
sort
to sort this file numerically by the first column only.Desired Result:
100,00869184 200,04071 1234,31 6492,8361
How do I achieve this using sort? It seems like the commas are being treated like thousands separators instead of delimiters even when I call them out as such.
Both
sort -t',' -n
andsort -t',' -nk1'
give me this:1234,31 200,04071 6492,8361 100,00869184
Sorting by the default (no parameters) or using
sort -t','
gives me this:100,00869184 1234,31 200,04071 6492,8361
And sorting as a number
sort -n
gives me this:1234,31 200,04071 6492,8361 100,00869184
How can I use sort to achieve my desired result?
Edited to add: This is for a one-time operation to create a sorted list of about 7 million lines, so workarounds or other unorthodox methods are perfectly acceptable.
-
dpatchery almost 12 yearsIsn't
-n
numeric sorting? -g gives me an illegal option. -
slhck almost 12 years
-g
is thegeneral-numeric-sort
option and should actually be available in any recent version ofsort
. @dpatchery -
dpatchery almost 12 yearsThis is at my place of work so I almost definitely do not have a recent version :)
-
dpatchery almost 12 yearsI don't have access to GNU in my environment. Is it something I could easily get then remove when I am finished? HMU in chat if someone would like to help me do this... I'm quite the UNIX newbie.
-
slhck almost 12 yearsI'm pretty sure it's just a
locale
issue. But what'ssort --version
for you, actually? -
dpatchery almost 12 yearssort --version gives me an illegal argument.
--
commands haven't worked for me in the past either. I checked the man page and there's no version called out explicitly, but it does list "HP-UX 11i Version 2: August 2003" if that helps at all. My LC_NUMERIC is set to "C". -
slhck almost 12 yearsGerman locale for example would use
,
as a decimal separator. I've never used HP-UX though. -
slhck almost 12 yearsAh see, I would have suggested using a German locale or similar. Can't think of anything else right now without being able to test it or what tools you have available, since this is a pretty rare Unix version.
-
dpatchery almost 12 years@slhck That seems to be the crux of most of the problems I get stuck on in UNIX :) Thanks for your help in getting me to a solution, regardless.