Sorting multiple keys with Unix sort
Solution 1
Use the -k option (or --key=POS1[,POS2]). It can appear multiple times, and each key can take the same ordering options as the global ones (such as n for numeric sort).
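As a minimal sketch of repeating -k with per-key options: the data below is made up for illustration (name, department number, amount), and LC_ALL=C is set only to keep the comparison predictable across systems.

```shell
# Hypothetical sample data: name, department number, amount (blank separated).
data='alice 20 300
bob 10 200
carol 10 1000
dave 20 50'

# Primary key: field 2 only, numeric (n).
# Secondary key: field 3 only, numeric and reversed (nr).
result=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2n -k 3,3nr)
printf '%s\n' "$result"
```

Within each department the larger amount comes first, because the r flag is attached to the second key only.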
Solution 2
Take care though:
If you want to sort the file primarily by field 3, and secondarily by field 2 you want this:
sort -k 3,3 -k 2,2 < inputfile
Not this:
sort -k 3 -k 2 < inputfile
which sorts the file by the string from the beginning of field 3 to the end of the line (which is potentially unique, so the second key may never be consulted).
-k, --key=POS1[,POS2] start a key at POS1 (origin 1), end it at POS2
(default end of line)
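The difference can be seen on a tiny input. This is a sketch with invented two-line data in which field 3 is identical on both lines, so the outcome depends entirely on whether the first key stops at field 3 or runs on to the end of the line.

```shell
# Hypothetical data: field 3 is "x" on both lines; fields 2 and 4 differ.
data='1 b x 1
2 a x 2'

# Bounded keys: field 3 ties, so field 2 decides (a before b).
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3,3 -k 2,2)

# Unbounded keys: the first key is "field 3 to end of line",
# so field 4 breaks the tie before field 2 is ever looked at.
unbounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3 -k 2)

printf 'bounded:\n%s\nunbounded:\n%s\n' "$bounded" "$unbounded"
```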
Solution 3
The -k option is what you want.
-k 1.4,1.5n -k 1.14,1.15n
Would use character positions 4-5 in the first field (it's all one field for fixed width) and sort numerically as the first key.
The second key would be characters 14-15 in the first field also.
(edit)
Example (all I have is DOS/cygwin handy):
dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r
for the data:
12/10/2008 01:10 PM 1,564,990 outfile.txt
Sorts the directory listing by month number (pos 4-5) numerically, and then by filename (pos 40-60) in reverse. Since there are no tabs, it's all field 1 to sort.
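A smaller, self-contained sketch of the same character-position technique follows. The records are hypothetical fixed-width lines without any blanks (positions 1-3 a code, 4-5 a number, 6-8 a name), so the whole line is unambiguously field 1.

```shell
# Hypothetical fixed-width records: cols 1-3 code, cols 4-5 number, cols 6-8 name.
data='abc10zed
xyz02amy
def10bob
ghi02kim'

# First key: characters 4-5 of field 1, compared numerically.
# Second key: characters 6-8 of field 1, in reverse order.
result=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1.4,1.5n -k 1.6,1.8r)
printf '%s\n' "$result"
```

The two records numbered 02 come first, and within each number the names appear in reverse alphabetical order.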
Solution 4
Here is one that sorts the columns of a CSV file in a mix of numeric and dictionary order, with column 5 and everything after it in dictionary order:
~/test>sort -t, -k1,1n -k2,2n -k3,3d -k4,4n -k5d sort.csv
1,10,b,22,Ga
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C
~/test>cat sort.csv
2,3,a,9,C
2,2,b,20,F
2,2,c,19,Gb,hj
2,2,c,19,Gb,hi
2,2,c,19,Ga
2,2,b,22,Ga
1,10,b,22,Ga
Note that -k1,1n means a numeric key starting at column 1 and ending at column 1. Had I used the command below, columns 1 and 2 would have been treated as a single key, so that 1,10 is sorted as 110:
~/test>sort -t, -k1,2n -k3,3 -k4,4n -k5d sort.csv
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C
1,10,b,22,Ga
Solution 5
I believe in your case something like
sort -t@ -k1.1,1.4 -k1.5,1.7 ... <inputfile
will work better. @ is the field separator; make sure it is a character that appears nowhere in the data. Your input is then treated as a single column.
Edit: apparently clintp already gave a similar answer, sorry. As he points out, the flags 'n' and 'r' can be added to every -k.... option.
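A rough sketch of this approach on data that does contain blanks: the records are hypothetical (columns 1-3 a right-aligned number, column 4 a space, columns 5-8 a name), and @ is assumed not to occur anywhere in them.

```shell
# Hypothetical fixed-width records containing blanks.
data='  9 kiwi
 10 pear
 10 figs
100 plum'

# -t@ makes the whole line a single field because @ never appears in the data.
# First key: characters 1-3, numeric. Second key: characters 5-8, reversed.
result=$(printf '%s\n' "$data" | LC_ALL=C sort -t@ -k1.1,1.3n -k1.5,1.8r)
printf '%s\n' "$result"
```

Setting an unused separator means the character offsets are always counted from the start of the line, regardless of where the blanks fall.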
Chris Kloberdanz
Updated on January 21, 2020
Comments
-
Chris Kloberdanz over 4 years
I have potentially large files that need to be sorted by 1-n keys. Some of these keys might be numeric and some of them might not be. This is a fixed-width columnar file so there are no delimiters.
Is there a good way to do this with Unix sort? With one key it is as simple as using '-n'. I have read the man page and searched Google briefly, but didn't find a good example. How would I go about accomplishing this?
Note: I have ruled out Perl because of the file size potential. It would be a last resort.
-
Ken Gentle over 15 years
One or two lines of example data would be really helpful for creating an example command line. Also, does "1-n" keys mean that you need to sort by a variable number of keys? Doing that without scripting is gonna be fun...
-
Chris Kloberdanz over 15 years
I have a PHP wrapper around the sort command to enable the 1-n feature.
-
Adam Rosenfield over 15 years
From the sort man page: "POS is F[.C][OPTS], where F is the field number and C the character position in the field; both are origin 1." See man page for full documentation.
-
Jonathan Leffler over 15 years
It is only one field if there are no blanks in the input data. Nevertheless, your example is useful.
-
Clinton Pierce over 15 years
Correction: if there are no /tabs/ in the input data. In DOS's 'dir' command output, there are no tabs.
-
ron almost 13 years
Also see andras's answer if you don't want to go insane.
-
Ken Gentle over 11 years
Both comments above are accurate and additive. Thanks, gentlemen.
-
mat kelcey over 11 years
LC_ALL=C can also result in quite a speedup!
-
msb over 10 years
The examples on how to use the options (numeric, reverse) are extremely helpful, as it's nearly impossible to find out how to use them just from the man page, and the other answers didn't mention it. I wish I could +2 for this. ;)
-
davidtbernal almost 10 years
Life changing. Thanks.
-
Wildcard over 8 years
Whoops! Now I have to fix a script because earlier I only saw the first answer above...good thing I haven't depended on the script output yet....
-
xaxa over 8 years
This is the best answer because it shows how to use different switches for different columns.
-
Arun over 7 years
Nice! Now, what if I want field 3 to be numerically and reverse sorted, whereas field 2 should be non-numerically and normally (ascending) sorted? :)
-
andras almost 7 years
@Arun POS is explained at the end of the man page. You just append the ordering options to the field number like this:
sort -k 3,3nr -k 2,2
-
android.weasel over 6 years
Aargh. What a counterintuitive interface: -k2 should be -k2,2 and a trailing comma -k2, should be 'magical default end of line or whatever'.
-
BaseZen about 6 years
My heavens. The man page writer won a contest for the least helpful way to document this. I've been reading Unix man pages for 28 years. Nowhere do they mention the -k field can be repeated.
-
Brad Dre over 4 years
Even though the default separator according to the docs gnu.org/software/coreutils/manual/html_node/… is space, sometimes the field count is not what you'd expect. Perhaps, as others have said here, because of the LC_CTYPE locale setting. When in doubt, count from the beginning of the line!
-
HongboZhu almost 4 years
Why the angle bracket <? Shouldn't sort -k3,3 -k2,2 inputfile do the job?