Sort with unequal whitespace in first column
Solution 1
As others have commented, it will make it easier to work with the data if it is comma separated values (CSV).
Here is my solution for converting the data to CSV:
$ cat file | sed 's/ \([0-9]\)/,\1/g'
Kuala Lumpur,78,56
Seoul,85,66
Karachi,95,75
Tokyo,85,60
Lahore,85,75
Manila,90,85
It replaces any space preceding a digit with a comma. \1
references the group ([0-9]), the digit after the space. From there you can use sort
with the -t
argument to specify a field separator.
$ cat file | sed 's/ \([0-9]\)/,\1/g' | sort -t, -k2
Kuala Lumpur,78,56
Tokyo,85,60
Seoul,85,66
Lahore,85,75
Manila,90,85
Karachi,95,75
If you'd like to convert back to spaces or make a table, here are two examples:
$ cat test | sed 's/ \([0-9]\)/,\1/g' | sort -t, -k2 | tr , ' '
Kuala Lumpur 78 56
Tokyo 85 60
Seoul 85 66
Lahore 85 75
Manila 90 85
Karachi 95 75
$ cat test | sed 's/ \([0-9]\)/,\1/g' | sort -t, -k2 | column -s, -t
Kuala Lumpur 78 56
Tokyo 85 60
Seoul 85 66
Lahore 85 75
Manila 90 85
Karachi 95 75
Solution 2
awk '{print $NF,$0}' file.txt | sort -nr -k1 | cut -d' ' -f2-
-
$NF
: number of fields,$0
: whole line -
sort -nr
: numeric reverse (descending) -
sort -k1
: sort by first column (delimited by sequences of spaces and tabs) -
cut -d
: delimiter (defaults to tab) -
cut -f2-
: fields 2 to last (doesn't collapse or strip delimiters)
ruby -e 'puts readlines.sort_by{|l|l.split[-1].to_i}.reverse' file.txt
-
readlines
=ARGF.readlines
-
split
splits on whitespace by default
Solution 3
If you can change your file to have tab separated columns, your life will be easier. If changing the file is not an option, this Perl one-liner will do it for you:
perl -ne 's/\s+/\t/g; s/([a-z])\s([a-z])/$1 $2/ig; s/\t$/\n/; print;' file |
sort -t$'\t' -nk3
Kuala Lumpur 78 56
Tokyo 85 60
Seoul 85 66
Karachi 95 75
Lahore 85 75
Manila 90 85
EXPLANATION:
-
s/\s+/\t/g
: change ALL spaces to TABs. -
s/([a-z])\s([a-z])/$1 $2/ig
: change TABs that are between two letters (no numbers) back to single spaces. s/\t$/\n/
: the first substitution introduces a TAB at the end of each line, change that back to a newline character (\n
).sort -t$'\t' -nk3
: use TAB as a delimiter (sort has a strange syntax there, I know, see here for more information) and sort numerically on the third column.
Related videos on Youtube
Comments
-
healix almost 2 years
I need to sort this list by name, high temp and low temp:
Kuala Lumpur 78 56 Seoul 85 66 Karachi 95 75 Tokyo 85 60 Lahore 85 75 Manila 90 85
I figured since whitespace is the delimiter for a column I could just
sort -k 1
which gives me this:Karachi 95 75 Kuala Lumpur 78 56 Lahore 85 75 Manila 90 85 Seoul 85 66 Tokyo 85 60
But the "Kuala Lumpur" is causing problems because of the space.
So I tried to treat "Lumpur" as a column and to sort the first set of nums I did
sort -k 3n
but I get this:
Tokyo 85 60 Seoul 85 66 Karachi 95 75 Lahore 85 75 Kuala Lumpur 78 56 <---Why is this out of order? Manila 90 85
How do I deal with this one space?
-
Bernhard about 11 yearsHow is the file generated? You should use a different separator.
-
healix about 11 yearsthe file was generated with cat. What different separator? I was thinking of using tr and removing the space
-
Bernhard about 11 yearsA comma
,
would be a decent separator here. I don't see howcat
would generate this file. There is something behind, right? -
terdon about 11 yearsCould you post the desired output? Sorting alphabetically seems to work in your example. The only problem seems to be sorting by temp, is that correct?
-
healix about 11 years@terdon Correct. When the temp is sorted the "Kuala Lumpur" doesn't sort properly.
-
healix about 11 years@Bernhard I didn't want to use a comma. I tried tabs after the names for the nums but didn't help much.
-
terdon about 11 yearsSo changing the delimiter is an option? In that case, you should be fine. Just add a tab between the name and the numbers and between each temperature.
-
healix about 11 years@terdon I'm having a problem with that. When I do sort -t' '... and put a tab in with CTR-V then TAB I don't think its reading in the tab. I tried <TAB> as well
-
terdon about 11 yearsSee my answer for how to use sort with tabs.
-
-
chris over 2 yearsI like that this answer addresses the OP without merely saying that if the OP had a different format, the problem would be easier. However, this did not work when I tried it with
/proc/filesystems
, but the answer from @Lri did./proc/filesystems
may be a special case because the first field is either present or missing, but it fits the pattern of trying to sort on a field that appears after a variable amount of whitespace characters. As an example mistake, the output has<whitespace> btrfs
as the first entry, butnodev autofs
is further down the list. -
chris over 2 yearsI really like this solution. I was going to use
awk
, but not in such an elegant way. Thank you from 8.5 years in the future. -
terdon over 2 years@chris that's a completely different situation since you have empty fields, so
sort
cannot find a "second" field to sort on. You might want to ask a new question, but I think this should do what you probably want:sed 's/^\t/XXX\t/' /proc/filesystems | sort -k2,2 | sed 's/XXX//'
. -
chris over 2 yearsThanks, terdon. That's a fair criticism. However, I view them as the same class of problem, because "Kuala Lumpur" essentially has a field in it ("Lumpur") that the other rows do not. Also, thanks for the sed suggestion. I wound up using an answer based on @Lri's post. However, I like this one also. I'm going to keep both of them in my script for now (one in comments, both with attribution :) ) while I think about which one I like better for the general case of sorting based on a missing nth column. Yours may prove more straightforward for that.