Sort an associative array in awk
Solution 1
Instead of asort, use asorti(source, destination), which sorts the indices into a new array, so you don't have to copy the array yourself.
You can then use the destination array as a list of pointers into the source array.
For your example, you would use it like this:
n=asorti(chr_count, sorted)
for (i=1; i<=n; i++) {
print sorted[i] " : " chr_count[sorted[i]]
}
Solution 2
You can use the sort command, e.g.:
for ( i in data )
print i ":", data[i] | "sort"
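A minimal sketch of the pipe approach as a complete script, assuming whitespace-separated input with the key in the third field (as in the question). The wrapper name count_by_key is hypothetical, and the explicit close() and the LC_ALL=C setting are additions for this sketch, not part of the answer above. Closing the pipe makes sort emit its output at that point, which matters if you print anything else afterwards.

```shell
#!/bin/sh
# Count occurrences of the third field, then let sort(1) order the keys.
# count_by_key is a hypothetical helper name used only for this sketch.
count_by_key() {
    awk '
        { chr_count[$3]++ }
        END {
            cmd = "LC_ALL=C sort"          # fixed locale for a predictable order
            for (i in chr_count)
                print i " : " chr_count[i] | cmd
            close(cmd)                     # flush the pipe so sort emits its output
        }
    '
}

printf 'a b chr2\nc d chr1\ne f chr2\n' | count_by_key
```

This works with any awk, since the sorting is done by the external sort command, not by awk itself.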
Solution 3
I recently came across this issue and found that with gawk I could set the value of PROCINFO["sorted_in"] to control iteration order. I found a list of valid values for this by searching for PROCINFO online and landed on this GNU Awk User's Guide page: https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html
This lists options of the form @{ind|val}_{num|type|str}_{asc|desc}, with:
- ind sorting by key (index) and val sorting by value.
- num sorting numerically, str by string and type by assigned type.
- asc for ascending order and desc for descending order.
I simply used:
PROCINFO["sorted_in"] = "@val_num_desc"
for (i in map) print i, map[i]
And the output was sorted in descending order of values.
Solution 4
Note that asort() and asorti() are specific to gawk, and are unknown to awk. For plain awk, you can roll your own sort() or get one from elsewhere.
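As a rough sketch of what "roll your own" could look like in plain awk: the keys are copied into a 1..n array and ordered with a hand-written insertion sort. The names sorted_counts and isort are hypothetical, chosen for this example, and the input layout (key in the third field) is assumed from the question. Insertion sort is quadratic, so this is meant for small arrays only.

```shell
#!/bin/sh
# sorted_counts is a hypothetical helper name; isort() is a hand-rolled
# insertion sort written for this sketch, not a built-in of any awk.
sorted_counts() {
    awk '
        # Sort A[1..n] in place, ascending, using string comparison.
        function isort(A, n,    i, j, tmp) {
            for (i = 2; i <= n; i++) {
                tmp = A[i]
                # Shift larger elements one slot to the right.
                for (j = i - 1; j >= 1 && (A[j] "") > (tmp ""); j--)
                    A[j + 1] = A[j]
                A[j + 1] = tmp
            }
        }
        { chr_count[$3]++ }
        END {
            n = 0
            for (k in chr_count)
                keys[++n] = k              # copy the keys into a 1..n array
            isort(keys, n)
            for (i = 1; i <= n; i++)
                print keys[i] " : " chr_count[keys[i]]
        }
    '
}

printf 'x x chr2\nx x chr10\nx x chr1\nx x chr2\n' | sorted_counts
```

The comparison is a string comparison, so chr10 sorts before chr2. For numeric keys, compare A[j] + 0 and tmp + 0 instead.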
Solution 5
This is taken directly from the documentation:
# populate the array data
# copy indices
j = 1
for (i in data) {
    ind[j] = i    # index value becomes element value
    j++
}
n = asort(ind)    # index values are now sorted
for (i = 1; i <= n; i++) {
    # do something with ind[i] (work with sorted indices directly)
    ...
    # do something with data[ind[i]] (access original array via sorted indices)
}
lonestar21
Updated on July 09, 2022
Comments
-
lonestar21 almost 2 years
I have an associative array in awk that gets populated like this:
chr_count[$3]++
When I try to print my chr_counts, I use this:
for (i in chr_count) { print i,":",chr_count[i]; }
But not surprisingly, the order of i is not sorted in any way. Is there an easy way to iterate over the sorted keys of chr_count?
-
unhammer about 8 years
See stackoverflow.com/a/5345056/69663 – if you have gawk 4, PROCINFO["sorted_in"] = "@val_num_asc" etc. are very simple to use. The manual shows a lot of different options if you want descending/ascending, by value/key, numerically/stringually, your own function etc: gnu.org/software/gawk/manual/html_node/Controlling-Scanning
-
Cascabel over 14 years
Wow, totally forgot about that despite reading right past it in the docs. This is definitely the better answer.
-
Cristian Ciupitu almost 10 years
asorti doesn't work with nawk-20121220-2.fc20.x86_64.
-
SourceSeeker almost 10 years
@CristianCiupitu: Sorry, asorti is GAWK-specific. In fact, I don't think nawk has any built-in sort functions.
-
Cristian Ciupitu almost 10 years
GNU Awk's documentation mentions that indeed: "asort() and asorti() are gawk extensions; they are not available in compatibility mode (see Options)".
-
haridsv over 8 years
Watch out, this solution is flawed as it ends up losing keys that have the same values in the original array. The accepted solution from this other thread has an idea on how to work around that: stackoverflow.com/a/5345056/95750
-
Cascabel over 8 years
@haridsv No, I don't think so. This question is about sorting by the keys, not the values, and there can't be two values for the same key, so there's no issue here. The other question you point to is about sorting by values (which indeed may not all be distinct), so if you tried to use this code for that, it'd be a problem. But this isn't flawed if you use it for what it's written for.
-
haridsv over 8 years
Apologies. I misread the indexing code as "flipping" key/values, but after rereading it, I noticed that you are using a constantly increasing number as index, not the original value. Thank you for getting back and clarifying it.
-
Eugene Pakhomov about 2 years
Note that this would work only for gawk - PROCINFO is not something special for awk itself.