Sort an associative array in awk


Solution 1

Instead of asort, use asorti(source, destination), which sorts the indices into a new array, so you don't have to copy the array yourself.

You can then use the elements of the destination array as keys into the source array.

For your example, you would use it like this:

END {
    n = asorti(chr_count, sorted)    # sorted[1..n] holds the keys of chr_count in sorted order
    for (i = 1; i <= n; i++) {
        print sorted[i] " : " chr_count[sorted[i]]
    }
}

Solution 2

You can pipe the output through the external sort command, e.g.:

for (i in data)
    print i ":", data[i] | "sort"
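A self-contained sketch of the pipe approach, assuming any POSIX awk and sort. The shell function name sorted_counts and the sample input are hypothetical. Calling close("sort") in the END block waits for sort to finish, so its output is complete before anything else awk might print afterwards.

```shell
# sorted_counts (hypothetical helper): counts the third field of each stdin
# line, then hands "key: count" lines to the external sort command.
sorted_counts() {
    awk '
        { count[$3]++ }
        END {
            for (k in count)
                print k ":", count[k] | "sort"
            close("sort")   # flush the pipe and wait for sort to exit
        }
    '
}

printf '%s\n' 'a b chr2' 'a b chr1' 'a b chr2' | sorted_counts
# chr1: 1
# chr2: 2
```

To order by count instead of by key, "sort -k2,2nr" would sort on the second whitespace-separated field, numerically and in reverse.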

Solution 3

I recently ran into this issue and found that in gawk you can set PROCINFO["sorted_in"] to control the order in which a for (i in array) loop visits the elements. The valid values are listed on this GNU Awk User's Guide page: https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html

This lists options of the form @{ind|val}_{num|type|str}_{asc|desc} (plus @unsorted), with:

  • ind sorting by key (index) and val sorting by value.
  • num sorting numerically, str as strings and type by the element's assigned type; type is only available with val (@val_type_asc and @val_type_desc).
  • asc for ascending order and desc for descending order.

I simply used:

PROCINFO["sorted_in"] = "@val_num_desc"
for (i in map) print i, map[i]

And the output was sorted in descending order of values.

Solution 4

Note that asort() and asorti() are gawk extensions and are not available in other awk implementations (nawk, for example). For those, you have to write your own sort function or borrow one from elsewhere.
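A minimal portable sketch, assuming only POSIX awk: copy the keys into a numbered array, insertion-sort that array, then print in order. Insertion sort is a swapped-in choice for brevity; it is O(n²), so for large arrays piping to sort is likely the better option. The shell function name keys_sorted and the sample input are hypothetical.

```shell
# keys_sorted (hypothetical helper): counts the third field of each stdin
# line, then prints "key : count" with keys in ascending string order,
# using a hand-written insertion sort instead of gawk's asorti().
keys_sorted() {
    awk '
        { count[$3]++ }
        END {
            n = 0
            for (k in count)
                keys[++n] = k
            # insertion sort on keys[1..n]; appending "" forces string comparison
            for (i = 2; i <= n; i++) {
                v = keys[i]
                for (j = i - 1; j >= 1 && (keys[j] "") > (v ""); j--)
                    keys[j + 1] = keys[j]
                keys[j + 1] = v
            }
            for (i = 1; i <= n; i++)
                print keys[i] " : " count[keys[i]]
        }
    '
}

printf '%s\n' 'x y chr2' 'x y chr1' 'x y chr2' | keys_sorted
# chr1 : 1
# chr2 : 2
```

For numeric keys, comparing keys[j] + 0 > v + 0 instead would sort them numerically.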

Solution 5

This is adapted from the gawk documentation:

 # assumes the array data has already been populated
 # copy indices
 j = 1
 for (i in data) {
     ind[j] = i    # index value becomes element value
     j++
 }
 n = asort(ind)    # index values are now sorted
 for (i = 1; i <= n; i++) {
     print ind[i]          # work with sorted indices directly
     # ...
     print data[ind[i]]    # access original array via sorted indices
 }


Author: lonestar21

Updated on July 09, 2022

Comments

  • lonestar21, almost 2 years

    I have an associative array in awk that gets populated like this:

    chr_count[$3]++
    

    When I try to print my chr_counts, I use this:

    for (i in chr_count) {
        print i,":",chr_count[i];
    }
    

    But not surprisingly, the order of i is not sorted in any way. Is there an easy way to iterate over the sorted keys of chr_count?

  • Cascabel, over 14 years
    Wow, totally forgot about that despite reading right past it in the docs. This is definitely the better answer.
  • Cristian Ciupitu, almost 10 years
    asorti doesn't work with nawk-20121220-2.fc20.x86_64.
  • SourceSeeker, almost 10 years
    @CristianCiupitu: Sorry, asorti is gawk-specific. In fact, I don't think nawk has any built-in sort functions.
  • Cristian Ciupitu, almost 10 years
    GNU Awk's documentation mentions that indeed: "asort() and asorti() are gawk extensions; they are not available in compatibility mode (see Options)".
  • haridsv, over 8 years
    Watch out, this solution is flawed as this ends up losing keys that have the same values in the original array. The accepted solution from this other thread has an idea on how to workaround that: stackoverflow.com/a/5345056/95750
  • Cascabel, over 8 years
    @haridsv No, I don't think so. This question is about sorting by the keys, not the values, and there can't be two values for the same key, so there's no issue here. The other question you point to is about sorting by values (which indeed may not all be distinct), so if you tried to use this code for that, it'd be a problem. But this isn't flawed if you use it for what it's written for.
  • haridsv, over 8 years
    Apologies, I misread the indexing code as "flipping" keys and values, but after rereading it I noticed that you are using a constantly increasing number as the index, not the original value. Thank you for getting back to me and clarifying it.
  • Eugene Pakhomov, about 2 years
    Note that this works only in gawk; PROCINFO is not special in other awk implementations.