The number of occurrences of elements in a vector [JULIA]


Solution 1

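y = rand(1:10, 20)
u = unique(y)
# Count each unique value with one pass over y per value.
d = Dict(i => count(x -> x == i, y) for i in u)
# get avoids a KeyError when 10 happens not to occur in y.
println("count for 10 is $(get(d, 10, 0))")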

Solution 2

To remove the NaN values you can use the filter function. From the Julia docs:

filter(function, collection)

Return a copy of collection, removing elements for which function is false.

x = filter(y->!isnan(y),y)
filter!(y->!isnan(y),y)

Here the predicate is the anonymous function y -> !isnan(y), and filter applies it to the array y. The argument name is arbitrary: filter(z->!isnan(z),y) is equivalent, since the first argument of filter just defines an inline function. You can either save the result as a new object, as with x above, or use the in-place version filter!, signaled by the !, to modify the existing array y.
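To make the difference between the two forms concrete, here is a minimal sketch. The vector v is made-up sample data, not taken from the question, and it is a Float64 vector because NaN is a floating-point value.

```julia
# Hypothetical sample data (Float64, since NaN is a floating-point value).
v = [8.0, 43.0, NaN, 46.0, NaN, 8.0]

# filter returns a new vector and leaves v untouched.
cleaned = filter(x -> !isnan(x), v)

# filter! removes the NaN entries from v itself.
filter!(x -> !isnan(x), v)
```

After both calls, cleaned and v hold the same four non-NaN values.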

Then, either before or after this, depending on whether we want to include the NaNs in our count, we can use the countmap() function from StatsBase. From the Julia docs:

countmap(x)

Return a dictionary mapping each unique value in x to its number of occurrences.

using StatsBase
a = countmap(y)

You can then access specific elements of this dictionary, e.g. a[-1] will tell you how many occurrences there are of -1.

Or, if you wanted to then convert that dictionary to an Array, you could use:

b = hcat([[key, val] for (key, val) in a]...)'
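Because a Dict does not iterate in any guaranteed order, the rows of b above can come out in arbitrary order. If a sorted table is wanted, one possible approach is sketched below. It uses only Base, and the dictionary here is a hand-written stand-in for the result of countmap, so the values are illustrative only.

```julia
# Hypothetical counts, standing in for the dictionary `a` returned by countmap.
a = Dict(8 => 3, 46 => 2, 43 => 1, 3 => 1)

# Dict iteration order is unspecified, so sort the keys first.
ks = sort(collect(keys(a)))

# Column 1 holds the values, column 2 holds their counts.
b = hcat(ks, [a[k] for k in ks])
```

This yields a 4×2 integer matrix with one row per unique value, ordered by value.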

Note: Thanks to @JeffBezanon for comments on correct method for filtering NaN values.

Solution 3

countmap is the best solution I've seen so far, but here's a written-out version, which is only slightly slower. It passes over the array only once, so it stays efficient even when there are many unique values:

function countmemb1(y)
    d = Dict{Int, Int}()
    for val in y
        if isnan(val)
            continue
        end
        if haskey(d, val)
            d[val] += 1
        else
            d[val] = 1
        end
    end
    return d
end

The solution in the accepted answer can be a bit faster if there are a very small number of unique values, but otherwise scales poorly.

Edit: Because I just couldn't leave well enough alone, here's a version that is more generic and also faster (countmap doesn't accept strings, sets or tuples, for example):

function countmemb(itr)
    d = Dict{eltype(itr), Int}()
    for val in itr
        if isa(val, Number) && isnan(val)
            continue
        end
        d[val] = get(d, val, 0) + 1
    end
    return d
end
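
As a usage sketch, the snippet below applies countmemb to the ten sample values shown in the question and to a small vector of strings. The function is repeated so the snippet runs on its own; the string vector is an invented example.

```julia
# Copy of countmemb from above, repeated so this snippet is self-contained.
function countmemb(itr)
    d = Dict{eltype(itr), Int}()
    for val in itr
        if isa(val, Number) && isnan(val)
            continue
        end
        d[val] = get(d, val, 0) + 1
    end
    return d
end

# Sample values from the question; the NaN literals make this a Vector{Float64}.
y = [8, 43, NaN, 46, NaN, 8, 8, 3, 46, NaN]
counts = countmemb(y)

# Non-numeric elements work too, because the isnan check is skipped for them.
words = countmemb(["a", "b", "a"])
```

Here counts[8.0] is 3, counts[46.0] is 2 and counts[43.0] is 1, matching the numbers the question asks for, and NaN never becomes a key.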
Author by

vincet

Updated on June 04, 2022

Comments

  • vincet almost 2 years

    I have a vector of 2500 values composed of repeated values and NaN values. I want to remove all the NaN values and compute the number of occurrences of each other value.

    y
    2500-element Array{Int64,1}:
    8
    43
    NaN
    46
    NaN
    8
    8
    3
    46
    NaN
    

    For example: the number of occurrences of 8 is 3, the number of occurrences of 46 is 2, and the number of occurrences of 43 is 1.