Counting number of values between interval

python arrays count

19,004

Solution 1

you will have to iterate the list at least once.

The solution below works with any sequence/interval that implements comparision (<, >, etc) and uses bisect algorithm to find the correct point in the interval, so it is very fast.

It will work with floats, text, or whatever. Just pass a sequence and a list of the intervals.

from collections import defaultdict
from bisect import bisect_left

def count_intervals(sequence, intervals):
    count = defaultdict(int)
    intervals.sort()
    for item in sequence:
        pos = bisect_left(intervals, item)
        if pos == len(intervals):
            count[None] += 1
        else:
            count[intervals[pos]] += 1
    return count

data = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
print count_intervals(data, [10, 20])

Will print

defaultdict(<type 'int'>, {10: 10, 20: 9})

Meaning that you have 10 values <10 and 9 values <20.

Solution 2

I don't know how large your list will get but here's another approach.

import numpy as np
mylist = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
np.histogram(mylist, bins=[0,9,19])

Solution 3

You can also use a combination of value_counts() and pd.cut() to help you get the job done.

import pandas as pd   
mylist = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
split_mylist = pd.cut(mylist, [0, 9, 19]).value_counts(sort = False)
print(split_mylist)

This piece of code will return this:

(0, 10] 10 (10, 20] 9 dtype: int64

Then you can utilise the to_list() function to get what you want

split_mylist = split_mylist.tolist()
print(split_mylist)

Output: [10, 9]

Solution 4

If the numbers are integers, as in your example, representing the intervals as frozensets can perhaps be fastest (worth trying). Not sure if the intervals are guaranteed to be mutually exclusive -- if not, then

intervals = [frozenzet(range(10)), frozenset(range(10, 20))]
counts = [0] * len(intervals)

for n in mylist:
  for i, inter in enumerate(intervals):
    if n in inter:
      counts[i] += 1

if the intervals are mutually exclusive, this code could be sped up a bit by breaking out of the inner loop right after the increment. However for mutually exclusive intervals of integers >= 0, there's an even more attractive option: first, prepare an auxiliary index, e.g. given your startpoints data structure that could be

indices = [sum(i > x for x in startpoints) - 1 for i in range(max(startpoints))]

and then

counts = [0] * len(intervals)
for n in mylist:
  if 0 <= n < len(indices):
    counts[indices[n]] += 1

this can be adjusted if the intervals can be < 0 (everything needs to be offset by -min(startpoints) in that case.

If the "numbers" can be arbitrary floats (or decimal.Decimals, etc), not just integer, the possibilities for optimization are more restricted. Is that the case...?

View more solutions

19,004

calccrypto

Updated on June 04, 2022

Comments

calccrypto almost 2 years

Is there any efficient way in python to count the times an array of numbers is between certain intervals? the number of intervals i will be using may get quite large

like:

mylist = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]

some function(mylist, startpoints):
   # startpoints = [0,10,20]
   count values in range [0,9]
   count values in range [10-19]

output = [9,10]

calccrypto almost 14 years

cool! now how do i sort the values (and maintain the order of the interval endpoints)? i added a few values to mylist, and it came out as defaultdict(<type 'int'>, {40: 1, 10: 6, 30: 6}).
nosklo almost 14 years

dicts don't have order. you can use something like: for k in sorted(mydict): print k, mydict[k]
nosklo almost 14 years

or: for value in intervals: print value, result[value]
James Elder over 10 years

Thanks for this answer! I was just wondering how can this be modified such that if there are no values in an interval the list will contain interval: 0. I.e. if we typed count_intervals(data,[10,20,30]) we get the result defaultdict(<type 'int'>, {10: 10, 20: 9, 30:0}).
nosklo over 10 years

@JamesElder for value in intervals: result.setdefault(value, 0)