Counting number of values between interval

19,004

Solution 1

you will have to iterate the list at least once.

The solution below works with any sequence/interval that implements comparision (<, >, etc) and uses bisect algorithm to find the correct point in the interval, so it is very fast.

It will work with floats, text, or whatever. Just pass a sequence and a list of the intervals.

from collections import defaultdict
from bisect import bisect_left

def count_intervals(sequence, intervals):
    count = defaultdict(int)
    intervals.sort()
    for item in sequence:
        pos = bisect_left(intervals, item)
        if pos == len(intervals):
            count[None] += 1
        else:
            count[intervals[pos]] += 1
    return count

data = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
print count_intervals(data, [10, 20])

Will print

defaultdict(<type 'int'>, {10: 10, 20: 9})

Meaning that you have 10 values <10 and 9 values <20.

Solution 2

I don't know how large your list will get but here's another approach.

import numpy as np
mylist = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
np.histogram(mylist, bins=[0,9,19])

Solution 3

You can also use a combination of value_counts() and pd.cut() to help you get the job done.

import pandas as pd   
mylist = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
split_mylist = pd.cut(mylist, [0, 9, 19]).value_counts(sort = False)
print(split_mylist)

This piece of code will return this:

(0, 10] 10 (10, 20] 9 dtype: int64

Then you can utilise the to_list() function to get what you want

split_mylist = split_mylist.tolist()
print(split_mylist)

Output: [10, 9]

Solution 4

If the numbers are integers, as in your example, representing the intervals as frozensets can perhaps be fastest (worth trying). Not sure if the intervals are guaranteed to be mutually exclusive -- if not, then

intervals = [frozenzet(range(10)), frozenset(range(10, 20))]
counts = [0] * len(intervals)

for n in mylist:
  for i, inter in enumerate(intervals):
    if n in inter:
      counts[i] += 1

if the intervals are mutually exclusive, this code could be sped up a bit by breaking out of the inner loop right after the increment. However for mutually exclusive intervals of integers >= 0, there's an even more attractive option: first, prepare an auxiliary index, e.g. given your startpoints data structure that could be

indices = [sum(i > x for x in startpoints) - 1 for i in range(max(startpoints))]

and then

counts = [0] * len(intervals)
for n in mylist:
  if 0 <= n < len(indices):
    counts[indices[n]] += 1

this can be adjusted if the intervals can be < 0 (everything needs to be offset by -min(startpoints) in that case.

If the "numbers" can be arbitrary floats (or decimal.Decimals, etc), not just integer, the possibilities for optimization are more restricted. Is that the case...?

Share:
19,004

Related videos on Youtube

calccrypto
Author by

calccrypto

Updated on June 04, 2022

Comments

  • calccrypto
    calccrypto almost 2 years

    Is there any efficient way in python to count the times an array of numbers is between certain intervals? the number of intervals i will be using may get quite large

    like:

    mylist = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
    
    some function(mylist, startpoints):
       # startpoints = [0,10,20]
       count values in range [0,9]
       count values in range [10-19]
    
    output = [9,10]
    
  • calccrypto
    calccrypto almost 14 years
    cool! now how do i sort the values (and maintain the order of the interval endpoints)? i added a few values to mylist, and it came out as defaultdict(<type 'int'>, {40: 1, 10: 6, 30: 6}).
  • nosklo
    nosklo almost 14 years
    dicts don't have order. you can use something like: for k in sorted(mydict): print k, mydict[k]
  • nosklo
    nosklo almost 14 years
    or: for value in intervals: print value, result[value]
  • James Elder
    James Elder over 10 years
    Thanks for this answer! I was just wondering how can this be modified such that if there are no values in an interval the list will contain interval: 0. I.e. if we typed count_intervals(data,[10,20,30]) we get the result defaultdict(<type 'int'>, {10: 10, 20: 9, 30:0}).
  • nosklo
    nosklo over 10 years
    @JamesElder for value in intervals: result.setdefault(value, 0)