Identify groups of continuous numbers in a list

Solution 1

more_itertools.consecutive_groups was added in version 4.0.

Demo

import more_itertools as mit


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
[list(group) for group in mit.consecutive_groups(iterable)]
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]

Code

Applying this tool, we make a generator function that finds ranges of consecutive numbers.

def find_ranges(iterable):
    """Yield range of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        if len(group) == 1:
            yield group[0]
        else:
            yield group[0], group[-1]


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
list(find_ranges(iterable))
# [(2, 5), (12, 17), 20]

The source implementation emulates a classic recipe (as demonstrated by @Nadia Alramli).

Note: more_itertools is a third-party package installable via pip install more_itertools.
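
For readers who cannot install the package, the classic recipe mentioned above can be sketched with the standard library alone. The name consecutive_groups_sketch is hypothetical, and this is only an approximation of the idea, not the library's actual source.

```python
from itertools import groupby


def consecutive_groups_sketch(iterable):
    """Yield lists of consecutive integers (hypothetical stand-in, not the library code)."""
    # Consecutive values share the same (index - value) difference.
    for _, pairs in groupby(enumerate(iterable), key=lambda pair: pair[0] - pair[1]):
        yield [value for _, value in pairs]


groups = list(consecutive_groups_sketch([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]))
print(groups)
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]
```

Like the recipe it imitates, this sketch assumes the input is already sorted and contains integers.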

Solution 2

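EDIT 2: To address the OP's new requirement (single numbers are returned as single numbers, not ranges):

from itertools import groupby
from operator import itemgetter

data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
ranges = []
for key, group in groupby(enumerate(data), lambda pair: pair[0] - pair[1]):
    group = list(map(itemgetter(1), group))
    if len(group) > 1:
        # range() excludes its stop value, so add 1 to include the last number
        ranges.append(range(group[0], group[-1] + 1))
    else:
        ranges.append(group[0])

Output:

[range(2, 6), range(12, 18), 20]

You can replace range with a tuple or any other custom class (in Python 2, use xrange).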


The Python docs have a very neat recipe for this (shown here in Python 3 syntax, which no longer allows tuple unpacking in lambda parameters):

from operator import itemgetter
from itertools import groupby
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
for k, g in groupby(enumerate(data), lambda pair: pair[0] - pair[1]):
    print(list(map(itemgetter(1), g)))

Output:

[2, 3, 4, 5]
[12, 13, 14, 15, 16, 17]

If you want the output as (first, last) tuples, as in the question, you can do this:

ranges = []
for k, g in groupby(enumerate(data), lambda pair: pair[0] - pair[1]):
    group = list(map(itemgetter(1), g))  # materialize so it can be indexed
    ranges.append((group[0], group[-1]))

Output:

[(2, 5), (12, 17)]

EDIT: The example is already explained in the documentation but maybe I should explain it more:

The key to the solution is differencing with a range, so that consecutive numbers all end up in the same group.

If the data is [2, 3, 4, 5, 12, 13, 14, 15, 16, 17], then groupby(enumerate(data), lambda pair: pair[0] - pair[1]) is equivalent to the following:

groupby(
    [(0, 2), (1, 3), (2, 4), (3, 5), (4, 12),
     (5, 13), (6, 14), (7, 15), (8, 16), (9, 17)],
    lambda pair: pair[0] - pair[1]
)

The lambda function subtracts the element value from the element index. Applying the lambda to each item gives the following keys for groupby:

[-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]

groupby groups elements by equal key value, so the first 4 elements will be grouped together and so forth.

I hope this makes it more readable.
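
As a quick check of the explanation above, the keys can be computed directly. This is a minimal sketch; the variable names are chosen for illustration only.

```python
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]

# Key for each element: its index minus its value.
keys = [index - item for index, item in enumerate(data)]
print(keys)
# [-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]
```

Every run of consecutive numbers produces a constant key, which is why groupby splits the list exactly at the gaps.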

A Python 3 version may be helpful for beginners.

Import the required libraries first:

from itertools import groupby
from operator import itemgetter

data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
ranges = []

for k, g in groupby(enumerate(data), lambda x: x[0] - x[1]):
    # map() returns an iterator in Python 3, so build a list before indexing
    group = list(map(itemgetter(1), g))
    ranges.append((group[0], group[-1]))

Solution 3

The "naive" solution, which I find somewhat readable at least.

x = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 22, 25, 26, 28, 51, 52, 57]

def group(L):
    first = last = L[0]
    for n in L[1:]:
        if n - 1 == last: # Part of the group, bump the end
            last = n
        else: # Not part of the group, yield current group and start a new
            yield first, last
            first = last = n
    yield first, last # Yield the last group


>>> print(list(group(x)))
[(2, 5), (12, 17), (22, 22), (25, 26), (28, 28), (51, 52), (57, 57)]
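
The function above indexes L[0], so it raises an IndexError on an empty list and needs a sliceable sequence. A minimal sketch of a variant that accepts any iterable follows; the name group_any is hypothetical, and the input is still assumed to be sorted.

```python
def group_any(iterable):
    """Yield (first, last) for each run of consecutive integers in sorted input."""
    iterator = iter(iterable)
    try:
        first = last = next(iterator)
    except StopIteration:
        return  # empty input: yield nothing
    for n in iterator:
        if n - 1 == last:  # part of the group, bump the end
            last = n
        else:  # gap found: emit the current group and start a new one
            yield first, last
            first = last = n
    yield first, last  # emit the final group


print(list(group_any([])))
# []
print(list(group_any(iter([2, 3, 4, 7]))))
# [(2, 4), (7, 7)]
```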

Solution 4

Assuming your list is sorted:

>>> from itertools import groupby
>>> def ranges(lst):
...     pos = (j - i for i, j in enumerate(lst))
...     t = 0
...     for i, els in groupby(pos):
...         l = len(list(els))
...         el = lst[t]
...         t += l
...         yield range(el, el+l)
...


>>> lst = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
>>> list(ranges(lst))
[range(2, 6), range(12, 18)]
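
If the input is not guaranteed to be sorted or free of duplicates, one option is to normalize it first. The sketch below repeats the function so that it runs on its own; the sorted(set(...)) step is an assumption about what "cleaning" the input should mean, not part of the original answer.

```python
from itertools import groupby


def ranges(lst):
    """Yield a range object for each run of consecutive integers in a sorted list."""
    pos = (j - i for i, j in enumerate(lst))
    t = 0
    for _, els in groupby(pos):
        length = len(list(els))
        start = lst[t]
        t += length
        yield range(start, start + length)


raw = [13, 2, 4, 3, 12, 3, 14]
normalized = sorted(set(raw))  # drop duplicates, then sort
print(list(ranges(normalized)))
# [range(2, 5), range(12, 15)]
```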

Solution 5

Here is something that should work, with no imports needed:

def myfunc(lst):
    ret = []
    a = b = lst[0]                           # a and b are range's bounds

    for el in lst[1:]:
        if el == b+1: 
            b = el                           # range grows
        else:                                # range ended
            ret.append(a if a==b else (a,b)) # is a single or a range?
            a = b = el                       # let's start again with a single
    ret.append(a if a==b else (a,b))         # corner case for last single/range
    return ret
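
A result that mixes tuples and single numbers, such as [(2, 5), (12, 17), 20], can then be formatted differently for each kind of item. The helper below is a hypothetical sketch; the name format_ranges and the "start-end" output style are assumptions chosen for illustration.

```python
def format_ranges(items):
    """Render ints as 'n' and (start, end) tuples as 'start-end' (hypothetical format)."""
    parts = []
    for item in items:
        if isinstance(item, tuple):
            start, end = item
            parts.append(f"{start}-{end}")
        else:
            parts.append(str(item))
    return ", ".join(parts)


print(format_ranges([(2, 5), (12, 17), 20]))
# 2-5, 12-17, 20
```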
Author by

mikemaccana

Updated on July 08, 2022

Comments

  • mikemaccana
    mikemaccana almost 2 years

    I'd like to identify groups of continuous numbers in a list, so that:

    myfunc([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])
    

    Returns:

    [(2,5), (12,17), 20]
    

    And was wondering what the best way to do this was (particularly if there's something inbuilt into Python).

    Edit: Note I originally forgot to mention that individual numbers should be returned as individual numbers, not ranges.

  • Jochen Ritzel
    Jochen Ritzel about 14 years
    [j - i for i, j in enumerate(lst)] is clever :-)
  • SilentGhost
    SilentGhost about 14 years
    almost works in py3k, except it requires lambda x:x[0]-x[1].
  • mikemaccana
    mikemaccana about 14 years
    I like this answer a lot because it's terse yet readable. However, numbers that are outside of ranges should be printed as single digits, not tuples (as I will format the output and have different formatting requirements for individual numbers versus ranges of numbers).
  • SilentGhost
    SilentGhost about 14 years
    >>> getranges([2, 12, 13]) Outputs: [[12, 13]]. Was that intentional?
  • mikemaccana
    mikemaccana about 14 years
    Yep, I need to fix for individual numbers (per most of the answers on the page). Working on it now.
  • mikemaccana
    mikemaccana about 14 years
    Could you please use multi-character variable names? For someone not familiar with map() or groupby(), the meanings of k, g, i and x are not clear.
  • Nadia Alramli
    Nadia Alramli about 14 years
    This was copied from the Python documentation with the same variable names. I changed the names now.
  • mikemaccana
    mikemaccana about 14 years
    Thanks for the improved variable names and handling non-ranged numbers. This is readable, your explanations are great and I've marked this as the preferred answer.
  • mikemaccana
    mikemaccana about 14 years
    Actually I prefer Nadia's answer, groupby() seems like the standard function I wanted.
  • Benny
    Benny about 11 years
    The other answer looked beautiful and intelligent, but this one is more understandable to me and allowed a beginner like me to expand it according to my needs.
  • IceArdor
    IceArdor almost 10 years
    You'll need to increment the 2nd number in xrange/range because it is non-inclusive. In other words, [2,3,4,5] == xrange(2,6), not xrange(2,5). It may be worth defining a new inclusive range data type.
  • derek73
    derek73 over 7 years
    Python 3 throws a syntax error on the first example. Here's the first 2 lines updated to work on python 3: for key, group in groupby(enumerate(data), lambda i: i[0] - i[1]): group = list(map(itemgetter(1), group))
  • Nexus
    Nexus almost 6 years
    Could use a list comprehension to print the non-range tuples as single digits: print([i if i[0] != i[1] else i[0] for i in group(x)])
  • Pleastry
    Pleastry almost 3 years
    This actually fails if you replace 12 with 10 in data array. The correct solution would be: starts = [x for x in data if x-1 not in data and x+1 in data] and ends = [x for x in data if x-1 in data and x+1 not in data and x not in starts]
  • kmt
    kmt almost 3 years
    Thanks @Pleastry - I have edited with your fix
  • Stef
    Stef over 2 years
    Alternatively, using more_itertools.groupby_transform: [v for k,v in more_itertools.groupby_transform(enumerate(iterable), keyfunc=lambda p: p[1]-p[0], valuefunc=operator.itemgetter(1), reducefunc=to_interval)] with to_interval = lambda g: (sublst[0], sublst[-1]) if len(sublst := list(g)) > 1 else sublst[0]