Split a list of tuples into sub-lists of the same tuple field

11,920

Solution 1

Use itertools.groupby:

import itertools
import operator

data=[(1, 'A', 'foo'),
    (2, 'A', 'bar'),
    (100, 'A', 'foo-bar'),

    ('xx', 'B', 'foobar'),
    ('yy', 'B', 'foo'),

    (1000, 'C', 'py'),
    (200, 'C', 'foo'),
    ]

for key,group in itertools.groupby(data,operator.itemgetter(1)):
    print(list(group))

yields

[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')]
[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')]
[(1000, 'C', 'py'), (200, 'C', 'foo')]

Or, to create one list with each group as a sublist, you could use a list comprehension:

[list(group) for key,group in itertools.groupby(data,operator.itemgetter(1))]

The second argument to itertools.groupby is a function which itertools.groupby applies to each item in data (the first argument). It is expected to return a key. itertools.groupby then groups together all contiguous items with the same key.
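
To make the "contiguous" part concrete, here is a minimal sketch using a small made-up list (the `rows` data is hypothetical, not from the question) in which category 'A' appears in two separate runs:

```python
import itertools
import operator

# Hypothetical toy input: category 'A' shows up twice, but not side by side.
rows = [(1, 'A', 'foo'), (2, 'B', 'bar'), (3, 'A', 'baz')]

# groupby starts a new group every time the key changes,
# so the two 'A' rows end up in two different groups.
runs = [(key, list(group))
        for key, group in itertools.groupby(rows, key=operator.itemgetter(1))]

print(runs)
# [('A', [(1, 'A', 'foo')]), ('B', [(2, 'B', 'bar')]), ('A', [(3, 'A', 'baz')])]
```

Note that 'A' is reported twice: groupby does not merge groups that share a key but are separated by other items.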

operator.itemgetter(1) picks off the second item in a sequence.

For example, if

row=(1, 'A', 'foo')

then

operator.itemgetter(1)(row)

equals 'A'.
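
If you are more used to lambdas, an equivalent key function can be written that way. The sketch below compares the two; the names `get_category` and `get_category_lambda` are made up for illustration:

```python
import operator

row = (1, 'A', 'foo')

# Two ways to build a function that returns the second field of a tuple.
get_category = operator.itemgetter(1)
get_category_lambda = lambda r: r[1]

print(get_category(row), get_category_lambda(row))  # A A

# Given several indices, itemgetter returns a tuple of the selected fields.
print(operator.itemgetter(1, 2)(row))  # ('A', 'foo')
```

Either function can be passed as the key to itertools.groupby or sorted.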


As @eryksun points out in the comments, if the categories of the tuples appear in arbitrary order, you must sort data before applying itertools.groupby. This is because itertools.groupby only collects contiguous items with the same key into a group.

To sort the tuples by category, use:

data2=sorted(data,key=operator.itemgetter(1))
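
Putting the two steps together, a sketch of sort-then-group on shuffled input might look like this. The shuffled `data` list and the name `by_category` are assumptions made for the example; the result is collected into a dict so that each group stays attached to its key:

```python
import itertools
import operator

# Hypothetical input with the categories deliberately out of order.
data = [(1000, 'C', 'py'), (1, 'A', 'foo'), ('xx', 'B', 'foobar'),
        (2, 'A', 'bar'), (200, 'C', 'foo')]

by_category = operator.itemgetter(1)

# sorted() is stable, so tuples keep their original relative order
# within each category.
data2 = sorted(data, key=by_category)

# Use the same key function for sorting and for grouping.
grouped = {key: list(group)
           for key, group in itertools.groupby(data2, by_category)}

print(grouped['A'])  # [(1, 'A', 'foo'), (2, 'A', 'bar')]
print(grouped['C'])  # [(1000, 'C', 'py'), (200, 'C', 'foo')]
```

Sorting only compares the category field, so the mixed int and str values in the first field do not cause a comparison error.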

Solution 2

collections.defaultdict

itertools.groupby requires the input to be sorted by the key field; if it is not, you have to sort first, at O(n log n) cost. For O(n) time on average (dictionary lookups and list appends are amortized O(1)), with no need for sorted input, you can use a defaultdict of lists:

from collections import defaultdict

dd = defaultdict(list)
for item in data:
    dd[item[1]].append(item)

res = list(dd.values())

print(res)

[[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')],
 [('xx', 'B', 'foobar'), ('yy', 'B', 'foo')],
 [(1000, 'C', 'py'), (200, 'C', 'foo')]]
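
If you also want to keep the category alongside each group, one option is to return the dictionary itself rather than only its values. This is a sketch only; `group_by_field` is a hypothetical helper name and the sample data is made up:

```python
from collections import defaultdict

def group_by_field(items, index):
    """Group items by the value at position `index` (hypothetical helper)."""
    groups = defaultdict(list)
    for item in items:
        groups[item[index]].append(item)
    # Convert to a plain dict so later lookups of missing keys raise KeyError
    # instead of silently creating empty lists.
    return dict(groups)

# Unsorted input works: no sort step is needed with this approach.
data = [(2, 'B', 'bar'), (1, 'A', 'foo'), (3, 'B', 'baz')]
grouped = group_by_field(data, 1)

print(grouped)
# {'B': [(2, 'B', 'bar'), (3, 'B', 'baz')], 'A': [(1, 'A', 'foo')]}
```

On Python 3.7 and later, dicts preserve insertion order, so the groups come out in the order each category was first seen rather than in sorted order.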

Solution 3

To split a list of tuples by field position instead, producing one list per position:

foo = ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))
[[z[i] for z in foo] for i in (0,1)]

If you prefer to get tuples instead of lists (in Python 3, zip returns an iterator, so wrap it in list to see the result):

list(zip(*[(1,4),(2,5),(3,6)]))
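
For comparison, here is a short sketch that runs both variants on the same input and shows what each produces (the variable names `columns_as_lists` and `columns_as_tuples` are made up for the example):

```python
foo = ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))

# One list per field position, built with a nested comprehension.
columns_as_lists = [[z[i] for z in foo] for i in (0, 1)]

# Same split using zip; each column comes back as a tuple.
columns_as_tuples = list(zip(*foo))

print(columns_as_lists)   # [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
print(columns_as_tuples)  # [(1, 3, 5, 7, 9), (2, 4, 6, 8, 10)]
```

Note that this splits by position within each tuple, which is a different operation from grouping tuples by a category value.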
Author: Kaung Htet

Updated on June 19, 2022

Comments

  • Kaung Htet
    Kaung Htet almost 2 years

    I have a huge list of tuples in this format. The second field of each tuple is the category field.

        [(1, 'A', 'foo'),
        (2, 'A', 'bar'),
        (100, 'A', 'foo-bar'),
    
        ('xx', 'B', 'foobar'),
        ('yy', 'B', 'foo'),
    
        (1000, 'C', 'py'),
        (200, 'C', 'foo'),
        ..]
    

    What is the most efficient way to break it down into sub-lists of the same category (A, B, C, etc.)?

  • Eryk Sun
    Eryk Sun over 12 years
    Don't forget that the data first has to be sorted: data2 = sorted(data, key=operator.itemgetter(1)).
  • jwg
    jwg almost 11 years
    Great answer. Don't forget that you can use a lambda instead of operator.itemgetter, for folks who are used to lambdas.