Split a list of tuples into sub-lists of the same tuple field
Solution 1
Use itertools.groupby:
import itertools
import operator

data = [(1, 'A', 'foo'),
        (2, 'A', 'bar'),
        (100, 'A', 'foo-bar'),
        ('xx', 'B', 'foobar'),
        ('yy', 'B', 'foo'),
        (1000, 'C', 'py'),
        (200, 'C', 'foo'),
        ]

for key, group in itertools.groupby(data, operator.itemgetter(1)):
    print(list(group))
yields
[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')]
[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')]
[(1000, 'C', 'py'), (200, 'C', 'foo')]
Or, to create one list with each group as a sublist, you could use a list comprehension:
[list(group) for key,group in itertools.groupby(data,operator.itemgetter(1))]
The second argument to itertools.groupby is a function which itertools.groupby applies to each item in data (the first argument). It is expected to return a key. itertools.groupby then groups together all contiguous items with the same key.
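To make the role of the key concrete, here is a minimal sketch that records which key groupby produced for each group and stores the groups in a dict. It reuses the sample data from above; the names keys_seen and by_category are illustrative only. Building a dict this way assumes each key forms a single contiguous run, as it does in this data.

```python
import itertools
import operator

data = [(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar'),
        ('xx', 'B', 'foobar'), ('yy', 'B', 'foo'),
        (1000, 'C', 'py'), (200, 'C', 'foo')]

# groupby calls the key function on every item and yields
# (key, iterator over the contiguous run of items with that key).
keys_seen = []
by_category = {}
for key, group in itertools.groupby(data, key=operator.itemgetter(1)):
    keys_seen.append(key)
    # A group iterator stops being usable once groupby advances,
    # so materialize it into a list right away.
    by_category[key] = list(group)

print(keys_seen)          # ['A', 'B', 'C']
print(by_category['B'])   # [('xx', 'B', 'foobar'), ('yy', 'B', 'foo')]
```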
operator.itemgetter(1) picks off the second item in a sequence. For example, if

row = (1, 'A', 'foo')

then operator.itemgetter(1)(row) equals 'A'.
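A small sketch of how itemgetter behaves on its own, next to the equivalent lambda. The multi-index form is not used in the answer; it is shown only because it is the natural way to group on more than one field.

```python
import operator

row = (1, 'A', 'foo')

get_category = operator.itemgetter(1)
print(get_category(row))                 # A

# An equivalent lambda, for readers who prefer it.
print((lambda r: r[1])(row))             # A

# With several indices, itemgetter returns a tuple of those items.
print(operator.itemgetter(1, 2)(row))    # ('A', 'foo')
```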
As @eryksun points out in the comments, if the categories of the tuples appear in arbitrary order, you must sort data before applying itertools.groupby. This is because itertools.groupby only collects contiguous items with the same key into groups.
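The contiguity requirement is easy to see with a tiny made-up input in which category 'A' appears in two separate runs. This is a sketch for illustration; the list unsorted is hypothetical data, not from the question.

```python
import itertools
import operator

# Hypothetical input in which category 'A' appears in two separate runs.
unsorted = [(1, 'A', 'foo'), ('xx', 'B', 'foobar'), (2, 'A', 'bar')]

keys = [key for key, _ in itertools.groupby(unsorted, key=operator.itemgetter(1))]
print(keys)          # ['A', 'B', 'A']  ('A' is split into two groups)

# Sorting by the same key first makes each category contiguous.
keys_sorted = [key for key, _ in itertools.groupby(
    sorted(unsorted, key=operator.itemgetter(1)), key=operator.itemgetter(1))]
print(keys_sorted)   # ['A', 'B']
```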
To sort the tuples by category, use:
data2=sorted(data,key=operator.itemgetter(1))
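Putting the sort and the grouping together, one possible helper looks like this. The function name group_by_field and its index parameter are hypothetical; the point is that the same key function is used for both sorted and groupby.

```python
import itertools
import operator

def group_by_field(rows, index):
    """Return a list of sub-lists, one per distinct value of row[index].

    Sorts first so that groupby sees each key as one contiguous run.
    sorted() is stable, so rows keep their original relative order
    within each group.
    """
    key = operator.itemgetter(index)
    return [list(group) for _, group in itertools.groupby(sorted(rows, key=key), key=key)]

data = [('xx', 'B', 'foobar'), (1, 'A', 'foo'), (1000, 'C', 'py'),
        (2, 'A', 'bar'), ('yy', 'B', 'foo')]
print(group_by_field(data, 1))
# [[(1, 'A', 'foo'), (2, 'A', 'bar')], [('xx', 'B', 'foobar'), ('yy', 'B', 'foo')], [(1000, 'C', 'py')]]
```

Because the sort key is only the category field, the mixed types in the first field (ints and strings) never get compared.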
Solution 2
collections.defaultdict
itertools.groupby requires the input to be grouped by the key field (for example, sorted by it); otherwise you have to sort first, incurring O(n log n) cost. For O(n) time complexity (dict operations are O(1) on average), you can use a defaultdict of lists:
from collections import defaultdict

dd = defaultdict(list)
for item in data:
    dd[item[1]].append(item)

res = list(dd.values())
print(res)
[[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')],
[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')],
[(1000, 'C', 'py'), (200, 'C', 'foo')]]
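The same single-pass idea wrapped as a reusable function, as a sketch; the name group_by_field_dd is hypothetical. On unordered input each category still ends up in one list, and the groups come out in order of each key's first appearance, since dicts (including defaultdict) preserve insertion order as of Python 3.7.

```python
from collections import defaultdict

def group_by_field_dd(rows, index):
    """Group rows by row[index] in one pass, without sorting.

    Groups are returned in order of each key's first appearance.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row)
    return list(groups.values())

data = [('xx', 'B', 'foobar'), (1, 'A', 'foo'), (1000, 'C', 'py'),
        (2, 'A', 'bar'), ('yy', 'B', 'foo')]
print(group_by_field_dd(data, 1))
# [[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')], [(1, 'A', 'foo'), (2, 'A', 'bar')], [(1000, 'C', 'py')]]
```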
Solution 3
To split a list of tuples into one list per tuple position (that is, to transpose it):
foo = ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))
[[z[i] for z in foo] for i in (0, 1)]
If you prefer tuples instead of lists (in Python 3, zip returns an iterator, so wrap it in list()):
list(zip(*[(1, 4), (2, 5), (3, 6)]))
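A minimal sketch of the zip-based transpose applied to the foo data, giving both tuple and list results. Unlike the hard-coded (0, 1) in the comprehension, it adapts to any tuple width.

```python
rows = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

# zip(*rows) passes each tuple as a separate argument, so zip pairs up
# all the first elements, then all the second elements, and so on.
columns_as_tuples = list(zip(*rows))
print(columns_as_tuples)   # [(1, 3, 5, 7, 9), (2, 4, 6, 8, 10)]

# The same columns as lists.
columns_as_lists = [list(col) for col in zip(*rows)]
print(columns_as_lists)    # [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
```

Note that zip stops at the shortest input, so if the tuples have different lengths, the extra trailing elements are silently dropped.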
Kaung Htet
Updated on June 19, 2022
Comments
Kaung Htet almost 2 years
I have a huge list of tuples in this format. The second field of each tuple is the category field.
[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar'), ('xx', 'B', 'foobar'), ('yy', 'B', 'foo'), (1000, 'C', 'py'), (200, 'C', 'foo'), ..]
What is the most efficient way to break it down into sub-lists of the same category (A, B, C, etc.)?
Eryk Sun over 12 years
Don't forget that the data first has to be sorted:
data2 = sorted(data, key=operator.itemgetter(1))
jwg almost 11 years
Great answer. Don't forget that you can use a lambda instead of operator.itemgetter, for folks who are used to lambdas.