Union of multiple sets in python
Solution 1
The itertools module makes short work of this problem:
>>> from itertools import chain
>>> list(set(chain.from_iterable(d)))
[1, '41', '42', '43', '40', '34', '30', '44']
Another way to do it is to unpack the list into separate arguments for union():
>>> list(set().union(*d))
[1, '41', '42', '43', '40', '34', '30', '44']
The latter way eliminates all duplicates and doesn't require that the inputs first be converted to sets. Also, it doesn't require an import.
Solution 2
Using the unpacking operator *
:
>> list(set().union(*a))
[1, '44', '30', '42', '43', '40', '41', '34']
(Thanks Raymond Hettinger and ShadowRanger for the comments!)
(Note that
set.union(*tup)
will unpack to
set.union(tup[0], tup[1], ... tup[n - 1])
)
Solution 3
In [20]: s
Out[20]:
[[1, '34', '44'],
[1, '40', '30', '41'],
[1, '41', '40', '42'],
[1, '42', '41', '43'],
[1, '43', '42', '44'],
[1, '44', '34', '43']]
In [31]: list({x for _list in s for x in _list})
Out[31]: [1, '44', '30', '42', '43', '40', '41', '34']
Update:
Thanks for the comments
Solution 4
>>> big = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]
>>> set(reduce ( lambda l,a : l + a, big))
set([1, '44', '30', '42', '43', '40', '41', '34'])
And if you really want a list of a list as a final result
>>>>[list(set(reduce ( lambda l,a : l + a, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]
And if you don't like recoding a lambda function for the list addition :
>>>>[list(set(reduce ( list.__add__, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]
EDIT : after your recommendation about using itertools.chain instead of list.__add__ I ran a timeit for both with the original variable used by the original poster.
It seems that timeit times list.__add__ around 2.8s and itertools.chain around 3.5 seconds.
I checked on this page and yes, you were right with the itertools.chain contains a from_iterable method that grants a huge performance boost. see below with list.__add__, itertools.chain and itertools.chain.from_iterable.
>>> timeit.timeit("[list(set(reduce ( list.__add__, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
16.051744650801993
>>> timeit.timeit("[list(set(reduce ( itertools.chain, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
54.721315866467194
>>> timeit.timeit("list(set(itertools.chain.from_iterable(big)))", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
0.040056066849501804
Thank you very much for your advises :)
Solution 5
You can use itertools to perform this action. Let us assume that your list has a variable name A
import itertools
single_list_with_all_values = list(itertools.chain(*A))
single_list_with_all_values.sort()
print set(single_list_with_all_values)
![Tapojyoti Mandal](https://lh6.googleusercontent.com/-K1gcEAMNSlM/AAAAAAAAAAI/AAAAAAAAADw/C1qBtMKY00g/photo.jpg?sz=256)
Tapojyoti Mandal
Masters Student at Texas A&M University. Interests in Computer Architecture, VLSI and Memory Storage Systems.
Updated on July 09, 2022Comments
-
Tapojyoti Mandal almost 2 years
[[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]
I have a list of lists. My aim is to check whether any one sublist has anything in common with other sublists(excluding the first index object to compare). If it has anything in common then unify those sublists.
For example, for this example my final answer should be something like:
[[1, '34, '44', '40' '30', '41', '42', '43']]
I can understand that I should convert the sublists to sets and then use union() and intersection() operation. But what I am stuck with is to how to compare each set/sublist. I can't run a loop over the list and compare each sublist one by one as the contents of the list would be modified and this would lead to error.
What I want to know is there any efficient method to compare all the sublists(converted to sets) and get union of them?
-
Peter Wood about 9 yearsYou don't need the list comprehension as the set constructor can take a generator.
-
Ajay about 9 years@PeterWood OP has asked for a list to be his final answer
-
Tapojyoti Mandal about 9 yearsThis method works fine. Thanks. Could you explain what is the use of '*' in the code? Or maybe provide a link where I could study related to this to understand more.
-
Peter Wood about 9 yearsNo, the comprehension is being passed to the
set
. You don't need it. -
Raymond Hettinger about 9 yearsThis is pretty good. A few improvements though. 1) A
chain(*it)
should always be changed tochain.from_iterable(it)
. 2) There is no need tosort()
because the ordering is lost in making the set. 3) Without the sort, there is no need to convert to a list before making the set. With those changes, it boils down toset(chain.from_iterable(d))
. -
Raymond Hettinger about 9 yearsFWIW, the tuple step has no effect because star-unpacking works on any iterable. You could also replace the list comprehension with
map(set, a)
. The result boils down tolist(set.union(*map(set, a)))
. -
Ami Tavory about 9 years@TapojyotiMandal See explanation in answer.
-
Raymond Hettinger about 9 yearsAdding lists together like this is an inefficient O(n**2) operation and is almost always a bad idea. Please use
itertools.chain
instead. -
Raymond Hettinger about 9 yearsReplacing the list comprehension with a set comprehension boils this one down to a nice, clean answer:
list({x for _list in s for x in _list})
. -
Hassan Baig almost 7 yearsHow does itertools generally scale? In your experience, can this kind of an operation handle tens or hundreds of millions item long lists ('items' being strings here)? Or even larger?
-
Raymond Hettinger almost 7 yearsThe
chain.from_iterable()
step is scale invariant. At any given time, its whole state is stored in just two pointers to iterators. Theset()
andlist()
parts eat memory in proportion to the total number of unique inputs. On my 64-bit machine, one hundred million unique inputs takes 4.3 GB of RAM for the set object and 0.9 GB for the list object. -
ShadowRanger about 5 yearsYou could dramatically reduce the number of temporary
set
s by changingset.union(*map(set, a))
toset().union(*a)
. The only reason themap(set,
was needed was because you were callingset.union
and the first argument became the "self" it was being called on, but if you make an emptyset
as the base,union
accepts arbitrary iterables for the remaining arguments. -
Radio Controlled almost 4 yearsYou should better write
set.union()
asset()
initializes with empty set. This is ok in this case but I spent a lot of time searching for bugs because I assumed this generalizes to intersection. Withset.
you can do both union and intersection! -
user2357112 over 3 years@RadioControlled:
set().union(*d)
works with an emptyd
, though, which is a more important factor than symmetry with what you would do for an intersection. -
user2357112 over 3 yearsToo bad all the answers are answering a different question from what was asked, though, probably due to the poor choice of example in the question. The question seems to have actually been asking for something closer to a hypergraph connected components algorithm, not just dumping everything into a single set. (dermen's answer is slightly different, but it ends up being even wronger.)