Union of multiple sets in python

python list python-3.x set-union

45,005

Solution 1

The itertools module makes short work of this problem:

>>> from itertools import chain
>>> list(set(chain.from_iterable(d)))
[1, '41', '42', '43', '40', '34', '30', '44']

Another way to do it is to unpack the list into separate arguments for union():

>>> list(set().union(*d))
[1, '41', '42', '43', '40', '34', '30', '44']

The latter way eliminates all duplicates and doesn't require that the inputs first be converted to sets. Also, it doesn't require an import.

Solution 2

Using the unpacking operator *:

>> list(set().union(*a))
[1, '44', '30', '42', '43', '40', '41', '34']

(Thanks Raymond Hettinger and ShadowRanger for the comments!)

(Note that

set.union(*tup)

will unpack to

set.union(tup[0], tup[1], ... tup[n - 1])

)

Solution 3

In [20]: s
Out[20]: 
[[1, '34', '44'],
 [1, '40', '30', '41'],
 [1, '41', '40', '42'],
 [1, '42', '41', '43'],
 [1, '43', '42', '44'],
 [1, '44', '34', '43']]
In [31]: list({x for _list in s for x in _list})
Out[31]: [1, '44', '30', '42', '43', '40', '41', '34']

Update:

Thanks for the comments

Solution 4

>>> big = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]
>>> set(reduce ( lambda l,a : l + a, big))
set([1, '44', '30', '42', '43', '40', '41', '34'])

And if you really want a list of a list as a final result

>>>>[list(set(reduce ( lambda l,a : l + a, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]

And if you don't like recoding a lambda function for the list addition :

>>>>[list(set(reduce ( list.__add__, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]

EDIT : after your recommendation about using itertools.chain instead of list.__add__ I ran a timeit for both with the original variable used by the original poster.

It seems that timeit times list.__add__ around 2.8s and itertools.chain around 3.5 seconds.

I checked on this page and yes, you were right with the itertools.chain contains a from_iterable method that grants a huge performance boost. see below with list.__add__, itertools.chain and itertools.chain.from_iterable.

>>> timeit.timeit("[list(set(reduce ( list.__add__, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
16.051744650801993
>>> timeit.timeit("[list(set(reduce ( itertools.chain, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
54.721315866467194
>>> timeit.timeit("list(set(itertools.chain.from_iterable(big)))", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
0.040056066849501804

Thank you very much for your advises :)

Solution 5

You can use itertools to perform this action. Let us assume that your list has a variable name A

import itertools

single_list_with_all_values = list(itertools.chain(*A))
single_list_with_all_values.sort()

print set(single_list_with_all_values)

View more solutions

45,005

Author by

Tapojyoti Mandal

Masters Student at Texas A&M University. Interests in Computer Architecture, VLSI and Memory Storage Systems.

Updated on July 09, 2022

Comments

Tapojyoti Mandal almost 2 years
```
[[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]
```
I have a list of lists. My aim is to check whether any one sublist has anything in common with other sublists(excluding the first index object to compare). If it has anything in common then unify those sublists.

For example, for this example my final answer should be something like:
```
[[1, '34, '44', '40' '30', '41', '42', '43']]
```
I can understand that I should convert the sublists to sets and then use union() and intersection() operation. But what I am stuck with is to how to compare each set/sublist. I can't run a loop over the list and compare each sublist one by one as the contents of the list would be modified and this would lead to error.

What I want to know is there any efficient method to compare all the sublists(converted to sets) and get union of them?
Peter Wood about 9 years

You don't need the list comprehension as the set constructor can take a generator.
Ajay about 9 years

@PeterWood OP has asked for a list to be his final answer
Tapojyoti Mandal about 9 years

This method works fine. Thanks. Could you explain what is the use of '*' in the code? Or maybe provide a link where I could study related to this to understand more.
Peter Wood about 9 years

No, the comprehension is being passed to the set. You don't need it.
Raymond Hettinger about 9 years

This is pretty good. A few improvements though. 1) A chain(*it) should always be changed to chain.from_iterable(it). 2) There is no need to sort() because the ordering is lost in making the set. 3) Without the sort, there is no need to convert to a list before making the set. With those changes, it boils down to set(chain.from_iterable(d)).
Raymond Hettinger about 9 years

FWIW, the tuple step has no effect because star-unpacking works on any iterable. You could also replace the list comprehension with map(set, a). The result boils down to list(set.union(*map(set, a))).
Ami Tavory about 9 years

@TapojyotiMandal See explanation in answer.
Raymond Hettinger about 9 years

Adding lists together like this is an inefficient O(n**2) operation and is almost always a bad idea. Please use itertools.chain instead.
Raymond Hettinger about 9 years

Replacing the list comprehension with a set comprehension boils this one down to a nice, clean answer: list({x for _list in s for x in _list}).
Hassan Baig almost 7 years

How does itertools generally scale? In your experience, can this kind of an operation handle tens or hundreds of millions item long lists ('items' being strings here)? Or even larger?
Raymond Hettinger almost 7 years

The chain.from_iterable() step is scale invariant. At any given time, its whole state is stored in just two pointers to iterators. The set() and list() parts eat memory in proportion to the total number of unique inputs. On my 64-bit machine, one hundred million unique inputs takes 4.3 GB of RAM for the set object and 0.9 GB for the list object.
ShadowRanger about 5 years

You could dramatically reduce the number of temporary sets by changing set.union(*map(set, a)) to set().union(*a). The only reason the map(set, was needed was because you were calling set.union and the first argument became the "self" it was being called on, but if you make an empty set as the base, union accepts arbitrary iterables for the remaining arguments.
Radio Controlled almost 4 years

You should better write set.union() as set() initializes with empty set. This is ok in this case but I spent a lot of time searching for bugs because I assumed this generalizes to intersection. With set. you can do both union and intersection!
user2357112 over 3 years

@RadioControlled: set().union(*d) works with an empty d, though, which is a more important factor than symmetry with what you would do for an intersection.
user2357112 over 3 years

Too bad all the answers are answering a different question from what was asked, though, probably due to the poor choice of example in the question. The question seems to have actually been asking for something closer to a hypergraph connected components algorithm, not just dumping everything into a single set. (dermen's answer is slightly different, but it ends up being even wronger.)