Need to create a list of sets, from a list of sets whose members may be connected
Solution 1
It's hard to tell the performance without a sufficiently large set, but here is some basic code to start from:
while True:
merged_one = False
supersets = [listOfSets[0]]
for s in listOfSets[1:]:
in_super_set = False
for ss in supersets:
if s & ss:
ss |= s
merged_one = True
in_super_set = True
break
if not in_super_set:
supersets.append(s)
print supersets
if not merged_one:
break
listOfSets = supersets
This works in 3 iterations on the provided data. And the output is as follows:
[set([0, 1, 2, 3, 4, 5]), set([4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
[set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
[set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
Solution 2
This is a union find problem.
Though I haven't used it, this Python code looks good to me.
http://code.activestate.com/recipes/577225-union-find/
Solution 3
Forgive the messed up caps (autocorrect...):
# the results cotainer
Connected = set()
sets = # some list of sets
# convert the sets to frozensets (which are hashable and can be added to sets themselves)
Sets = map(frozenset, sets)
for s1 in sets:
Res = copy.copy(s1)
For s2 in sets:
If s1 & s2:
Res = res | s2
Connected.add(res)
andySnooker
Updated on June 18, 2022Comments
-
andySnooker about 2 years
I'm dealing with polygonal data in realtime here, but the problems quite simple. I have a huge list containing thousands of sets of polygon Indecies (Integers) and I need to simplify the list as "fast" as possible into a list of sets of "connected" Indecies. i.e. Any sets containing integers that are also in another set become one set in the result. I've read several possible solutions involving sets & graphs etc. All i'm after are a final list of sets which had any degree of commonality.
I'm dealing with lots of data here, but for simplicities sake here's some sample data:
setA = set([0,1,2]) setB = set([6,7,8,9]) setC = set([4,5,6]) setD = set([3,4,5,0]) setE = set([10,11,12]) setF = set([11,13,14,15]) setG = set([16,17,18,19]) listOfSets = [setA,setB,setC,setD,setE,setF,setG]
In this case I'm after a list with a result like this, although ordering is irrelevant:
connectedFacesListOfSets = [ set([0,1,2,3,4,5,6,7,8,9]), set([10,11,12,13,14,15]), set([16,17,18,19])]
I've looked for similar solutions, but the one with the highest votes gave incorrect results on my large test data.
-
andySnooker about 13 yearslambacck's seems succinct and works for me so I'm going with that answer unless there's a faster solution. In my list of 4064 sets it takes less than 0.01 secs so that'll do me fine.
-
Bryce Siedschlaw about 13 yearsI'm curious as to how you finished your task using lambaack's method in .01 seconds. I've been running it for the past 2 minutes and its still not finished yet. What was your smallest/largest beginning set? I'm using a randomly generated list of 4000 sets. Not that I'm knocking lambaack or anything (my answer probably takes longer), I was just wondering what you used.
-
lambacck about 13 years@Bryce Siedschlaw: What is your code for creating the sets? I'm interested to try on a larger group of sets.
-
Bryce Siedschlaw about 13 years@lambacck I added code to my answer to generate the random sets list
-
twneale about 13 yearsI ran each answer with timeit--fastest was lamback 0.03 (11 function calls), followed by Thiago 0.07 (50 calls), and Bryce 0.1 (69 calls).
-
lambacck about 13 years@twneale: be sure that if you did multiple iterations with timeit that you use a deep copy of listOfSets (
[set(x) for x in listOfSets]
) otherwise you are just re-running the already combined sets. -
Thiago Chaves about 13 years@twneale: And please remove the print statements since they take too much time. Thanks.
-
Thiago Chaves about 13 yearsI have tested lambacck's and my method now. lambacck's is 10x faster by my measurements. I guess all those indirections and control maps take too much lot of time.
-
twneale about 13 years@lambacck, Thiago: I ran timeit like this "t = timeit.Timer('thiago()', 'from main import thiago'); print t.repeat(number=1000). For each, everything was inside the function. I also removed all the prints.
-
-
andySnooker about 13 yearsThat looks hopeful on the small example data lambacck. I'll give it a go in my plugin tomorrow and see how it fairs. Cheers.
-
lambacck about 13 yearsIf your going for speed, you will want to remove the function calls, since they are a well known slowdown. Also searching for a non-connected set on each iteration of the
while
loop takes a long time, better to track that as you iterate and keep a flag. -
andySnooker about 13 yearsWorks a treat. Thankyou. If I had 15 rep I'd Vote You Up.
-
Thiago Chaves about 13 yearsThis algorithm runs only once and would give the wrong answer in many situations. For example, for the input [(1,2),(2,3),(3,4)] it will answer incorrectly [(1,2,3),(1,2,3,4),(2,3,4)]