Need to create a list of sets, from a list of sets whose members may be connected

10,792

Solution 1

It's hard to tell the performance without a sufficiently large set, but here is some basic code to start from:

while True:
    merged_one = False
    supersets = [listOfSets[0]]

    for s in listOfSets[1:]:
        in_super_set = False
        for ss in supersets:
            if s & ss:
               ss |= s
               merged_one = True
               in_super_set = True
               break

        if not in_super_set:
            supersets.append(s)

    print supersets
    if not merged_one:
        break

    listOfSets = supersets       

This works in 3 iterations on the provided data. And the output is as follows:

[set([0, 1, 2, 3, 4, 5]), set([4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
[set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
[set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]

Solution 2

This is a union find problem.

Though I haven't used it, this Python code looks good to me.

http://code.activestate.com/recipes/577225-union-find/

Solution 3

Forgive the messed up caps (autocorrect...):

# the results cotainer
Connected = set()

sets = # some list of sets

# convert the sets to frozensets (which are hashable and can be added to sets themselves)
Sets = map(frozenset, sets)

for s1 in sets:
    Res = copy.copy(s1)
    For s2 in sets:
        If s1 & s2:
            Res = res | s2
    Connected.add(res)  
Share:
10,792
andySnooker
Author by

andySnooker

Updated on June 18, 2022

Comments

  • andySnooker
    andySnooker about 2 years

    I'm dealing with polygonal data in realtime here, but the problems quite simple. I have a huge list containing thousands of sets of polygon Indecies (Integers) and I need to simplify the list as "fast" as possible into a list of sets of "connected" Indecies. i.e. Any sets containing integers that are also in another set become one set in the result. I've read several possible solutions involving sets & graphs etc. All i'm after are a final list of sets which had any degree of commonality.

    I'm dealing with lots of data here, but for simplicities sake here's some sample data:

    setA = set([0,1,2])
    setB = set([6,7,8,9])
    setC = set([4,5,6])
    setD = set([3,4,5,0])
    setE = set([10,11,12])
    setF = set([11,13,14,15])
    setG = set([16,17,18,19])
    
    listOfSets = [setA,setB,setC,setD,setE,setF,setG]
    

    In this case I'm after a list with a result like this, although ordering is irrelevant:

    connectedFacesListOfSets = [ set([0,1,2,3,4,5,6,7,8,9]), set([10,11,12,13,14,15]), set([16,17,18,19])]

    I've looked for similar solutions, but the one with the highest votes gave incorrect results on my large test data.

    Merge lists that share common elements

    • andySnooker
      andySnooker about 13 years
      lambacck's seems succinct and works for me so I'm going with that answer unless there's a faster solution. In my list of 4064 sets it takes less than 0.01 secs so that'll do me fine.
    • Bryce Siedschlaw
      Bryce Siedschlaw about 13 years
      I'm curious as to how you finished your task using lambaack's method in .01 seconds. I've been running it for the past 2 minutes and its still not finished yet. What was your smallest/largest beginning set? I'm using a randomly generated list of 4000 sets. Not that I'm knocking lambaack or anything (my answer probably takes longer), I was just wondering what you used.
    • lambacck
      lambacck about 13 years
      @Bryce Siedschlaw: What is your code for creating the sets? I'm interested to try on a larger group of sets.
    • Bryce Siedschlaw
      Bryce Siedschlaw about 13 years
      @lambacck I added code to my answer to generate the random sets list
    • twneale
      twneale about 13 years
      I ran each answer with timeit--fastest was lamback 0.03 (11 function calls), followed by Thiago 0.07 (50 calls), and Bryce 0.1 (69 calls).
    • lambacck
      lambacck about 13 years
      @twneale: be sure that if you did multiple iterations with timeit that you use a deep copy of listOfSets ([set(x) for x in listOfSets]) otherwise you are just re-running the already combined sets.
    • Thiago Chaves
      Thiago Chaves about 13 years
      @twneale: And please remove the print statements since they take too much time. Thanks.
    • Thiago Chaves
      Thiago Chaves about 13 years
      I have tested lambacck's and my method now. lambacck's is 10x faster by my measurements. I guess all those indirections and control maps take too much lot of time.
    • twneale
      twneale about 13 years
      @lambacck, Thiago: I ran timeit like this "t = timeit.Timer('thiago()', 'from main import thiago'); print t.repeat(number=1000). For each, everything was inside the function. I also removed all the prints.
  • andySnooker
    andySnooker about 13 years
    That looks hopeful on the small example data lambacck. I'll give it a go in my plugin tomorrow and see how it fairs. Cheers.
  • lambacck
    lambacck about 13 years
    If your going for speed, you will want to remove the function calls, since they are a well known slowdown. Also searching for a non-connected set on each iteration of the while loop takes a long time, better to track that as you iterate and keep a flag.
  • andySnooker
    andySnooker about 13 years
    Works a treat. Thankyou. If I had 15 rep I'd Vote You Up.
  • Thiago Chaves
    Thiago Chaves about 13 years
    This algorithm runs only once and would give the wrong answer in many situations. For example, for the input [(1,2),(2,3),(3,4)] it will answer incorrectly [(1,2,3),(1,2,3,4),(2,3,4)]