Implementing Disjoint Set System In Python

12,669

Solution 1

I don't have the latest edition of the book, but this doesn't look quite like a disjoint-set forest.

I think your mistake is to think that the forest has to be stored in a collection and that you have to traverse this collection to do the operations on the nodes. Remove set from Merge() and Find() and implement Find() as

def Find(n):
    if n != n.parent:
        n.parent = Find(n.parent)
    return n.parent

just like in the book. Then add a MakeSet() that returns a single correctly initialized node, and maybe a SameSet() function too:

def SameSet(n1, n2):
    return Find(n1) == Find(n2)

You now have a working disjoint set implementation.

Solution 2

Have you looked at any other existing implementations?

Solution 3

Presuming that each node is initialised to be its own parent:

def Find(node):
    while node is not node.parent:
        node = node.parent
    return node
Share:
12,669
Jordan
Author by

Jordan

Updated on August 22, 2022

Comments

  • Jordan
    Jordan over 1 year

    What I have so far is largely based off page 571 of "Introduction To Algorithms" by Cormen et al.

    I have a Node class in python that represents a set:

    class Node:
        def __init__(self, parent, rank = 0):
            self.parent = parent
            self.rank = rank

    This implementation is going to use a List of Nodes as the forest (I am open to better ways to store the sets).

    Initialize() returns a list of Nodes, that I will store in variable set and pass into the other functions.

    Find searches through the forest for the value and returns the set that it appears in. I chose to use for s in range(len(set)): so that in the recursion I could shrink the list of sets being passed in by set[s:].

    def Find(set, value):
        for s in range(len(set)):
            if value != set[s].parent:
                set[s].parent = Find(set[s:], set[s].parent)
            return set[s]

    Merge merges sets in the forest by finding them and promoting the higher ranked set.

    def Merge(set, value1, value2):
        set_val_1 = Find(set, value1)
        set_val_2 = Find(set, value2)
        if set_val_1.rank > set_val_2.rank:
            set_val_2.parent = set_val_1
        else:
            set_val_1.parent = set_val_2
            if set_val_1.rank == set_val_2.rank:
                set_val_2.rank += 1

    I am getting some errors when I do Finds and Merges, namely Find is not returning the proper set, and so I am not sure if Merge is working properly as well. I would appreciate some help to make sure the functions are implemented properly.

  • John Machin
    John Machin about 13 years
    In a real-world "working disjoint set implementation", there'd be a facility to iterate over all the sets, a facility to return all the members of the set that includes a given node, and the Node class would have some kind of "payload" attribute.
  • Jordan
    Jordan about 13 years
    @John Machin: How might I iterate through my entire collection of sets to find which set node is in?
  • antonakos
    antonakos about 13 years
    @John By "working" I just mean that it works. Brandon links to two implementations, including one from a Python cookbook. Neither of these store payloads nor do they support iteration over the sets or retrieval of all members of a set. But they are still useful for the typical use-case of keeping track of connected components.
  • John Machin
    John Machin about 13 years
    nodes as defined by the OP have only a parent and a rank; what components does this keep track of??? The "one from a Python cookbook" implements the payload in the sample code (see "label"), and the mention of itertools.groupby suggests that it might be doing something useful. In Eppstein's code, the "hashable objects" are the "payload".
  • John Machin
    John Machin about 13 years
    @Yoel: If you mean the set leader: leader = Find(node). If you mean a list of fellow member nodes: leader = Find(node); return [x for x in list_of_nodes if Find(x) is leader]
  • antonakos
    antonakos about 13 years
    @John My reading of Brandon's examples was a little sloppy, sorry. The nodes can carry a payload as in the Recipe demo, and Epstein's implementation (which wasn't from a real book, my mistake!) does allow iteration through all keys. But the disjoint set is about connectedness, not values. Store the set within the value, not the other way around. Then you can ask if two values are connected or join two values together. That's what is being kept track of.
  • antonakos
    antonakos about 13 years
    Beware that this does not do path compression.
  • Jordan
    Jordan about 13 years
    @John Machin: +1 for being the most helpful so far. What I meant by the collection of sets is, the list that contains all of the sets in my problem i.e. what Initialize returns.
  • John Machin
    John Machin about 13 years
    @Yoel: Thanks for the +1. I know what you mean by the collection of sets; I called it "list_of_nodes". My question was: what ANSWER do you require to your "which set node is in" query? The target set's representative/leader node, or a container-load of the nodes in the target set?
  • Jordan
    Jordan about 13 years
    @John Machin: I need the representative/leader node.
  • John Machin
    John Machin about 13 years
    @Yoel: We seem to be going around in a loop. The whole point of my first comment was that you DON'T need to iterate through the collection to get the representative node of the set that z belongs to; it is simply Find(z); that is the whole purpose of the Find() function.
  • Jordan
    Jordan about 13 years
    I looked over my implementation and played around with a few things, the basic issue I was having was that I was comparing results of Find operations which return Node objects. Really I needed to return Node.parent which are ints (in my case) and actually make sense to compare (as opposed to the references of Node objects. Bottom line you get the accept.
  • Jordan
    Jordan about 13 years
    @John Machin: Just wanted to say thank you for your help, ultimately I needed to tweek some things that @antonakos suggested but you were also really helpful.