Uniform Cost Search in Python

34,637

Usually for searches, I tend to keep the path to a node part of the queue. This is not really memory efficient, but cheaper to implement.

If you want the parent map, remember that it is only safe to update the parent map when the child is on top of the queue. Only then has the algorithm determined the shortest path to the current node.

def ucs(G, v):
    visited = set()                  # set of visited nodes
    q = queue.PriorityQueue()        # we store vertices in the (priority) queue as tuples 
                                     # (f, n, path), with
                                     # f: the cumulative cost,
                                     # n: the current node,
                                     # path: the path that led to the expansion of the current node
    q.put((0, v, [v]))               # add the starting node, this has zero *cumulative* cost 
                                     # and it's path contains only itself.

    while not q.empty():             # while the queue is nonempty
        f, current_node, path = q.get()
        visited.add(current_node)    # mark node visited on expansion,
                                     # only now we know we are on the cheapest path to
                                     # the current node.

        if current_node.is_goal:     # if the current node is a goal
            return path              # return its path
        else:
            for edge in in current_node.out_edges:
                child = edge.to()
                if child not in visited:
                    q.put((current_node_priority + edge.weight, child, path + [child]))

Note: I haven't really tested this, so feel free to comment, if it doesn't work right away.

Share:
34,637
Luke Collins
Author by

Luke Collins

I like maths and computers, can play the piano and also do a bit of teaching.

Updated on September 26, 2020

Comments

  • Luke Collins
    Luke Collins over 3 years

    I have implemented a simple graph data structure in Python with the following structure below. The code is here just to clarify what the functions/variables mean, but they are pretty self-explanatory so you can skip reading it.

    # Node data structure
    class Node: 
    
        def __init__(self, label):        
            self.out_edges = []
            self.label = label
            self.is_goal = False
    
    
        def add_edge(self, node, weight = 0):          
            self.out_edges.append(Edge(node, weight))
    
    
    # Edge data structure
    class Edge:
    
        def __init__(self, node, weight = 0):          
            self.node = node
            self.weight = weight
    
        def to(self):                                  
            return self.node
    
    
    # Graph data structure, utilises classes Node and Edge
    class Graph:    
    
        def __init__(self):                             
            self.nodes = []
    
        # some other functions here populate the graph, and randomly select three goal nodes.
    

    Now I am trying to implement a uniform-cost search (i.e. a BFS with a priority queue, guaranteeing a shortest path) which starts from a given node v, and returns a shortest path (in list form) to one of three goal node. By a goal node, I mean a node with the attribute is_goal set to true.

    This is my implementation:

    def ucs(G, v):
        visited = set()                  # set of visited nodes
        visited.add(v)                   # mark the starting vertex as visited
        q = queue.PriorityQueue()        # we store vertices in the (priority) queue as tuples with cumulative cost
        q.put((0, v))                    # add the starting node, this has zero *cumulative* cost   
        goal_node = None                 # this will be set as the goal node if one is found
        parents = {v:None}               # this dictionary contains the parent of each node, necessary for path construction
    
        while not q.empty():             # while the queue is nonempty
            dequeued_item = q.get()        
            current_node = dequeued_item[1]             # get node at top of queue
            current_node_priority = dequeued_item[0]    # get the cumulative priority for later
    
            if current_node.is_goal:                    # if the current node is the goal
                path_to_goal = [current_node]           # the path to the goal ends with the current node (obviously)
                prev_node = current_node                # set the previous node to be the current node (this will changed with each iteration)
    
                while prev_node != v:                   # go back up the path using parents, and add to path
                    parent = parents[prev_node]
                    path_to_goal.append(parent)   
                    prev_node = parent
    
                path_to_goal.reverse()                  # reverse the path
                return path_to_goal                     # return it
    
            else:
                for edge in current_node.out_edges:     # otherwise, for each adjacent node
                    child = edge.to()                   # (avoid calling .to() in future)
    
                    if child not in visited:            # if it is not visited
                        visited.add(child)              # mark it as visited
                        parents[child] = current_node   # set the current node as the parent of child
                        q.put((current_node_priority + edge.weight, child)) # and enqueue it with *cumulative* priority
    

    Now, after lots of testing and comparing with other alogrithms, this implementation seemed to work pretty well - up until I tried it with this graph:

    Graph one

    For whatever reason, ucs(G,v) returned the path H -> I which costs 0.87, as opposed to the path H -> F -> I, costing 0.71 (this path was obtained by running a DFS). The following graph also gave an incorrect path:

    Graph two

    The algorithm gave G -> F instead of G -> E -> F, obtained again by the DFS. The only pattern I can observe among these rare cases is the fact that the chosen goal node always has a loop. I can't figure out what is going wrong though. Any tips will be much appreciated.

  • user2357112
    user2357112 about 7 years
    This is going to perform duplicate visits, potentially a lot of them, since you don't check if a node has already been visited before trying to enqueue its children and you don't do anything to deduplicate queue entries for the same node.
  • user2357112
    user2357112 about 7 years
    Deduplicating queue entries would probably require some sort of decrease-key functionality, which we don't know if these priority queues have, but it's simple to add a visited check to make sure you don't re-expand a visited node on a revisit.
  • Luke Collins
    Luke Collins about 7 years
    and what if we have loops?
  • dhke
    dhke about 7 years
    @user2357112 From experience: Leave deduplication to the implementation of the queue, don't clutter the search algorithm with it. And yes, you only need to keep the shortest path to every node on the queue. But the problem that you can only update the visited list when you expand a node remains.
  • Luke Collins
    Luke Collins about 7 years
    @dhke I think I'd rather keep the parent map. If I simply change where the visited.add(...) is happening, then leaving parents[child] = current_node where it is should always give the child node the right parent -- right?
  • dhke
    dhke about 7 years
    @LukeCollins Unless you have cycles with negative path lengths, loops do not cause non-termination. Keeping the visited list is already an optimization, since you will get repeated additions of the same node with different paths lengths and the visited list can catch some of them. If you push the optimization even more, you need a "unique" priority queue, where a node can only appear once (with its shortest possible path).
  • dhke
    dhke about 7 years
    @LukeCollins Yes, that should be sufficient. The parent map than basically says "for this node, the shortest possible path is incoming via parent". Note that this is then an implementation of Dijkstra's algorithm.
  • Luke Collins
    Luke Collins about 7 years
    @dhke Thanks for all the help!
  • Luke Collins
    Luke Collins about 7 years
    @dhke I have another problem :( What happened here? i.imgur.com/u635IlM.png I moved the visited() thing to when the queue is popped.