Graph Algorithm To Find All Connections Between Two Arbitrary Vertices

102,142

Solution 1

It appears that this can be accomplished with a depth-first search of the graph. The depth-first search will find all non-cyclical paths between two nodes. This algorithm should be very fast and scale to large graphs (The graph data structure is sparse so it only uses as much memory as it needs to).

I noticed that the graph you specified above has only one edge that is directional (B,E). Was this a typo or is it really a directed graph? This solution works regardless. Sorry I was unable to do it in C, I'm a bit weak in that area. I expect that you will be able to translate this Java code without too much trouble though.

Graph.java:

import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.Map;
import java.util.Set;

public class Graph {
    private Map<String, LinkedHashSet<String>> map = new HashMap();

    public void addEdge(String node1, String node2) {
        LinkedHashSet<String> adjacent = map.get(node1);
        if(adjacent==null) {
            adjacent = new LinkedHashSet();
            map.put(node1, adjacent);
        }
        adjacent.add(node2);
    }

    public void addTwoWayVertex(String node1, String node2) {
        addEdge(node1, node2);
        addEdge(node2, node1);
    }

    public boolean isConnected(String node1, String node2) {
        Set adjacent = map.get(node1);
        if(adjacent==null) {
            return false;
        }
        return adjacent.contains(node2);
    }

    public LinkedList<String> adjacentNodes(String last) {
        LinkedHashSet<String> adjacent = map.get(last);
        if(adjacent==null) {
            return new LinkedList();
        }
        return new LinkedList<String>(adjacent);
    }
}

Search.java:

import java.util.LinkedList;

public class Search {

    private static final String START = "B";
    private static final String END = "E";

    public static void main(String[] args) {
        // this graph is directional
        Graph graph = new Graph();
        graph.addEdge("A", "B");
        graph.addEdge("A", "C");
        graph.addEdge("B", "A");
        graph.addEdge("B", "D");
        graph.addEdge("B", "E"); // this is the only one-way connection
        graph.addEdge("B", "F");
        graph.addEdge("C", "A");
        graph.addEdge("C", "E");
        graph.addEdge("C", "F");
        graph.addEdge("D", "B");
        graph.addEdge("E", "C");
        graph.addEdge("E", "F");
        graph.addEdge("F", "B");
        graph.addEdge("F", "C");
        graph.addEdge("F", "E");
        LinkedList<String> visited = new LinkedList();
        visited.add(START);
        new Search().depthFirst(graph, visited);
    }

    private void depthFirst(Graph graph, LinkedList<String> visited) {
        LinkedList<String> nodes = graph.adjacentNodes(visited.getLast());
        // examine adjacent nodes
        for (String node : nodes) {
            if (visited.contains(node)) {
                continue;
            }
            if (node.equals(END)) {
                visited.add(node);
                printPath(visited);
                visited.removeLast();
                break;
            }
        }
        for (String node : nodes) {
            if (visited.contains(node) || node.equals(END)) {
                continue;
            }
            visited.addLast(node);
            depthFirst(graph, visited);
            visited.removeLast();
        }
    }

    private void printPath(LinkedList<String> visited) {
        for (String node : visited) {
            System.out.print(node);
            System.out.print(" ");
        }
        System.out.println();
    }
}

Program Output:

B E 
B A C E 
B A C F E 
B F E 
B F C E 

Solution 2

The National Institute of Standards and Technology (NIST) online Dictionary of Algorithms and Data Structures lists this problem as "all simple paths" and recommends a depth-first search. CLRS supplies the relevant algorithms.

A clever technique using Petri Nets is found here

Solution 3

Here is the pseudocode I came up with. This is not any particular pseudocode dialect, but should be simple enough to follow.

Anyone want to pick this apart.

  • [p] is a list of vertices representing the current path.

  • [x] is a list of paths where meet the criteria

  • [s] is the source vertex

  • [d] is the destination vertex

  • [c] is the current vertex (argument to the PathFind routine)

Assume there is an efficient way to look up the adjacent vertices (line 6).

     1 PathList [p]
     2 ListOfPathLists [x]
     3 Vertex [s], [d]

     4 PathFind ( Vertex [c] )
     5     Add [c] to tail end of list [p]
     6     For each Vertex [v] adjacent to [c]
     7         If [v] is equal to [d] then
     8             Save list [p] in [x]
     9         Else If [v] is not in list [p]
    10             PathFind([v])
    11     Next For
    12     Remove tail from [p]
    13 Return

Solution 4

Since the existing non-recursive DFS implementation given in this answer seems to be broken, let me provide one that actually works.

I've written this in Python, because I find it pretty readable and uncluttered by implementation details (and because it has the handy yield keyword for implementing generators), but it should be fairly easy to port to other languages.

# a generator function to find all simple paths between two nodes in a
# graph, represented as a dictionary that maps nodes to their neighbors
def find_simple_paths(graph, start, end):
    visited = set()
    visited.add(start)

    nodestack = list()
    indexstack = list()
    current = start
    i = 0

    while True:
        # get a list of the neighbors of the current node
        neighbors = graph[current]

        # find the next unvisited neighbor of this node, if any
        while i < len(neighbors) and neighbors[i] in visited: i += 1

        if i >= len(neighbors):
            # we've reached the last neighbor of this node, backtrack
            visited.remove(current)
            if len(nodestack) < 1: break  # can't backtrack, stop!
            current = nodestack.pop()
            i = indexstack.pop()
        elif neighbors[i] == end:
            # yay, we found the target node! let the caller process the path
            yield nodestack + [current, end]
            i += 1
        else:
            # push current node and index onto stacks, switch to neighbor
            nodestack.append(current)
            indexstack.append(i+1)
            visited.add(neighbors[i])
            current = neighbors[i]
            i = 0

This code maintains two parallel stacks: one containing the earlier nodes in the current path, and one containing the current neighbor index for each node in the node stack (so that we can resume iterating through the neighbors of a node when we pop it back off the stack). I could've equally well used a single stack of (node, index) pairs, but I figured the two-stack method would be more readable, and perhaps easier to implement for users of other languages.

This code also uses a separate visited set, which always contains the current node and any nodes on the stack, to let me efficiently check whether a node is already part of the current path. If your language happens to have an "ordered set" data structure that provides both efficient stack-like push/pop operations and efficient membership queries, you can use that for the node stack and get rid of the separate visited set.

Alternatively, if you're using a custom mutable class / structure for your nodes, you can just store a boolean flag in each node to indicate whether it has been visited as part of the current search path. Of course, this method won't let you run two searches on the same graph in parallel, should you for some reason wish to do that.

Here's some test code demonstrating how the function given above works:

# test graph:
#     ,---B---.
#     A   |   D
#     `---C---'
graph = {
    "A": ("B", "C"),
    "B": ("A", "C", "D"),
    "C": ("A", "B", "D"),
    "D": ("B", "C"),
}

# find paths from A to D
for path in find_simple_paths(graph, "A", "D"): print " -> ".join(path)

Running this code on the given example graph produces the following output:

A -> B -> C -> D
A -> B -> D
A -> C -> B -> D
A -> C -> D

Note that, while this example graph is undirected (i.e. all its edges go both ways), the algorithm also works for arbitrary directed graphs. For example, removing the C -> B edge (by removing B from the neighbor list of C) yields the same output except for the third path (A -> C -> B -> D), which is no longer possible.


Ps. It's easy to construct graphs for which simple search algorithms like this one (and the others given in this thread) perform very poorly.

For example, consider the task of find all paths from A to B on an undirected graph where the starting node A has two neighbors: the target node B (which has no other neighbors than A) and a node C that is part of a clique of n+1 nodes, like this:

graph = {
    "A": ("B", "C"),
    "B": ("A"),
    "C": ("A", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O"),
    "D": ("C", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O"),
    "E": ("C", "D", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O"),
    "F": ("C", "D", "E", "G", "H", "I", "J", "K", "L", "M", "N", "O"),
    "G": ("C", "D", "E", "F", "H", "I", "J", "K", "L", "M", "N", "O"),
    "H": ("C", "D", "E", "F", "G", "I", "J", "K", "L", "M", "N", "O"),
    "I": ("C", "D", "E", "F", "G", "H", "J", "K", "L", "M", "N", "O"),
    "J": ("C", "D", "E", "F", "G", "H", "I", "K", "L", "M", "N", "O"),
    "K": ("C", "D", "E", "F", "G", "H", "I", "J", "L", "M", "N", "O"),
    "L": ("C", "D", "E", "F", "G", "H", "I", "J", "K", "M", "N", "O"),
    "M": ("C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "N", "O"),
    "N": ("C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "O"),
    "O": ("C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N"),
}

It's easy to see that the only path between A and B is the direct one, but a naïve DFS started from node A will waste O(n!) time uselessly exploring paths within the clique, even though it's obvious (to a human) that none of those paths can possibly lead to B.

One can also construct DAGs with similar properties, e.g. by having the starting node A connect target node B and to two other nodes C1 and C2, both of which connect to the nodes D1 and D2, both of which connect to E1 and E2, and so on. For n layers of nodes arranged like this, a naïve search for all paths from A to B will end up wasting O(2n) time examining all the possible dead ends before giving up.

Of course, adding an edge to the target node B from one of the nodes in the clique (other than C), or from the last layer of the DAG, would create an exponentially large number of possible paths from A to B, and a purely local search algorithm can't really tell in advance whether it will find such an edge or not. Thus, in a sense, the poor output sensitivity of such naïve searches is due to their lack of awareness of the global structure of the graph.

While there are various preprocessing methods (such as iteratively eliminating leaf nodes, searching for single-node vertex separators, etc.) that could be used to avoid some of these "exponential-time dead ends", I don't know of any general preprocessing trick that could eliminate them in all cases. A general solution would be to check at every step of the search whether the target node is still reachable (using a sub-search), and backtrack early if it isn't — but alas, that would significantly slow down the search (at worst, proportionally to the size of the graph) for many graphs that don't contain such pathological dead ends.

Solution 5

Here is a logically better-looking recursive version as compared to the second floor.

public class Search {

private static final String START = "B";
private static final String END = "E";

public static void main(String[] args) {
    // this graph is directional
    Graph graph = new Graph();
    graph.addEdge("A", "B");
    graph.addEdge("A", "C");
    graph.addEdge("B", "A");
    graph.addEdge("B", "D");
    graph.addEdge("B", "E"); // this is the only one-way connection
    graph.addEdge("B", "F");
    graph.addEdge("C", "A");
    graph.addEdge("C", "E");
    graph.addEdge("C", "F");
    graph.addEdge("D", "B");
    graph.addEdge("E", "C");
    graph.addEdge("E", "F");
    graph.addEdge("F", "B");
    graph.addEdge("F", "C");
    graph.addEdge("F", "E");
    List<ArrayList<String>> paths = new ArrayList<ArrayList<String>>();
    String currentNode = START;
    List<String> visited = new ArrayList<String>();
    visited.add(START);
    new Search().findAllPaths(graph, seen, paths, currentNode);
    for(ArrayList<String> path : paths){
        for (String node : path) {
            System.out.print(node);
            System.out.print(" ");
        }
        System.out.println();
    }   
}

private void findAllPaths(Graph graph, List<String> visited, List<ArrayList<String>> paths, String currentNode) {        
    if (currentNode.equals(END)) { 
        paths.add(new ArrayList(Arrays.asList(visited.toArray())));
        return;
    }
    else {
        LinkedList<String> nodes = graph.adjacentNodes(currentNode);    
        for (String node : nodes) {
            if (visited.contains(node)) {
                continue;
            } 
            List<String> temp = new ArrayList<String>();
            temp.addAll(visited);
            temp.add(node);          
            findAllPaths(graph, temp, paths, node);
        }
    }
}
}

Program Output

B A C E 

B A C F E 

B E

B F C E

B F E 
Share:
102,142
Robert Groves
Author by

Robert Groves

Just a programmer doing my best to not write any code that will someday become part of a rampant, self-aware AI which attempts to destroy the world. SOreadytohelp

Updated on March 23, 2020

Comments

  • Robert Groves
    Robert Groves about 4 years

    I am trying to determine the best time efficient algorithm to accomplish the task described below.

    I have a set of records. For this set of records I have connection data which indicates how pairs of records from this set connect to one another. This basically represents an undirected graph, with the records being the vertices and the connection data the edges.

    All of the records in the set have connection information (i.e. no orphan records are present; each record in the set connects to one or more other records in the set).

    I want to choose any two records from the set and be able to show all simple paths between the chosen records. By "simple paths" I mean the paths which do not have repeated records in the path (i.e. finite paths only).

    Note: The two chosen records will always be different (i.e. start and end vertex will never be the same; no cycles).

    For example:

        If I have the following records:
            A, B, C, D, E
    
        and the following represents the connections: 
            (A,B),(A,C),(B,A),(B,D),(B,E),(B,F),(C,A),(C,E),
            (C,F),(D,B),(E,C),(E,F),(F,B),(F,C),(F,E)
    
            [where (A,B) means record A connects to record B]
    

    If I chose B as my starting record and E as my ending record, I would want to find all simple paths through the record connections that would connect record B to record E.

       All paths connecting B to E:
          B->E
          B->F->E
          B->F->C->E
          B->A->C->E
          B->A->C->F->E
    

    This is an example, in practice I may have sets containing hundreds of thousands of records.

  • Casey Watson
    Casey Watson over 15 years
    mweerden, The breadth-first traversal that I submitted will find ALL paths while avoiding any cycles. For the graph that you have specified, the implementation correctly finds all three paths.
  • Sudoer
    Sudoer over 15 years
    I didn't completely read your code and assumed you used a breadth-first traversal (because you said so). However, on closer inspection after your comment, I noticed that it is in fact not. It is actually a memoryless depth-first traversal like those of Ryan, Christian and Robert.
  • Sudoer
    Sudoer over 15 years
    Please note that this is not a breadth-first traversal. With breadth first you first visit all nodes with distance 0 to the root, then those with distance 1, then 2, etc.
  • Matt J
    Matt J over 15 years
    Correct, this is a DFS. A BFS would need to use a queue, enqueuing level-(N+1) nodes to be processed after all level-N nodes. However, for the OP's purposes, either BFS or DFS will work, as no preferred sorting order of paths is specified.
  • afterxleep
    afterxleep over 15 years
    Casey, I've been looking for a solution to this problem for ages. I have recently implemented this DFS in C++ and it works a treat.
  • Pacerier
    Pacerier over 12 years
    Could you help me with a better solution? a DFS takes forever to run: stackoverflow.com/q/8342101/632951
  • Rrr
    Rrr over 11 years
    Disadvantage of recursion is if you will have deep graph (A->B->C->...->N) you could have StackOverflowError in java.
  • bozo user
    bozo user about 11 years
    Can you please shed some light on step 11 and step 12
  • Robert Groves
    Robert Groves about 11 years
    Line 11 just denotes the end block that goes with the For loop that begins on line 6. Line 12 means to remove the last element of the path list before returning to the caller.
  • bozo user
    bozo user about 11 years
    What is the initial call to PathFind - do you pass in the source vertex [s]?
  • Robert Groves
    Robert Groves about 11 years
    In this example yes, but keep in mind that you may not want to write real code that maps one-to-one with this pseudocode. It's meant more to illustrate a thought process rather than well designed code.
  • user1415567
    user1415567 over 9 years
    Ported the solution to c# and worked perfectly. Had to user an implementation of LinkedHashSet in C# but worked great.
  • Prabhakaran
    Prabhakaran over 9 years
    Thanks Casey Watson(poster of accepted answer) for the Solution to the problem. It worked like a charm. I started this new comment thread as i was not able to comment right below your post. For guys scratching head for C++ solution for the Casey Watson's Java solution: codepad.org/YpREeCKB
  • batta
    batta over 9 years
    I've added an iterative version in C# below.
  • PawelP
    PawelP over 9 years
    The comments are right. This is a DFS, not BFS! Please, please edit this answer - it can be very misleading, especially that the solution is correct, which gives you some authority...
  • Siddhartha Ghosh
    Siddhartha Ghosh over 8 years
    excellent -- about how you replaced the recursion with stack-based iteration.
  • Tito
    Tito over 8 years
    @MattJ so you're saying that the name of that method should be:private void depthFirst(Graph graph, LinkedList<String> visited) ?
  • Admin
    Admin over 8 years
    I still don't understand it, what is neighbours.Reverse()? Is it List<T>.Reverse ?
  • Ankit Roy
    Ankit Roy over 8 years
    This is a good algorithm, but saying that it will "scale to large graphs" is a bit misleading, simply because of the number of paths that might exist. A graph with just 15 vertices, and all 105 possible edges between them, already has (15-2)! = 6227020800 paths between any pair of vertices.
  • arslan
    arslan about 8 years
    I checked this non-recursive version, but it seems not correct. recursive version is fine. maybe when changed to non-recursive, a small mistake is happened
  • Ilmari Karonen
    Ilmari Karonen about 8 years
    @PawelP: I've edited the answer to fix the misleading method name. I was tempted to also merge the two loops within the method body (since keeping them separate does little except reduce performance and slightly alter the order of the outputs), and maybe make the methods static and use a HashSet instead of a LinkedList for performance, but in the end I decided to be cautious and not make any actual functional changes to the code. Still, it's worth noting that this is definitely not an optimal solution.
  • Ilmari Karonen
    Ilmari Karonen about 8 years
    Note that it's easy to come up with graphs for which DFS is very inefficient, even though the set of all simple paths between the two nodes is small and easy to find. For example, consider an undirected graph where the starting node A has two neighbors: the goal node B (which has no neighbors other than A), and a node C that is part of a fully connected clique of n + 1 nodes. Even though there's clearly just one simple path from A to B, a naïve DFS will waste O(n!) time uselessly exploring the clique. Similar examples (one solution, DFS takes exponential time) can be found among DAGs, too.
  • Ilmari Karonen
    Ilmari Karonen about 8 years
    This will not work in general: it's quite possible for two or more paths between the vertices to have the same last edge. Your method would only find one of such paths.
  • Ilmari Karonen
    Ilmari Karonen about 8 years
    This does not seem to answer the question as asked.
  • Ilmari Karonen
    Ilmari Karonen about 8 years
    This does not seem to solve the problem as asked. The OP wants to find all the simple paths between the two nodes, not merely to check whether a path exists.
  • Ilmari Karonen
    Ilmari Karonen about 8 years
    @alim: Agreed, this code is simply broken. (It does not correctly remove nodes from the visited set when backtracking, and the stack handling appears to be messed up, too. I tried to see if it could be fixed, but that would basically require a complete rewrite.) I've just added an answer with a correct, working non-recursive solution (in Python, but it should be relatively easy to port to other languages).
  • arslan
    arslan about 8 years
    @llmari Karonen, Nice, I am going to check, Great job.
  • arslan
    arslan about 8 years
    That is what I am looking for, Thank you :)
  • batta
    batta about 8 years
    @Love: Yes, neighbors is the adjacency list
  • batta
    batta about 8 years
    @llmari Karonen: There's no error. New neighbors get onto stack and on to visited list. Only when no neighbors have been found the vertex that's already added removed. I ran tests again, paths are correctly resolved.
  • chomp
    chomp about 7 years
    The NIST says: "The paths may be enumerated with a depth-first search."
  • coderAJ
    coderAJ almost 7 years
    This code will find all paths between 2 specific points only. If my query is to find paths between any 2 points and number of queries is too large such that I can't carry out dfs for each query. Then , can I modify this code or how can I implement it ?
  • David Oliván
    David Oliván about 6 years
    Thank you for your DFS non-recursive solution. Just note the last line printing the result has a syntax error, should be for path in find_simple_paths(graph, "A", "D"): print(" -> ".join(path)), the print was missing the parenthesis.
  • Ilmari Karonen
    Ilmari Karonen about 6 years
    @DavidOlivánUbieto: It's Python 2 code, that's why there's no parentheses. :)
  • Ebikeneser
    Ebikeneser over 3 years
    @batta could you possibly post your full soloution in C# please? There is no AdjacentNodes method here and I am struggling to hook it up.
  • Ebikeneser
    Ebikeneser over 3 years
    Is there a C# version of this?