What is an efficient algorithm for counting the number of triangles in a graph?

22,805

Solution 1

You will need depth first search. The algorithm will be:

1) For the current node ask all unvisited adjacent nodes

2) for each of those nodes run depth two check to see if a node at depth 2 is your current node from step one

3) mark current node as visited

4) on make each unvisited adjacent node your current node (1 by 1) and run the same algorithm

Solution 2

Triangle counting is indeed difficult, and computationally costly. Maybe this is a good starting point for understanding why: Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning.

The appropriate loop should check for each of n nodes against each of their neighbours (n*(n-1)) and continue the cycle to see if n's neighbour's neighbour's neighbours is n: (n*(n-1))(n-1)(n-1), which is almost 10^16 for 10000 n. With a million nodes these loops get silly, but for your 10000 you should have no problem at all if you want to brute-force it :)

You mentioned that you code in C#, and graph (which is available for C) has an excellent algorithm for counting triangles written by Gabor Csardi. It counted 1.3 million triangles in my random graph of 10000 nodes and one million edges in 1.3 seconds on a five year old laptop :) Gabor Csardi would be the guy to ask :)

In terms of different programatical approaches, maybe you should look at the data in which you store your network. If stored in an adjacency matrix, the number of loops is fixed, but in an edge-list of a network of three edges, the number of loops is a multiple of three independent of the number of nodes. You could ask the edge-list for neighbours of a node without having to test every combination of i->j.

I wrote a pedagogical script in R to illustrate the approaches and measure the different algorithms' speed a very basic way. There are lots of speed-issues inherent to the use of R here (the edge-list version is completely swamped by too many edges), but I think the code example should get some ideas flowing on how to think about the speed of brute-forcing triangle-counts. This is in R, and not extremely neat, but well commented. I hope you can break the language barrier.

All the best.

# Counting triangles in a random graph using igraph and two different
# and almost equally stupid approaches looping through the 1) adjacency
# matrix and 2) the edge-list in R.

# Use igraph and these configs
library(igraph)
V <- 100
E <-  1700

# This is the random graph that we will use
g <- erdos.renyi.game(type="gnm", n=V, p=E, directed=FALSE, loops=FALSE)

# igraph has such a fast algorythm. Long live Gabor Csardi!
benchmark <- proc.time()["elapsed"]
       triangle.count <- sum(count_triangles(g)/3)
gabor.Csardi.benchmark <- proc.time()["elapsed"] - benchmark

# For not to big networks we can plot them to get a basic feel
if(length(V(g)) < 100){
       V(g)$size <- 5
       plot(g)
}

# The adjacency matrix approach will have to deal with a crazy
# ammount of looping through pairs of matrix combinations to
# find links:

# We'll loop through each node to check it's participation in triangles
# notice that a triangle ijk will be participated in by nodes
# i, j, and k, and that each of those nodes have two triangular counts.
# All in all the structures ijk, ikj, jik, jki, kij, kji are each counted
# but shall be returned as 1 triangle. We will therefore devide our
# search-result by 6 later on.

# Store our progess in this matrix to look at how we did later on
progress <- matrix(0, nrow=length(V(g)), ncol=8)

# Loop through all nodes to find triangles in an adjacency matrix
benchmark <- proc.time()["elapsed"] # Measure time for this loop
for(i in 1:length(V(g))){
       # Node i has connections to these nodes:
       i.neighbors <- as.vector( neighborhood(g, 1, nodes=i)[[1]] )
       i.neighbors <- setdiff(i.neighbors, c(i)) # i should not be part of its own neighborhood

       # for each i, tri is the number of triangles that i is involved in
       # using any j or any k. For a triangle {1,2,3}, tri will be 2 for
       # i==1, since i is part of both triangles {1,2,3} and {1,3,2}:
       tri <- 0

       for(j in i.neighbors)
       {
              # Node j has connections to these nodes:
              j.neighbors <- as.vector( neighborhood(g, 1, nodes=j)[[1]] )
              j.neighbors <- setdiff(j.neighbors, c(j)) # j should not be part of its own neighborhood

              # Were any of j's neighbors also a neighbor of i?
              k <- intersect(i.neighbors, j.neighbors)

              tri <- tri + length(k)
       }

       # Save our findings to the progress matrix
       progress[i,1] <- tri
       progress[i,7] <- proc.time()["elapsed"] - benchmark
}
progress[,2] <- sapply(1:length(progress[,1]), function(x) sum(progress[,1][1:x]))
progress[,3] <- round(progress[,2] / 6, digits=2)

# The edge-list approach uses a list of all edges in the network to loop through instead
# Here, I suppose, a lot of the extra speed could arise from R being better at looping
# with lapply() and at finding data in a data.frame that the brute-force loop above is.
el <- as.data.frame(as.matrix(get.edgelist(g, )))

# This is ugly. Make the edgelist contain all edges as both i->j and j->i. In
# the igraph object, they are only given as low i to high j by get.edgelist()
  el.rev <- data.frame(el[,2], el[,1])
  names(el) <- names(el.rev) <- c("i","j")
  el <- rbind(el, el.rev)

# these nodes are connected (we'd only need to bother abouth non isolates)
nodes <- sort(unique(c(el$i, el$j)))
tri <- 0

# Loop through connected nodes to find triangles in edge-list
benchmark <- proc.time()["elapsed"] # Measure time for this loop
for(i in nodes){
       i.neighbors <- el[el$i==i,]$j
       # i's neighbors are the $j:s of the edgelist where $i:s are i. 

       k.list <- unlist(lapply(i.neighbors, function(x) intersect(i.neighbors,el[el$i==x, ]$j)))
       # lists nodes that can be a k in an ijk-triangle for each of i's neighboring j:s
       # If 1 has neighbors 2 and 3, el[el$i==x, ]$j) will be first, the neighbors of 2 and then
       # the neighbors of 3. When intersected with the neighbors of i, k:s will be found. If
       # {1,2,3} is a triangle one k will be 3 for {i=1, j=2}, and another k will be 2 for {i=1, j=3}

       # k.list might be NULL
       tri.for.this.i <- (as.numeric(length(k.list)) / 2)
       # Here we devide by two since i can be in a triangle with j and k lik {ijk} and {ikj}
       # We will later have to devide by 3 more, since each triangle will be counted for
       # each node i that we loop through

       # Save the counting to the progress
       tri <- tri.for.this.i + tri
       progress[i,4] <- as.numeric(tri.for.this.i)
       mm <- c(mm, i)
       progress[i,8] <- proc.time()["elapsed"] - benchmark
}
progress[,5] <- sapply(1:length(progress[,4]), function(x) sum(progress[,4][1:x]))
progress[,6] <- round(progress[,5] / 3, digits=2)

# Fix the results into a usable format
results <- data.frame(c("igraph", "adjacency-loop", "edge-loop"),
                      c(triangle.count, max(progress[,3]), max(progress[,6])),
                      c(gabor.Csardi.benchmark, (max(progress[,7]) - min(progress[,7])), (max(progress[,8]) - min(progress[,8]))))
row.names(results) <- c("igraph", "Adjacensy-loop", "Edge-loop")
names(results) <- c("Routine", "Triangle count", "Execution-time")

# Now we have run two approaches of more or less the same thing.
# Not only should the igraph triangle.count, and the two loops
# be identical, but the progress of the two methods should too.
progress[,3] == progress[,6]
plot(progress[,6], type="l", col="blue")
lines(progress[,7], col="green")

# Look at the result:
View(results)

Solution 3

Depends on how your graphs are represented.

If you have an adjacency matrix A the number of triangles should be tr(A^3)/6, in other words, 1/6 times the sum of the diagonal elements (the division takes care of the orientation and rotation).

IF you have adjacency lists just start at every node and perform a depth-3 search. Count how often you reach that node -> divide by 6 again.

Share:
22,805
XBigTK13X
Author by

XBigTK13X

Code. Refactor. Repeat.

Updated on May 19, 2020

Comments

  • XBigTK13X
    XBigTK13X almost 4 years

    What is an efficient algorithm for counting the number of triangles in an undirected graph )(where a graph is a set of vertices and edges)? I've been searching Google and reading through my shelf of textbooks for a few hours each day for three days in a row.

    This is for a homework assignment where I need such an algorithm, but developing it doesn't count for anything on the assignment. It is expected that we can simply find such an algorithm from outside resources, but I'm at the end of my rope.

    For clarification, a triangle in a graph is a a cycle of length three. The trick is that it needs to work on vertex sets with at most 10,000 nodes.

    I'm currently working in C#, but care more about the general approach towards solving this problem than code to copy and paste.

    At the high level, my attempts thus far included:

    • A breadth first search that tracked all unique cycles of length three. This seemed like a fine idea to me, but I couldn't get it functional
    • A loop over all the nodes in the graph to see if three vertices shared an edge. This has far too slow of a running time for the larger data sets. O(n^3).

    The algorithm itself is part of calculating the clustering coefficient.

  • han
    han over 12 years
    In an undirected graph you only need a depth-two search, and then check if any depth-two nodes match depth-one nodes.
  • Harshdeep
    Harshdeep almost 10 years
    I had the exact same algorithm in mind but when you say "for current node ask all unvisited adjacent nodes" what exactly do you mean because a node is made up of both x and y coordinate. Should we decide on basis of both x and y coordinates of current node when considering other nodes as adjacent nodes or not? For example (1,2), (2,3), (3,1), (3,2), (1,3) what nodes are adjacent of what node?
  • Jarryd Goodman
    Jarryd Goodman almost 8 years
    So I'm doing an adjacency list implementation and you answer helped me, but could you explain why you divide by 6? I understand it's to eliminate redundancy but could you explain to me how you arrive at 6? It seems to me that each triangle would be counted 3 times (once from every possible source vertex), so why not divide by 3?
  • Admin
    Admin almost 8 years
    @JarrydGoodman Each diagonal value of A^3 is the number of paths of length 3 from a node to itself. Therefore you count each triangle twice at one node (because you have two directions), and six times at all three nodes.
  • Aryaman Gupta
    Aryaman Gupta almost 7 years
    Same thing I think but in a different way apply dfs (use stack) from any node and keep the track of the parent ie( from where the node has came ) also mark the node as traversed and remove the edges from that node to all connected neighbor then if during dfs a traversed node is found then check the parent of that node (i.e traversed node ) and the node popped from stack if their parents are same that means there is a triangle.