Given a 2D array of numbers, find clusters

18,644

Solution 1

Your problem seems to be finding connected components. You should use a traverse method (either BFS or DFS will do the work). Iterate over all elements, for each non-zero element start a traverse and record all elements you see in that traverse and turn each visited element into zero. Something like the code below:

void DFS(int x, int y)
{
  printf("%d ", g[x][y]);
  g[x][y] = 0;
  // iterate over neighbours
  for(dx=-1; dx<=1; dx++)
    for(dy=-1; dy<=1; dy++)
      if (g[x+dx][y+dy]) DFS(x+dx, y+dy);
}

for(i=0; i<n; i++)
  for(j=0; j<n; j++)
    if (g[i][j])
    {
      DFS(i, j);
      printf("\n");
    }

Solution 2

You want to do Connected Component Labeling. This is usually considered an image processing algorithm, but it matches what you describe.

You will also get recommendations of K-means because you said 2D array of numbers and it is easy to interpret that as array of 2D numbers. K-means finds clusters of points in a plane, not connected groups in a 2D array like you request.

Solution 3

Code sample:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Practice
{
    class Program
    {
        static void Main()
        {
            var matrix = new[] { new StringBuilder("00000"), new StringBuilder("02301"), new StringBuilder("08507"), new StringBuilder("70004") };
            var clusters = 0;
            for (var i = 0; i < matrix.Length; i++)
                for (var j = 0; j < (matrix.Any() ? matrix[i].Length : 0); j++)
                    if (matrix[i][j] != '0')
                    {
                        Console.Write("Cluster {0}: <", ++clusters);
                        var cluster = new List<char>();
                        Traverse(matrix, i, j, ref cluster);
                        Console.WriteLine("{0}>", string.Join(",", cluster));
                    }
            Console.ReadKey();
        }

        private static void Traverse(StringBuilder[] matrix, int row, int col, ref List<char> cluster)
        {
            cluster.Add(matrix[row][col]);
            matrix[row][col] = '0';
            for (var i = -1; i <= 1 && row + i >= 0 && row + i < matrix.Length; i++)
                for (var j = -1; j <= 1 && col + j >= 0 && col + j < (matrix.Any() ? matrix[row + i].Length : 0); j++)
                    if(matrix[row + i][col + j] != '0')
                        Traverse(matrix, row + i, col + j, ref cluster);
        }
    }
}

Algorithm explanation:

  • For each row:
    • For each column in that row:
      • If the item is an unvisited non-zero:
        1. Save the found cluster member;
        2. Mark location at [row, column] as visited;
        3. Check for any unvisited non-zero neighbors while staying in-bounds of the matrix:
          • [row - 1, column - 1];
          • [row - 1, column];
          • [row - 1, column + 1];
          • [row, column - 1];
          • [row, column + 1];
          • [row + 1, column - 1];
          • [row + 1, column];
          • [row + 1, column + 1].
        4. If any neighbor is an unvisited non-zero repeat steps 1-4 recursively until all neighbors are visited zeros (all cluster members have been found).

Solution 4

One way to do it is with a graph. Traverse the matrix in some order (I'd go left to right, top to bottom). When you encounter a non-zero element, add it to the graph. Then check all of its neighbors (it looks like you want 8-connected neighbors), and for each one that is non-zero, add its node to the graph, and connect it to the current element. The elements in the graph will probably have to keep track of their coordinates so you can see if you're adding a duplicate or not. When you're done traversing the matrix, you have a graph which contains a set of clusters. Clusters should be sub-graphs of connected elements.

Share:
18,644

Related videos on Youtube

dimple
Author by

dimple

Updated on June 04, 2022

Comments

  • dimple
    dimple over 1 year

    Given a 2D array, for example:

    0 0 0 0 0
    0 2 3 0 1
    0 8 5 0 7
    7 0 0 0 4
    

    Output should be groups of clusters:

    Cluster 1: <2,3,8,5,7>

    Cluster 2: <1,7,4>

  • Heretic Monkey
    Heretic Monkey over 6 years
    This is an algorithm question, not a c# or .net question. Also, dumps of nothing but code are rarely good answers. See this Meta question and its answers for more information.