Image Segmentation using Mean Shift explained
Solution 1
The basics first:
The Mean Shift segmentation is a local homogenization technique that is very useful for damping shading or tonality differences in localized objects. An example is better than many words:
Action:replaces each pixel with the mean of the pixels in a range-r neighborhood and whose value is within a distance d.
The Mean Shift takes usually 3 inputs:
- A distance function for measuring distances between pixels. Usually the Euclidean distance, but any other well defined distance function could be used. The Manhattan Distance is another useful choice sometimes.
- A radius. All pixels within this radius (measured according the above distance) will be accounted for the calculation.
- A value difference. From all pixels inside radius r, we will take only those whose values are within this difference for calculating the mean
Please note that the algorithm is not well defined at the borders, so different implementations will give you different results there.
I'll NOT discuss the gory mathematical details here, as they are impossible to show without proper mathematical notation, not available in StackOverflow, and also because they can be found from good sources elsewhere.
Let's look at the center of your matrix:
153 153 153 153
147 96 98 153
153 97 96 147
153 153 147 156
With reasonable choices for radius and distance, the four center pixels will get the value of 97 (their mean) and will be different form the adjacent pixels.
Let's calculate it in Mathematica. Instead of showing the actual numbers, we will display a color coding, so it's easier to understand what is happening:
The color coding for your matrix is:
Then we take a reasonable Mean Shift:
MeanShiftFilter[a, 3, 3]
And we get:
Where all center elements are equal (to 97, BTW).
You may iterate several times with Mean Shift, trying to get a more homogeneous coloring. After a few iterations, you arrive at a stable non-isotropic configuration:
At this time, it should be clear that you can't select how many "colors" you get after applying Mean Shift. So, let's show how to do it, because that is the second part of your question.
What you need to be able to set the number of output clusters in advance is something like Kmeans clustering.
It runs this way for your matrix:
b = ClusteringComponents[a, 3]
{{1, 1, 1, 1, 1, 1, 1, 1},
{1, 2, 2, 3, 2, 3, 3, 1},
{1, 3, 3, 3, 3, 3, 3, 1},
{1, 3, 2, 1, 1, 3, 3, 1},
{1, 3, 3, 1, 1, 2, 3, 1},
{1, 3, 3, 2, 3, 3, 3, 1},
{1, 3, 3, 2, 2, 3, 3, 1},
{1, 1, 1, 1, 1, 1, 1, 1}}
Or:
Which is very similar to our previous result, but as you can see, now we have only three output levels.
HTH!
Solution 2
A Mean-Shift segmentation works something like this:
The image data is converted into feature space
In your case, all you have are intensity values, so feature space will only be one-dimensional. (You might compute some texture features, for instance, and then your feature space would be two dimensional – and you’d be segmenting based on intensity and texture)
Search windows are distributed over the feature space
The number of windows, window size, and initial locations are arbitrary for this example – something that can be fine-tuned depending on specific applications
Mean-Shift iterations:
1.) The MEANs of the data samples within each window are computed
2.) The windows are SHIFTed to the locations equal to their previously computed means
Steps 1.) and 2.) are repeated until convergence, i.e. all windows have settled on final locations
The windows that end up on the same locations are merged
The data is clustered according to the window traversals
... e.g. all data that was traversed by windows that ended up at, say, location “2”, will form a cluster associated with that location.
So, this segmentation will (coincidentally) produce three groups. Viewing those groups in the original image format might look something like the last picture in belisarius' answer. Choosing different window sizes and initial locations might produce different results.
Related videos on Youtube
Sharpie
Updated on July 08, 2022Comments
-
Sharpie almost 2 years
Could anyone please help me understand how Mean Shift segmentation actually works?
Here is a 8x8 matrix that I just made up
103 103 103 103 103 103 106 104 103 147 147 153 147 156 153 104 107 153 153 153 153 153 153 107 103 153 147 96 98 153 153 104 107 156 153 97 96 147 153 107 103 153 153 147 156 153 153 101 103 156 153 147 147 153 153 104 103 103 107 104 103 106 103 107
Using the matrix above is it possible to explain how Mean Shift segmentation would separate the 3 different levels of numbers?
-
John over 13 yearsThree levels? I see numbers around 100 and around 150.
-
Sharpie over 13 yearsWell as its a segmenation I thought the numbers in the middle would be to far away from the edge numbers to be included in that section of the boundary. That is why I said 3. I could be wrong as I dont really understand how this type of segmenation works.
-
John over 13 yearsOh ... maybe we're taking levels to mean different things. All good. :)
-
Helin Wang over 9 yearsI like the accepted answer, but I don't think it has show the whole picture. IMO this pdf explains mean shift segmentation better (using a higher dimension space as example is better than 2d I think). eecs.umich.edu/vision/teaching/EECS442_2012/lectures/…
-
-
Kaushik Acharya over 12 yearsIs there a Matlab implementation of Mean shift for image?
-
Dr. belisarius over 12 years@KaushikAcharya See shawnlankton.com/2007/11/mean-shift-segmentation-in-matlab
-
Helin Wang over 9 yearscan't feature space be three dimensional, accounting the position of each pixel?
-
mlai over 9 years@HelinWang Yes, it could.
-
Lennert over 7 yearsI understand how the window is positioned in a 1D feature space, but I don't see how your explanation can be applied to mean shift algorithms that include a spatial search distance (e.g. in pixels) and spectral distance (value difference), for example as in Dr. belisarius' answer. Could you clarify this?
-
mlai over 7 years@Lennert If I understand your question correctly, then essentially what you would do is add the pixel locations (x,y) as features, or add other location-based features, to the clustering space. It seems like Belisarius's answer references a specific implementation in Mathematica which may do something a bit more sophisticated. Does that help?
-
Lennert over 7 yearsWhat confused me in your answer was how the n 'windows' shifted positions. But if I am correct, the meanshift filter just loops through all values and replaces each value by the mean of the 'window' surrounding the pixel (whereby the window is based on the spatial and spectral distance). So there are no defined 'windows' that shift, but the values of the means that converge after n repetitions.
-
mlai over 7 years@Lennert Yeah, I think you got it. I can understand why you found my answer confusing. I kinda drew my 'windows' as two-dimensional but really I was trying to depict clustering on only the one dimensional spectral values. It would pretty cool to visualize mean shift cluster in three-dimensional feature space with a drawing or animation (Maybe one of these days if I get the time)
-
Mohamed Akram over 2 yearsOur professor used your explanation and gave us the link. So thank you.