Algorithm to compare two images in C#

Solution 1

Here is a simple approach that uses a 256-bit image hash (for comparison, MD5 has 128 bits).

  1. resize the picture to 16x16 pixels

[Image: the picture resized to 16x16 pixels]

  2. reduce the colors to black/white (which equals true/false in this console output)

[Image: console output of the true/false values]

  3. read the boolean values into a List<bool> - this is the hash

Code:

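using System.Collections.Generic;
using System.Drawing;

public static List<bool> GetHash(Bitmap bmpSource)
{
    List<bool> lResult = new List<bool>();
    // create a new image scaled down to 16x16 pixels; dispose it when done
    using (Bitmap bmpMin = new Bitmap(bmpSource, new Size(16, 16)))
    {
        // read the pixels row by row (j = row, i = column)
        for (int j = 0; j < bmpMin.Height; j++)
        {
            for (int i = 0; i < bmpMin.Width; i++)
            {
                // reduce colors to true / false: true means darker than 50% brightness
                lResult.Add(bmpMin.GetPixel(i, j).GetBrightness() < 0.5f);
            }
        }
    }
    return lResult;
}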

I know, GetPixel is not that fast but on a 16x16 pixel image it should not be the bottleneck.

  4. compare this hash to the hash values of other images and add a tolerance (the number of pixels that may differ from the other hash)

Code:

// requires: using System.Linq;
List<bool> iHash1 = GetHash(new Bitmap(@"C:\mykoala1.jpg"));
List<bool> iHash2 = GetHash(new Bitmap(@"C:\mykoala2.jpg"));

// determine the number of equal pixels (x of 256)
int equalElements = iHash1.Zip(iHash2, (i, j) => i == j).Count(eq => eq);

So this code is able to find equal images with:

  • different file formats (e.g. jpg, png, bmp)
  • rotation (90, 180, 270 degrees) and horizontal/vertical flips - by changing the iteration order of i and j
  • different dimensions (the same aspect ratio is required)
  • different compression (a tolerance is required in case of quality loss such as jpeg artifacts) - you can treat 99% equality as the same image and 50% as a different one.
  • color converted to grayscale and the other way round (because brightness is independent of color)
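
Changing the iteration order of i and j has the same effect as rearranging the finished hash, so one option is to hash each image once and compare it against rotated or flipped copies of the other hash. The sketch below assumes the row-major 16x16 layout that GetHash produces; the HashOrientation class and its method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashOrientation
{
    // Rotates a row-major size x size hash by 90 degrees clockwise.
    public static List<bool> Rotate90(IReadOnlyList<bool> hash, int size = 16)
    {
        if (hash.Count != size * size)
            throw new ArgumentException("Hash length must equal size * size.");

        var rotated = new List<bool>(hash.Count);
        for (int row = 0; row < size; row++)
            for (int col = 0; col < size; col++)
                rotated.Add(hash[(size - 1 - col) * size + row]);
        return rotated;
    }

    // Mirrors a row-major size x size hash left to right.
    public static List<bool> FlipHorizontal(IReadOnlyList<bool> hash, int size = 16)
    {
        if (hash.Count != size * size)
            throw new ArgumentException("Hash length must equal size * size.");

        var flipped = new List<bool>(hash.Count);
        for (int row = 0; row < size; row++)
            for (int col = 0; col < size; col++)
                flipped.Add(hash[row * size + (size - 1 - col)]);
        return flipped;
    }

    // Highest number of equal positions over the 0, 90, 180 and 270 degree
    // rotations of the second hash.
    public static int BestMatch(IReadOnlyList<bool> hash1, IReadOnlyList<bool> hash2, int size = 16)
    {
        int best = 0;
        IReadOnlyList<bool> candidate = hash2;
        for (int turn = 0; turn < 4; turn++)
        {
            int equal = hash1.Zip(candidate, (a, b) => a == b).Count(eq => eq);
            best = Math.Max(best, equal);
            candidate = Rotate90(candidate, size);
        }
        return best;
    }
}
```

To cover flips as well, BestMatch could be called a second time with FlipHorizontal(hash2) and the larger of the two results used.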

Update / Improvements:

After using this method for a while, I noticed a few possible improvements:

  • replace GetPixel with something faster
  • use the EXIF thumbnail instead of reading the whole image, for better performance
  • instead of the fixed 0.5f threshold between light and dark, use the distinct median brightness of all 256 pixels. With a fixed threshold, all very dark (or all very light) images are assumed to be the same; a median threshold also makes it possible to detect images whose brightness has been changed.
  • if you need fast calculations, use bool[] or List<bool>; if you need to store a lot of hashes and want to save memory, use a BitArray, because a Boolean isn't stored in a bit, it takes a byte!
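
The median threshold and the BitArray storage could look roughly like the sketch below. It assumes the 256 brightness values (for example from GetBrightness()) have already been collected into a float array, and it uses the plain median of all values, which is one possible reading of the point above. MedianHash, FromBrightness and CountEqualBits are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Linq;

public static class MedianHash
{
    // Builds the hash by comparing each brightness value with the median
    // of all values instead of the fixed 0.5f threshold.
    public static BitArray FromBrightness(float[] brightness)
    {
        if (brightness == null || brightness.Length == 0)
            throw new ArgumentException("At least one brightness value is required.");

        float[] sorted = brightness.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        // For an even count, take the mean of the two middle values.
        float median = sorted.Length % 2 == 0
            ? (sorted[mid - 1] + sorted[mid]) / 2f
            : sorted[mid];

        var hash = new BitArray(brightness.Length);
        for (int i = 0; i < brightness.Length; i++)
            hash[i] = brightness[i] < median;   // true = darker than the median
        return hash;
    }

    // Counts the positions at which two equally long hashes agree.
    public static int CountEqualBits(BitArray hash1, BitArray hash2)
    {
        if (hash1.Length != hash2.Length)
            throw new ArgumentException("Hashes must have the same length.");

        int equal = 0;
        for (int i = 0; i < hash1.Length; i++)
            if (hash1[i] == hash2[i]) equal++;
        return equal;
    }
}
```

Because the threshold moves with the image, a uniformly brightened copy should produce (nearly) the same bits as the original, which a fixed 0.5f threshold does not guarantee.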

Solution 2

You could check "Algorithm to compare two images" to see the available methods for image comparison.

Unless you want to recreate the full algorithms on your own, you should try to use existing libraries, or at least part of their code (as long as their license is OK for you).

For an open source C# implementation of edge detection and related computer vision algorithms, you can try EmguCV, which is a wrapper of OpenCV.

Solution 3

After resampling the images to some common resolution, you could use a Wavelet Decomposition and compare the coefficients of this decomposition instead of the images themselves. Comparing only the first N coefficients will make this method more robust to noise and other artifacts.

There are several C# implementations for wavelets available. One example is https://waveletstudio.codeplex.com/
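
To give an idea of what comparing wavelet coefficients means, here is a minimal sketch. It uses the Haar wavelet only because it is the simplest one (the answer does not say which wavelet to use), and it assumes both images were already resampled to the same square, power-of-two sized grayscale array. HaarCompare and its method names are hypothetical and unrelated to the library linked above.

```csharp
using System;

public static class HaarCompare
{
    // In-place 1D Haar decomposition (averaging variant); length must be a power of two.
    private static void Haar1D(double[] data)
    {
        var temp = new double[data.Length];
        for (int length = data.Length; length > 1; length /= 2)
        {
            int half = length / 2;
            for (int i = 0; i < half; i++)
            {
                temp[i] = (data[2 * i] + data[2 * i + 1]) / 2.0;        // average
                temp[half + i] = (data[2 * i] - data[2 * i + 1]) / 2.0; // detail
            }
            Array.Copy(temp, data, length);
        }
    }

    // 2D decomposition: transform every row, then every column.
    public static double[,] Transform(double[,] image)
    {
        int n = image.GetLength(0);
        if (n == 0 || n != image.GetLength(1) || (n & (n - 1)) != 0)
            throw new ArgumentException("Image must be square with a power-of-two size.");

        var result = (double[,])image.Clone();
        var line = new double[n];
        for (int row = 0; row < n; row++)
        {
            for (int i = 0; i < n; i++) line[i] = result[row, i];
            Haar1D(line);
            for (int i = 0; i < n; i++) result[row, i] = line[i];
        }
        for (int col = 0; col < n; col++)
        {
            for (int i = 0; i < n; i++) line[i] = result[i, col];
            Haar1D(line);
            for (int i = 0; i < n; i++) result[i, col] = line[i];
        }
        return result;
    }

    // Euclidean distance over the k x k lowest-frequency coefficients only.
    public static double Distance(double[,] coeffs1, double[,] coeffs2, int k)
    {
        if (k > coeffs1.GetLength(0) || k > coeffs1.GetLength(1) ||
            k > coeffs2.GetLength(0) || k > coeffs2.GetLength(1))
            throw new ArgumentException("k exceeds the coefficient array size.");

        double sum = 0;
        for (int r = 0; r < k; r++)
            for (int c = 0; c < k; c++)
            {
                double d = coeffs1[r, c] - coeffs2[r, c];
                sum += d * d;
            }
        return Math.Sqrt(sum);
    }
}
```

After this transform the coarse, low-frequency coefficients sit at the low indices, so a small k ignores fine detail such as noise and compression artifacts. What distance still counts as "the same image" would have to be determined by experiment.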

Solution 4

Interesting question. Comparing images is not that hard, given that:

  1. The images show the same content (the first one is not a section of the second one or vice versa)
  2. The images are only rotated by multiples of 90 degrees

One way of doing the comparison would be to:

  1. Resize both images to the dimensions of the smaller one
  2. Apply edge detection to each image, resulting in a black and white image (or an array of 0s and 1s)
  3. Compare the resulting bitmaps (keep the first one still and rotate the second one by 90 degrees 3 times), calculate the percentage of matching pixels for each rotation and take the highest value

If the value is within a reasonable range, say 90% or more (the threshold probably has to be determined by a few experiments), you could safely assume both are the same. But this is not going to work if:

  1. There is a difference of even a few pixels at the corner, for example when the second image is cropped from the first one
  2. The images are rotated by something other than multiples of 90 degrees (although this is not very likely)
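
Steps 2 and 3 of the comparison could be sketched as follows. This is only an illustration under several assumptions: both images were already resized to the same square size and converted to grayscale values in a float[,] array, a Sobel operator stands in for "edge detection" (the answer does not name a specific method), and EdgeCompare with its method names and the threshold parameter is hypothetical.

```csharp
using System;

public static class EdgeCompare
{
    // Marks a pixel as an edge when its Sobel gradient magnitude exceeds the threshold.
    // Border pixels are left as non-edges.
    public static bool[,] DetectEdges(float[,] gray, float threshold)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var edges = new bool[h, w];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++)
            {
                float gx = gray[y - 1, x + 1] + 2 * gray[y, x + 1] + gray[y + 1, x + 1]
                         - gray[y - 1, x - 1] - 2 * gray[y, x - 1] - gray[y + 1, x - 1];
                float gy = gray[y + 1, x - 1] + 2 * gray[y + 1, x] + gray[y + 1, x + 1]
                         - gray[y - 1, x - 1] - 2 * gray[y - 1, x] - gray[y - 1, x + 1];
                edges[y, x] = Math.Sqrt(gx * gx + gy * gy) > threshold;
            }
        return edges;
    }

    // Rotates a square map by 90 degrees clockwise.
    public static bool[,] Rotate90(bool[,] map)
    {
        int n = map.GetLength(0);
        if (n != map.GetLength(1))
            throw new ArgumentException("Map must be square.");

        var rotated = new bool[n, n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                rotated[r, c] = map[n - 1 - c, r];
        return rotated;
    }

    // Percentage (0 to 100) of positions at which both maps agree.
    public static double MatchPercent(bool[,] a, bool[,] b)
    {
        if (a.GetLength(0) != b.GetLength(0) || a.GetLength(1) != b.GetLength(1))
            throw new ArgumentException("Maps must have the same dimensions.");

        int equal = 0;
        for (int y = 0; y < a.GetLength(0); y++)
            for (int x = 0; x < a.GetLength(1); x++)
                if (a[y, x] == b[y, x]) equal++;
        return 100.0 * equal / a.Length;
    }

    // Keeps the first map still, rotates the second one three times
    // and returns the highest match percentage.
    public static double BestMatchPercent(bool[,] a, bool[,] b)
    {
        double best = 0;
        bool[,] candidate = b;
        for (int turn = 0; turn < 4; turn++)
        {
            best = Math.Max(best, MatchPercent(a, candidate));
            candidate = Rotate90(candidate);
        }
        return best;
    }
}
```

The result of BestMatchPercent is the value that would then be checked against the experimentally determined limit (for example 90%).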

Author: Byyo

Updated on July 05, 2022

Comments

  • Byyo
    Byyo almost 2 years

    I'm writing a tool in C# to find duplicate images. Currently I create an MD5 checksum of the files and compare those.

    Unfortunately, the images can be:

    • Rotated by 90 degrees.
    • Have different dimensions (smaller image with same content).
    • Have different compression or file types (e.g. jpeg artifacts, see below).

    [Images: higher resolution koala, lower resolution koala]

    What would be the best approach to solve this problem?

    • AntiHeadshot
      AntiHeadshot over 8 years
      Scaling both images to the same size, applying edge detection and then calculating a value representing the degree of difference (compared to all rotations) may help
    • Gabe
      Gabe over 8 years
    • mikus
      mikus over 8 years
      AntiHeadshot, indeed, but only if the pictures were modified using exactly the same algorithms with exactly the same settings; otherwise you might end up with huge differences. Also, with lossy compression you might end up with different pictures just by rotating twice by 180 deg :) Not to mention resizing. So the transformations would need to be perfectly repeated.
    • mikus
      mikus over 8 years
      Anyway, MD5 can only check whether they are exactly the same; any minor difference will give you a false result, and it is impossible to decide how different the pictures are based on MD5, it's truly a 0/1 result. Still it's a duplicate
    • Андрей Беньковский
      Андрей Беньковский over 8 years
    • Mark Setchell
      Mark Setchell over 8 years
      Have a look at my answer here... stackoverflow.com/a/25204466/2836621
  • mikus
    mikus over 8 years
    keeping a bit of color (e.g. gray scale) should increase the precision, but it is harder to implement; good direction anyway
  • Yurrit Avonds
    Yurrit Avonds over 8 years
    I just realised that resampling to some common resolution is not that simple if not all images are square, but you could resample the images to have the same height (keeping the aspect ratio). Different widths could then be a first indication that images are not the same. Afterwards you could apply the wavelets I mentioned above.
  • Andrew___Pls_Support_UA
    Andrew___Pls_Support_UA over 7 years
    Example of usage of this algorithm you can find here: github.com/ukushu/ImgComparator . By the way, thanks a lot :)
  • fubo
    fubo almost 6 years
    @Dror you can compare n images with this method just create a hash of all images
  • darego101
    darego101 over 4 years
    Would this method work with images of different sizes (since it resizes the image to 16x16)? i.e. my images are not all the same size and may not be square
  • fubo
    fubo over 4 years
    @darego101 yes, you can compare e.g. an 800x600 image with a 640x480 image, because shrunk to 16x16 they should look the same. Only the same aspect ratio of the compared images is required
  • darego101
    darego101 over 4 years
    @fubo thanks a lot. could you please expand on the 3rd point of your improvements: "instead of setting 0.5f to differ between light and dark - use the distinct median brightness of all 256 pixels". Some of my Bitmaps contain different colors so I will need to make use of RGB values for comparisons. What would be the easiest way to implement this?
  • Goodies
    Goodies about 4 years
    Hi Akash, while reviewing answers, I downvoted your answer, because it is no answer to the question. Problem is, the image can be scaled (different size) or even turned 90 degrees, or it may contain noise from .jpg quality conversion. An exact checksum will not do in this case.
  • Goodies
    Goodies about 4 years
    Interesting idea. When using wavelet transforms, a non-square image will not pose a problem, as long as the aspect ratio is similar. One other advantage of using wavelet analysis: the second image could be a slightly cropped version (but not a small detail) of the first image. In that case, wavelet (or FFT) analysis should still yield a better distance measurement, while other methods fail.
  • Imran Ali Khan
    Imran Ali Khan over 3 years
    @fubo Thanks for this method and the detailed explanation, it saved me 2 to 3 days. As expected, it works for me with minor changes