Detecting if two images are visually identical


Solution 1

findimagedupes is pretty good. For example, you can run "findimagedupes -v fingerprint images" to make it print the perceptual hash of each image.

Solution 2

Cross-correlation or phase correlation will tell you whether the images are the same, even with noise, degradation, and horizontal or vertical offsets. Using the FFT-based methods makes this much faster than the algorithm described in the question.
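To make the idea concrete, here is a minimal phase-correlation sketch in plain Python. It assumes two square greyscale grids of equal size, given as lists of lists, and treats offsets as circular (wrap-around). The helper names `dft2` and `phase_correlate` are made up for this illustration, and the naive DFT only stands in for a real FFT to keep the example dependency-free; on real images you would use an FFT library instead.

```python
import cmath
import random


def dft2(grid, inverse=False):
    """Naive 2-D DFT of a square grid (slow, but dependency-free)."""
    n = len(grid)
    sign = 1 if inverse else -1
    # w[k] = exp(sign * 2*pi*i * k / n), the n-th roots of unity.
    w = [cmath.exp(sign * 2j * cmath.pi * k / n) for k in range(n)]
    # The 2-D transform is separable: transform every row, then every column.
    rows = [[sum(row[x] * w[(u * x) % n] for x in range(n)) for u in range(n)]
            for row in grid]
    out = [[sum(rows[y][u] * w[(v * y) % n] for y in range(n)) for u in range(n)]
           for v in range(n)]
    if inverse:
        out = [[value / (n * n) for value in row] for row in out]
    return out


def phase_correlate(a, b):
    """Return (dy, dx, peak) for the circular shift that maps a onto b."""
    n = len(a)
    fa, fb = dft2(a), dft2(b)
    cross = []
    for v in range(n):
        row = []
        for u in range(n):
            c = fb[v][u] * fa[v][u].conjugate()
            mag = abs(c)
            # Keep only the phase; drop frequencies with no energy.
            row.append(c / mag if mag > 1e-12 else 0)
        cross.append(row)
    corr = dft2(cross, inverse=True)
    # The location of the largest real value is the estimated offset.
    peak, dy, dx = max((corr[y][x].real, y, x)
                       for y in range(n) for x in range(n))
    return dy, dx, peak


# Demo: shift a random 8x8 "image" by 2 rows and 3 columns (wrapping around).
rng = random.Random(0)
n = 8
image = [[rng.random() for _ in range(n)] for _ in range(n)]
shifted = [[image[(y - 2) % n][(x - 3) % n] for x in range(n)] for y in range(n)]
print(phase_correlate(image, shifted))
```

A sharp, high peak suggests that the two grids are the same picture up to a translation, while unrelated pictures tend to produce a low, noisy correlation surface. Where to put the decision threshold is something you would have to tune on your own data.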

The usual algorithm doesn't work for images that are not the same scale or rotation, though. You could pre-rotate or pre-scale them, but that's really processor intensive. Apparently you can also do the correlation in a log-polar space and it will be invariant to rotation, translation, and scale, but I don't know the details well enough to explain that.

MATLAB example: Registering an Image Using Normalized Cross-Correlation

Wikipedia calls this "phase correlation" and also describes making it scale- and rotation-invariant:

The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation.

Solution 3

A colour histogram works well for copies of the same image that have been resized, resampled, etc.
Matching different people's photos of the same landmark is trickier; look at Haar classifiers. OpenCV is a great free library for image processing.
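The histogram idea can be sketched roughly as follows. This is an illustration only: it assumes the pixels are already available as (r, g, b) tuples with values from 0 to 255, and the function names, the 4-bins-per-channel choice and the use of histogram intersection as the similarity measure are my own assumptions, not something the answer specifies.

```python
def colour_histogram(pixels, bins=4):
    """Normalised joint RGB histogram of an iterable of (r, g, b) tuples."""
    hist = [0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in 0..bins-1.
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[idx] += 1
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    # Dividing by the pixel count makes the result independent of image size.
    return [h / count for h in hist]


def histogram_intersection(h1, h2):
    """1.0 for identical colour distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Demo: a tiny "image" and a copy upscaled by repeating every pixel 4 times.
original = [(255, 0, 0)] * 10 + [(0, 0, 255)] * 6
upscaled = [p for p in original for _ in range(4)]
print(histogram_intersection(colour_histogram(original),
                             colour_histogram(upscaled)))
```

Because the histogram discards all spatial layout, two quite different pictures with similar colour mixes will also score highly, so it is best treated as a cheap first filter.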

Solution 4

I don't know the algorithm behind it, but Microsoft Live Image Search just added this capability. Picasa also has the ability to identify faces in images, and groups faces that look similar. Most of the time, it's the same person.

Some machine learning technology like a support vector machine, neural network, naive Bayes classifier or Bayesian network would be best at this type of problem. I've written one each of the first three to classify handwritten digits, which is essentially image pattern recognition.

Author by

Bemmu

Updated on June 15, 2022

Comments

  • Bemmu
    Bemmu about 2 years

    Sometimes two image files may be different on a file level, but a human would consider them perceptually identical. Given that, now suppose you have a huge database of images, and you wish to know if a human would think some image X is present in the database or not. If all images had a perceptual hash / fingerprint, then one could hash image X and it would be a simple matter to see if it is in the database or not.
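As a rough sketch of that lookup, suppose every image already has a fixed-length fingerprint stored as an integer. An exact match is then a plain dictionary or set lookup; to tolerate a few flipped bits, one common approach is to compare fingerprints by Hamming distance. The names `hamming` and `find_similar`, the sample fingerprints and the `max_distance=5` threshold below are all hypothetical choices for illustration.

```python
def hamming(a, b):
    """Number of bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def find_similar(query, database, max_distance=5):
    """Return (name, distance) pairs within max_distance bits, closest first.

    database maps an image name to its integer fingerprint.
    """
    hits = [(name, hamming(query, fp)) for name, fp in database.items()]
    return sorted((h for h in hits if h[1] <= max_distance), key=lambda h: h[1])


# Demo with made-up 16-bit fingerprints.
database = {
    "a.jpg": 0b1111000011110000,
    "b.jpg": 0b0000111100001111,
}
query = 0b1111000011110001  # differs from a.jpg in a single bit
print(find_similar(query, database))
```

This linear scan is only meant to show the comparison itself; for a huge database you would want some kind of index over the fingerprints rather than checking every entry.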

    I know there is research around this issue, and some algorithms exist, but is there any tool, like a UNIX command line tool or a library I could use to compute such a hash without implementing some algorithm from scratch?

    edit: relevant code from findimagedupes, using ImageMagick

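    # Excerpt only: $image is an Image::Magick object, and try is presumably
    # the script's own error-checking helper (it is not shown here).
    try $image->Sample("160x160!");           # force to 160x160, ignoring aspect ratio
    try $image->Modulate(saturation=>-100);   # remove colour saturation
    try $image->Blur(radius=>3,sigma=>99);    # blur away fine detail
    try $image->Normalize();                  # stretch the contrast
    try $image->Equalize();                   # equalize the histogram
    try $image->Sample("16x16");              # shrink to 16x16
    try $image->Threshold();                  # reduce to black and white
    try $image->Set(magick=>'mono');          # output as a bi-level (mono) image
    ($blob) = $image->ImageToBlob();          # serialize; this blob is the fingerprint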
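For readers who do not use Perl, the general shape of that pipeline (shrink, stretch the contrast, threshold, pack into bits) can be sketched in plain Python. This is a simplified analogue and not a port: it assumes a square greyscale grid whose side is a multiple of 16, it skips the blur and equalize steps, and it will not produce the same bits as findimagedupes. The function name `fingerprint` is made up for the example.

```python
def fingerprint(gray, size=16):
    """Pack a square greyscale grid (0-255 values) into a size*size-bit integer."""
    n = len(gray)
    block = n // size  # assumes n is a positive multiple of size
    # Shrink: average each block x block tile down to one value.
    small = [
        [sum(gray[y * block + j][x * block + i]
             for j in range(block) for i in range(block)) / (block * block)
         for x in range(size)]
        for y in range(size)
    ]
    flat = [value for row in small for value in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid dividing by zero on a flat image
    bits = 0
    for value in flat:
        # Normalize to 0..1, then threshold at the midpoint.
        bits = (bits << 1) | (1 if (value - lo) / span >= 0.5 else 0)
    return bits


# Demo: a 32x32 image that is dark on the left and bright on the right,
# plus a copy at half the brightness.
image = [[0] * 16 + [200] * 16 for _ in range(32)]
darker = [[v // 2 for v in row] for row in image]
print(fingerprint(image) == fingerprint(darker))
```

In this sketch the contrast stretch absorbs a uniform change in brightness, but nothing here compensates for cropping or rotation.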
    

    edit: Warning! The ImageMagick $image object seems to contain information about the creation time of the image file that was read in. This means that the blob you get can be different even for the same image, if it was retrieved at a different time. To make sure the fingerprint stays the same, use $image->getImageSignature() as the last step.

  • Charlie Salts
    Charlie Salts over 15 years
    This won't work in cases where one image has been adjusted slightly, so that it is slightly darker or more saturated, or has been cropped a small amount. You also have to take into account that resampling is a costly affair, especially when using bicubic interpolation on large images.
  • kayjay
    kayjay almost 15 years
    Certainly the first step would be to reduce the original image to a minimal size. There is no need for a "tree on a hill" image to be 10 GB to distinguish it from a "flower on a mound" image.
  • Dan Dascalescu
    Dan Dascalescu over 11 years
    Why the two downvotes? The user was asking for a command line tool, not a programming solution.
  • pts
    pts over 7 years
    Exactly the same algorithm implemented in Python (with GraphicsMagick doing the heavy lifting) here: github.com/pts/pyfindimagedupes
  • Dan Dascalescu
    Dan Dascalescu over 7 years
    @pts: the link wasn't broken for me; it redirected to the current link. I've updated the answer anyway.