Detecting if two images are visually identical
Solution 1
findimagedupes is pretty good. For example, you can run "findimagedupes -v fingerprint images" to have it print the perceptual hash (fingerprint) it computes for each image.
Solution 2
Cross-correlation or phase correlation will tell you if the images are the same, even with noise, degradation, and horizontal or vertical offsets. Using FFT-based methods makes this much faster than the algorithm described in the question.
The usual algorithm doesn't work for images that are not the same scale or rotation, though. You could pre-rotate or pre-scale them, but that's really processor intensive. Apparently you can also do the correlation in a log-polar space and it will be invariant to rotation, translation, and scale, but I don't know the details well enough to explain that.
MATLAB example: Registering an Image Using Normalized Cross-Correlation
Wikipedia calls this "phase correlation" and also describes making it scale- and rotation-invariant:
The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation.
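A minimal sketch of plain phase correlation (translation only, no log-polar step), in pure Python so it runs without extra libraries. The function names are made up for illustration, the images are assumed to be equal-sized grayscale grids whose sides are powers of two, and the shift is treated as circular. For real images you would more likely use an FFT from a numerical library.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2(img, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in img]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def phase_correlation(a, b):
    """Return (dy, dx, peak) where b looks like a circularly shifted by (dy, dx).

    peak is close to 1.0 when the images match up to a shift and
    much lower when they are unrelated.
    """
    h, w = len(a), len(a[0])
    fa, fb = fft2(a), fft2(b)
    cross = [[0j] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = fb[y][x] * fa[y][x].conjugate()
            mag = abs(c)
            # Keep only the phase; skip components that are numerically zero.
            cross[y][x] = c / mag if mag > 1e-12 else 0j
    corr = fft2(cross, inverse=True)
    # The inverse transform above is unnormalised, so divide by h * w.
    peak, dy, dx = max(
        (corr[y][x].real / (h * w), y, x) for y in range(h) for x in range(w)
    )
    return dy, dx, peak
```

To decide whether two images are "the same", you would compare the returned peak against a threshold chosen for your data; the threshold is something you would have to tune, not a known constant.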
Solution 3
A colour histogram works well for detecting the same image after it has been resized, resampled, etc.
If you want to match different people's photos of the same landmark, it's trickier: look at Haar classifiers. OpenCV is a great free library for image processing.
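The colour-histogram idea from the start of this answer could look roughly like the sketch below. It is pure Python, assumes the pixels have already been decoded into (r, g, b) tuples in the range 0..255, and uses histogram intersection as the similarity measure. That measure is one common choice, not the only one, and both function names are made up for illustration.

```python
def colour_histogram(pixels, bins=4):
    """Normalised joint RGB histogram with bins**3 buckets.

    pixels: iterable of (r, g, b) tuples, each channel in 0..255.
    """
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel value 0..255 onto a bucket index 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1
        count += 1
    if count == 0:
        return hist
    return [v / count for v in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the normalised histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Because the histogram is normalised by the pixel count, a resized copy of an image ends up with nearly the same histogram. The flip side is that two quite different images with similar colour distributions will also score high.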
Solution 4
I don't know the algorithm behind it, but Microsoft Live Image Search just added this capability. Picasa also has the ability to identify faces in images, and groups faces that look similar. Most of the time, it's the same person.
Some machine learning technology like a support vector machine, neural network, naive Bayes classifier or Bayesian network would be best at this type of problem. I've written one each of the first three to classify handwritten digits, which is essentially image pattern recognition.
Bemmu
Updated on June 15, 2022
Comments
-
Bemmu about 2 years
Sometimes two image files may be different on a file level, but a human would consider them perceptually identical. Given that, suppose you have a huge database of images, and you wish to know whether a human would think some image X is present in the database. If every image had a perceptual hash / fingerprint, you could hash image X and simply look the hash up in the database.
I know there is research around this issue, and some algorithms exist, but is there any tool, like a UNIX command line tool or a library I could use to compute such a hash without implementing some algorithm from scratch?
edit: relevant code from findimagedupes, using ImageMagick
try $image->Sample("160x160!");
try $image->Modulate(saturation=>-100);
try $image->Blur(radius=>3,sigma=>99);
try $image->Normalize();
try $image->Equalize();
try $image->Sample("16x16");
try $image->Threshold();
try $image->Set(magick=>'mono');
($blob) = $image->ImageToBlob();
edit: Warning! The ImageMagick $image object seems to contain information about the creation time of the image file that was read in. This means the blob you get can differ even for the same image if it was retrieved at a different time. To make sure the fingerprint stays the same, use $image->getImageSignature() as the last step.
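For illustration, here is a much simplified stand-in for the same idea in pure Python: shrink to a 16x16 grid, threshold, and pack the result into bits. It is not the findimagedupes algorithm (it skips the blur, normalise, and equalise steps and thresholds against the mean cell value). It assumes the image is already decoded into a grayscale grid whose sides are multiples of 16, and the names fingerprint, hamming, and find_match are hypothetical.

```python
def fingerprint(gray, size=16):
    """Return a size*size-bit integer fingerprint of a grayscale grid.

    gray: 2D list of 0..255 values; height and width are assumed to be
    multiples of size.
    """
    h, w = len(gray), len(gray[0])
    bh, bw = h // size, w // size
    cells = []
    for cy in range(size):
        for cx in range(size):
            # Average each block of pixels down to a single cell.
            total = 0
            for y in range(cy * bh, (cy + 1) * bh):
                for x in range(cx * bw, (cx + 1) * bw):
                    total += gray[y][x]
            cells.append(total / (bh * bw))
    mean = sum(cells) / len(cells)
    bits = 0
    for c in cells:
        # One bit per cell: 1 if brighter than the average cell.
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

def find_match(db, fp, max_distance=8):
    """Return the name stored under the closest fingerprint, or None.

    db maps fingerprint -> name; max_distance is an arbitrary tolerance.
    """
    best = min(db.items(), key=lambda item: hamming(item[0], fp), default=None)
    if best is not None and hamming(best[0], fp) <= max_distance:
        return best[1]
    return None
```

Exact duplicates can be found with a plain dictionary lookup on the fingerprint. The linear scan in find_match is only there to tolerate a few flipped bits, and a huge database would need a proper index for that.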
-
Charlie Salts over 15 years
This won't work in cases where one image has been adjusted slightly, so that it is slightly darker or more saturated, or has been cropped a small amount. You also have to take into account that resampling is a costly affair, especially when using bicubic interpolation on large images.
-
kayjay almost 15 years
Certainly the first step would be to reduce the original image to a minimal size. There is no need for a "tree on a hill" image to be 10 GB to distinguish it from a "flower on a mound" image.
-
Dan Dascalescu over 11 years
Why the two downvotes? The user was asking for a command line tool, not a programming solution.
-
pts over 7 years
Exactly the same algorithm implemented in Python (with GraphicsMagick doing the heavy lifting) here: github.com/pts/pyfindimagedupes
-
Dan Dascalescu over 7 years
@pts: the link wasn't broken for me; it redirected to the current link. I've updated the answer anyway.