Detecting if an object from one image is in another image with OpenCV

c# python image-processing opencv computer-vision

12,720

Solution 1

Check out the SURF features, which are a part of openCV. The idea here is that you have an algorithm for finding "interest points" in two images. You also have an algorithm for computing a descriptor of an image patch around each interest point. Typically this descriptor captures the distribution of edge orientations in the patch. Then you try to find point correspondences, i. e. for each interest point in image A try to find a corresponding interest point in image B. This is accomplished by comparing the descriptors, and looking for the closest matches. Then, if you have a set of correspondences that are related by some geometric transformation, you have a detection.

Of course, this is a very high level explanation. The devil is in the details, and for those you should read some papers. Start with Distinctive image features from scale-invariant keypoints by David Lowe, and then read the papers on SURF.

Also, consider moving this question to Signal and Image Processing Stack Exchange

Solution 2

Fortunately, the kind guys from OpenCV just did that for you. Check in your samples folder "opencv\samples\cpp\matching_to_many_images.cpp". Compile and give it a try wih the default images.

The algorithm can be easily adapted to make it faster or more precise.

Mainly, object recognition algorithms are split in two parts: keypoint detection& description adn object matching. For both of them there are many algorithms/variants, with wich you can play directly into OpenCV.

Detection/description can be done by: SIFT/SURF/ORB/GFTT/STAR/FAST and others.

For matching you have: brute force, hamming, etc. (Some methods are specific for a given detection algorithm)

HINTS to start:

crop your original image so the interesting object covers as much as possible of the image area. Use it as training.
SIFT is the most accurate and the laziest descriptor. FAST is a good combination of precision and accuracy. GFTT is old and quite unreliable. ORB is newly added to OPENCV and is very promising, both in speed and accuracy.
The results depend on the pose of the object in the other image. If it is resized, rotated, squeezed, partly covered, etc, try SIFT. if it is a simple task (i.e. it appears at the almost same size/rotation/etc, most of the descriptors will cope well)
ORB may not be yet in the OpenCV release. Try to download the latest from openCV trunk and compile it https://code.ros.org/svn/opencv/trunk

So, you can find the best combination for you by trial and error.

For the details of every implementation, you should read the original papers/tutorials. google scholar is a good start

Solution 3

In case someone comes along in the future, here's a small sample doing this with openCV. It's based on the opencv sample, but (in my opinion), this is a bit clearer, so I'm including it as well.

Tested with openCV 2.4.4

#!/usr/bin/env python

'''
Uses SURF to match two images.
  Finds common features between two images and draws them

Based on the sample code from opencv:
  samples/python2/find_obj.py

USAGE
  find_obj.py <image1> <image2>
'''

import sys

import numpy
import cv2


###############################################################################
# Image Matching
###############################################################################

def match_images(img1, img2, img1_features=None, img2_features=None):
    """Given two images, returns the matches"""
    detector = cv2.SURF(3200)
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    if img1_features is None:
        kp1, desc1 = detector.detectAndCompute(img1, None)
    else:
        kp1, desc1 = img1_features

    if img2_features is None:
        kp2, desc2 = detector.detectAndCompute(img2, None)
    else:
        kp2, desc2 = img2_features

    #print 'img1 - %d features, img2 - %d features' % (len(kp1), len(kp2))

    raw_matches = matcher.knnMatch(desc1, trainDescriptors=desc2, k=2)
    kp_pairs = filter_matches(kp1, kp2, raw_matches)
    return kp_pairs


def filter_matches(kp1, kp2, matches, ratio=0.75):
    """Filters features that are common to both images"""
    mkp1, mkp2 = [], []
    for m in matches:
        if len(m) == 2 and m[0].distance < m[1].distance * ratio:
            m = m[0]
            mkp1.append(kp1[m.queryIdx])
            mkp2.append(kp2[m.trainIdx])
    kp_pairs = zip(mkp1, mkp2)
    return kp_pairs


###############################################################################
# Match Diplaying
###############################################################################

def draw_matches(window_name, kp_pairs, img1, img2):
    """Draws the matches"""
    mkp1, mkp2 = zip(*kp_pairs)

    H = None
    status = None

    if len(kp_pairs) >= 4:
        p1 = numpy.float32([kp.pt for kp in mkp1])
        p2 = numpy.float32([kp.pt for kp in mkp2])
        H, status = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)

    if len(kp_pairs):
        explore_match(window_name, img1, img2, kp_pairs, status, H)


def explore_match(win, img1, img2, kp_pairs, status=None, H=None):
    """Draws lines between the matched features"""
    h1, w1 = img1.shape[:2]
    h2, w2 = img2.shape[:2]
    vis = numpy.zeros((max(h1, h2), w1 + w2), numpy.uint8)
    vis[:h1, :w1] = img1
    vis[:h2, w1:w1 + w2] = img2
    vis = cv2.cvtColor(vis, cv2.COLOR_GRAY2BGR)

    if H is not None:
        corners = numpy.float32([[0, 0], [w1, 0], [w1, h1], [0, h1]])
        reshaped = cv2.perspectiveTransform(corners.reshape(1, -1, 2), H)
        reshaped = reshaped.reshape(-1, 2)
        corners = numpy.int32(reshaped + (w1, 0))
        cv2.polylines(vis, [corners], True, (255, 255, 255))

    if status is None:
        status = numpy.ones(len(kp_pairs), numpy.bool_)
    p1 = numpy.int32([kpp[0].pt for kpp in kp_pairs])
    p2 = numpy.int32([kpp[1].pt for kpp in kp_pairs]) + (w1, 0)

    green = (0, 255, 0)
    red = (0, 0, 255)
    for (x1, y1), (x2, y2), inlier in zip(p1, p2, status):
        if inlier:
            col = green
            cv2.circle(vis, (x1, y1), 2, col, -1)
            cv2.circle(vis, (x2, y2), 2, col, -1)
        else:
            col = red
            r = 2
            thickness = 3
            cv2.line(vis, (x1 - r, y1 - r), (x1 + r, y1 + r), col, thickness)
            cv2.line(vis, (x1 - r, y1 + r), (x1 + r, y1 - r), col, thickness)
            cv2.line(vis, (x2 - r, y2 - r), (x2 + r, y2 + r), col, thickness)
            cv2.line(vis, (x2 - r, y2 + r), (x2 + r, y2 - r), col, thickness)
    vis0 = vis.copy()
    for (x1, y1), (x2, y2), inlier in zip(p1, p2, status):
        if inlier:
            cv2.line(vis, (x1, y1), (x2, y2), green)

    cv2.imshow(win, vis)

###############################################################################
# Test Main
###############################################################################

if __name__ == '__main__':
    if len(sys.argv) < 3:
        print "No filenames specified"
        print "USAGE: find_obj.py <image1> <image2>"
        sys.exit(1)

    fn1 = sys.argv[1]
    fn2 = sys.argv[2]

    img1 = cv2.imread(fn1, 0)
    img2 = cv2.imread(fn2, 0)

    if img1 is None:
        print 'Failed to load fn1:', fn1
        sys.exit(1)

    if img2 is None:
        print 'Failed to load fn2:', fn2
        sys.exit(1)

    kp_pairs = match_images(img1, img2)

    if kp_pairs:
        draw_matches('find_obj', kp_pairs, img1, img2)
    else:
        print "No matches found"

    cv2.waitKey()
    cv2.destroyAllWindows()

12,720

Author by

Wesley Tansey

Co-founder at Curvio and Machine Learning PhD Student in the Neural Networks Research Group at UT Austin

Updated on June 17, 2022

Comments

Wesley Tansey almost 2 years

I have a sample image which contains an object, such as the earrings in the following image:

http://imgur.com/luj2Z

I then have a large candidate set of images for which I need to determine which one most likely contains the object, e.g.:

http://imgur.com/yBWgc

So I need to produce a score for each image, where the highest score corresponds to the image which most likely contains the target object. Now, in this case, I have the following conditions/constraints to work with/around:

1) I can obtain multiple sample images at different angles.

2) The sample images are likely to be at different resolutions, angles, and distances than the candidate images.

3) There are a LOT of candidate images (> 10,000), so it must be reasonably fast.

4) I'm willing to sacrifice some precision for speed, so if it means we have to search through the top 100 instead of just the top 10, that's fine and can be done manually.

5) I can manipulate the sample images manually, such as outlining the object that I wish to detect; the candidate images cannot be manipulated manually as there are too many.

6) I have no real background in OpenCV or computer vision at all, so I'm starting from scratch here.

My initial thought is to start by drawing a rough outline around the object in the sample image. Then, I could identify corners in the object and corners in the candidate image. I could profile the pixels around each corner to see if they look similar and then rank by the sum of the maximum similarity scores of every corner. I'm also not sure how to quantify similar pixels. I guess just the Euclidean distance of their RGB values?

The problem there is that it kind of ignores the center of the object. In the above examples, if the corners of the earrings are all near the gold frame, then it would not consider the red, green, and blue stones inside the earring. I suppose I could improve this by then looking at all pairs of corners and determining similarity by sampling some points along the line between them.

So I have a few questions:

A) Does this line of thinking make sense in general or is there something I'm missing?

B) Which specific algorithms from OpenCV should I investigate using? I'm aware that there are multiple corner detection algorithms, but I only need one and if the differences are all optimizing on the margins then I'm fine with the fastest.

C) Any example code using the algorithms that would be helpful to aid in my understanding?

My options for languages are either Python or C#.