Object Tracking in EmguCV


The video suggests template matching, which, given the speed, I expect is implemented with an FFT (Fast Fourier Transform) based method. This is fairly easy to implement in EMGU, but getting it perfect is hard.


Template Matching

First, the template matching method. I have written a method that will match an object within an image you feed into it. The FFT only works on single-channel images; for colour you will have to split the channels and add the result matrices together:

Point Location;
Image<Gray, Double> Results;

private bool Detect_object(Image<Gray, Byte> Area_Image, Image<Gray, Byte> image_object)
{
    bool success = false;

    //Work out the padded array size (a template-sized border on each side)
    Point dftSize = new Point(Area_Image.Width + (image_object.Width * 2), Area_Image.Height + (image_object.Height * 2));
    //Pad the array with zeros
    using (Image<Gray, Byte> pad_array = new Image<Gray, Byte>(dftSize.X, dftSize.Y))
    {
        //Copy the search image into the centre of the padded array
        pad_array.ROI = new Rectangle(image_object.Width, image_object.Height, Area_Image.Width, Area_Image.Height);
        CvInvoke.cvCopy(Area_Image, pad_array, IntPtr.Zero);

        //Reset the ROI so the whole padded array is matched
        pad_array.ROI = new Rectangle(0, 0, dftSize.X, dftSize.Y);

        //Match the template against the padded search image
        using (Image<Gray, float> result_Matrix = pad_array.MatchTemplate(image_object, TM_TYPE.CV_TM_CCOEFF_NORMED))
        {
            Point[] MAX_Loc, Min_Loc;
            double[] min, max;

            //Limit the ROI of the result so only valid (unpadded) positions are searched
            result_Matrix.ROI = new Rectangle(image_object.Width, image_object.Height, Area_Image.Width - image_object.Width, Area_Image.Height - image_object.Height);

            //Find the best match; MAX_Loc is the top left corner of the match
            result_Matrix.MinMax(out min, out max, out Min_Loc, out MAX_Loc);

            Location = new Point(MAX_Loc[0].X, MAX_Loc[0].Y);
            //No score threshold is applied here, so success simply means a best
            //match location was found; compare max[0] to a threshold if needed
            success = true;
            Results = result_Matrix.Convert<Gray, Double>();
        }
    }
    return success;
}

The thing most people forget is to pad the array with zeros by the size of the template on each side; we use zeros as this has no effect on the FFT method. Without the padding we don't process the data around the edges properly and we can miss matches there.

The second point, and I can't stress how important this is, is that the method returns a match at the object's top left hand corner. result_Matrix.MinMax finds the position at which the object is most likely to have matched. There is a lot you will need to experiment with, so if you have any more problems ask here or on EMGU and I'll help when I can. I will copy and paste this solution over there as well.
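To make that more concrete, here is a small usage sketch; the fields search_img and template_img and the 0.75 cut-off are my own illustrative assumptions, not part of the method above:

//Assumed usage sketch: search_img and template_img are Image<Gray, Byte> fields grabbed elsewhere
if (Detect_object(search_img, template_img))
{
    //Location is the top left corner of the match within search_img,
    //so the matched region is Location plus the template size
    Rectangle match_rect = new Rectangle(Location.X, Location.Y,
                                         template_img.Width, template_img.Height);
    search_img.Draw(match_rect, new Gray(255), 2);

    //If false positives are a problem, store max[0] from MinMax in a field
    //inside Detect_object and only accept matches above a threshold,
    //e.g. success = max[0] > 0.75 for CV_TM_CCOEFF_NORMED
}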


The Method in the Video

Well, I will leave you to code most of this as I am stuck for time, but in effect the user uses the click event of a paintbox to set the e.X and e.Y location of an object within the image. The template is of a fixed size, say 100x100:

Image<Gray, Byte> template_img = Main_Image.Copy(new Rectangle(x, y, 100, 100));
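A minimal sketch of that click handler might look like this; the PictureBox name paintbox, the bounds clamping and the centring of the template on the click are my assumptions rather than the video's exact code:

private void paintbox_MouseClick(object sender, MouseEventArgs e)
{
    //Centre the 100x100 template on the clicked point and keep it inside the frame
    int x = Math.Max(0, Math.Min(e.X - 50, Main_Image.Width - 100));
    int y = Math.Max(0, Math.Min(e.Y - 50, Main_Image.Height - 100));

    //Grab the template from the current frame
    template_img = Main_Image.Copy(new Rectangle(x, y, 100, 100));

    //Remember where it came from so the first search ROI can be built around it
    Location = new Point(x, y);
}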

He then sets an ROI on the original image around the object; this accounts for movement. In our case, say we want a buffer (ROI) around the template of 50 pixels. This would equate to an initial ROI of:

Main_Image.ROI = new Rectangle(x - 50, y - 50, 200, 200);
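One thing to watch: x - 50 or y - 50 can fall outside the image when the object sits near a border, and setting the ROI (or calling Copy) will then throw. A small helper like the hypothetical Clamp_ROI below (my own addition, not from the video) keeps the 200x200 search window inside the frame:

private Rectangle Clamp_ROI(Rectangle roi, Size image_size)
{
    //Shift the rectangle back inside the image rather than letting Copy() throw
    int x = Math.Max(0, Math.Min(roi.X, image_size.Width - roi.Width));
    int y = Math.Max(0, Math.Min(roi.Y, image_size.Height - roi.Height));
    return new Rectangle(x, y, roi.Width, roi.Height);
}

//Usage
Main_Image.ROI = Clamp_ROI(new Rectangle(x - 50, y - 50, 200, 200), Main_Image.Size);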

Now, working directly with an ROI on the original image can slow down the processing as well as mess up displaying the original image again, so it would be much better to do something like this:

using (Image<Gray, Byte> img_ROI = Main_Image.Copy(new Rectangle(x - 50, y - 50, 200, 200)))
{
    Detect_object(img_ROI, template_img);
}

We use a using statement as this disposes of the extra image data when we've finished and frees up resources.

Now for the trick: the ROI is actually controlled by the results from Detect_object, which is why we keep Location as a global variable. Once we have matched the template successfully, our using statement will look more like:

using (Image<Gray, Byte> img_ROI = Main_Image.Copy(new Rectangle(Location.X - 50, Location.Y - 50, 200, 200)))
{
    ...
}

That's pretty much it, other than drawing rectangles for the ROI and the template (size and location) on the image. If you have problems with that let me know, but the code should readily be out there.
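Putting it all together, a per-frame tracking step might look roughly like the sketch below. The coordinate fix-up (adding the search area's top left back onto Location, since Detect_object works on the copied ROI), the per-frame template refresh mentioned in the video's comments, and the Clamp_ROI helper from above are my assumptions about how the pieces fit together, not the video's exact code:

private void Track_Frame(Image<Gray, Byte> current_frame)
{
    //Build the search area around the last known position (50 pixel buffer)
    Rectangle search_area = Clamp_ROI(
        new Rectangle(Location.X - 50, Location.Y - 50, 200, 200),
        current_frame.Size);

    using (Image<Gray, Byte> img_ROI = current_frame.Copy(search_area))
    {
        if (Detect_object(img_ROI, template_img))
        {
            //Location comes back relative to img_ROI, so shift it back
            //into full image coordinates for the next frame
            Location = new Point(search_area.X + Location.X,
                                 search_area.Y + Location.Y);

            //Optionally refresh the template every frame, as the video does,
            //so the tracker can follow an object whose appearance changes
            Rectangle templ_rect = Clamp_ROI(
                new Rectangle(Location.X, Location.Y, 100, 100), current_frame.Size);
            template_img.Dispose();
            template_img = current_frame.Copy(templ_rect);

            //Draw the tracked object and the search area for display
            current_frame.Draw(templ_rect, new Gray(255), 2);
            current_frame.Draw(search_area, new Gray(128), 1);
        }
    }
}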

Cheers,

Chris


Comments

  • Peter
    Peter over 1 year

    I am building an object tracking program that should track the unknown object. The user must select a region in the live video stream that should be tracked. My project is similar to this video.

    http://www.youtube.com/watch?v=G5GLIKIkd6E

    I have tried a method but it is not robust enough and the tracker moves a lot. So I am starting from scratch again.

    Does anyone know a method for how I can achieve the result in the video? I am a newbie in EmguCV and as of now I really have no idea where to start again.

  • Chris
    Chris about 10 years
    Look at the top comment of the video from Trashlock "@Computer22Nerd Well it's mostly just template matching. Except it is bound to a small region instead of doing it on the entire image. Basically the green box is the tracked object, and the red box is the tracking area (the area that will be tested for the template). When I click with my mouse, a 40x40 area is defined as the template. A 80x80 area is defined as the search area. The key to this system is that the template is updated each frame, making it possible to track evolving objects"
  • mevatron
    mevatron about 10 years
    @Chris Good to know. There are definitely better ways than just straight up brute force template creation each frame. The demonstrated method is also prone to drift, which occurs a lot in the video. I'll see if I can find the paper I've used for this kind of stuff, and post it. Also, hopefully you don't think I down voted you because you've got a quality answer :)
  • Chris
    Chris about 10 years
    Hey, that's fine if you did, but there's often confusion between methods and their abilities. I agree there are certainly better methods, but camshift would not be appropriate by itself as (despite the minimised ROI) the whole hand is moving, i.e. lots of noise. You would need to include appropriate contour analysis to find the tip of the finger in each frame to have a more reliable reference point. For others, in EMGU/OpenCV this would be under background subtraction/projection methods and 'Structural Analysis and Shape Descriptors'. Cheers
  • e_phi
    e_phi almost 10 years
    I tried tracing the code and am a bit confused as to what most of the methods in detect object do for one, but also how the bool success works...