Bag of words training and testing opencv, matlab

14,802

Solution 1

Local features

When you work with SIFT, you usually want to extract local features. What does that mean? You locate points of interest in your image and extract a local feature vector at each of them. A local feature vector is just a vector of numerical values that describes the visual information of the image region from which it was extracted. Although the number of local feature vectors that you can extract from image A does not need to be the same as the number that you can extract from image B, the number of components of a local feature vector (i.e. its dimensionality) is always the same.
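To make the shape of the data concrete, here is a minimal sketch. It does not run SIFT: `fake_descriptors` is a hypothetical stand-in that returns random vectors, and the keypoint counts (37 and 212) are made up for illustration. The only real fact it relies on is that a standard SIFT descriptor has 128 components.

```python
import random

SIFT_DIM = 128  # a standard SIFT descriptor has 128 components


def fake_descriptors(num_keypoints, seed):
    """Hypothetical stand-in for a SIFT extractor: returns random vectors."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(SIFT_DIM)] for _ in range(num_keypoints)]


# Two images yield different numbers of local feature vectors (made-up counts)...
image_a = fake_descriptors(num_keypoints=37, seed=0)
image_b = fake_descriptors(num_keypoints=212, seed=1)

counts = (len(image_a), len(image_b))
# ...but every vector has the same dimensionality.
dims = {len(vec) for vec in image_a + image_b}

print(counts)
print(dims)
```

With a real extractor, each image would similarly give you a list (or matrix) with one 128-component row per detected keypoint.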

Now, if you want to use your local feature vectors to classify images, you have a problem. In traditional image classification, each image is described by a global feature vector, which, in the context of machine learning, can be seen as a set of numerical attributes. However, when you extract a set of local feature vectors, you don't have a global representation of each image, which is required for image classification. A technique that can be employed to solve this problem is the bag of words, also known as bag of visual words (BoW).

Bag of visual words

Here's the (very) simplified BoW algorithm:

  1. Extract the SIFT local feature vectors from your set of images;

  2. Put all these local feature vectors into a single set. At this point you don't even need to store from which image each local feature vector was extracted;

  3. Apply a clustering algorithm (e.g. k-means) over the set of local feature vectors in order to find centroid coordinates and assign an id to each centroid. This set of centroids will be your vocabulary;

  4. The global feature vector of an image will be a histogram that counts how many times each centroid occurred in that image. To compute the histogram, find the nearest centroid for each local feature vector.
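The four steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the descriptors are tiny hand-made 2-D toy vectors rather than real 128-D SIFT output, and the helper names (`kmeans`, `nearest_centroid`, `bow_histogram`) are made up for this example and are not part of OpenCV or MATLAB.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec, centroids):
    """Return the id (index) of the centroid closest to vec."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(vec, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bow_histogram(descriptors, centroids):
    """Step 4: count how often each visual word occurs in one image."""
    hist = [0] * len(centroids)
    for d in descriptors:
        hist[nearest_centroid(d, centroids)] += 1
    return hist


# Step 1 (assumed already done): toy "descriptors" per image.
images = {
    "img_a": [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1]],
    "img_b": [[5.2, 4.9], [4.8, 5.0], [0.1, 0.1], [5.1, 5.3], [9.9, 0.1]],
}

# Step 2: pool all descriptors, forgetting which image they came from.
pooled = [d for descs in images.values() for d in descs]

# Step 3: cluster; the centroids are the vocabulary, their indices the word ids.
vocabulary = kmeans(pooled, k=3)

# Step 4: one fixed-length histogram per image.
histograms = {name: bow_histogram(descs, vocabulary)
              for name, descs in images.items()}
print(histograms)
```

Note that every histogram has length `k` regardless of how many descriptors the image produced, which is what makes it usable as a global feature vector.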

Image Classification

Here I am assuming that your problem is the following:

You have as input a set of labeled images and a set of non-labeled images to which you want to assign labels based on their visual appearance. Suppose your problem is to classify landscape photography. Your image labels could be, for example, “mountains”, “beach” or “forest”.

The global feature vector extracted from each image (i.e. its bag of visual words) can be seen as a set of numerical attributes. These numerical attributes, which represent the visual characteristics of each image, can be used together with the corresponding image labels to train a classifier. For example, you could use data mining software such as Weka, which has an implementation of SVM, known as SMO, to solve your problem.
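To show the train/test flow in code, here is a deliberately simple sketch. It does not use an SVM: a 1-nearest-neighbour rule over L1-normalized histograms is swapped in so that the example stays self-contained. The histograms and the `predict` helper are hypothetical and exist only for illustration.

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1 (images have different keypoint counts)."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def predict(train_set, hist):
    """1-nearest-neighbour: return the label of the closest training histogram."""
    query = normalize(hist)
    _, label = min((l1_distance(normalize(h), query), lbl) for h, lbl in train_set)
    return label


# "Training" data: one BoW histogram per labeled image (made-up values).
train_set = [
    ([8, 1, 1], "mountains"),
    ([7, 2, 1], "mountains"),
    ([1, 8, 1], "beach"),
    ([1, 1, 8], "forest"),
]

# "Testing": histograms of non-labeled images, built with the same vocabulary.
predictions = [predict(train_set, h) for h in ([9, 0, 1], [0, 2, 10])]
print(predictions)
```

The important point carries over to an SVM: all classes share one vocabulary, each training image contributes one histogram plus one label, and test images are converted to histograms with that same vocabulary before prediction.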

With Weka, you only have to format the global feature vectors and corresponding image labels according to the ARFF file format, which is essentially a header declaring the attributes, followed by a CSV of global feature vectors, each ending with its image label.
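As a rough sketch of what such a file could look like, the snippet below formats histograms and labels as ARFF text. The function name `to_arff`, the relation name, the attribute names `word_0`, `word_1`, ... and the sample values are all assumptions made for illustration.

```python
def to_arff(relation, histograms, labels):
    """Format BoW histograms and their labels as ARFF text."""
    num_words = len(histograms[0])
    classes = sorted(set(labels))
    lines = [f"@RELATION {relation}", ""]
    # One numeric attribute per visual word.
    lines += [f"@ATTRIBUTE word_{i} NUMERIC" for i in range(num_words)]
    # The label is a nominal attribute listing every possible class.
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    # One CSV row per image: histogram values, then the label.
    for hist, label in zip(histograms, labels):
        lines.append(",".join(str(v) for v in hist) + "," + label)
    return "\n".join(lines) + "\n"


arff_text = to_arff(
    "landscapes",
    [[8, 1, 1], [1, 8, 1]],
    ["mountains", "beach"],
)
print(arff_text)
```

Writing `arff_text` to a file with an `.arff` extension should give something Weka can open, though it is worth checking against the format description for your Weka version.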

Solution 2

Here's a very good article introducing the Bag of Words model for classification using OpenCV v2.2: http://app-solut.com/blog/2011/07/the-bag-of-words-model-in-opencv-2-2/

A follow-up article on using the Normal Bayes Classifier for image categorization: http://app-solut.com/blog/2011/07/using-the-normal-bayes-classifier-for-image-categorization-in-opencv/

A ~200-line code demo on the Caltech-256 dataset is also available: http://code.google.com/p/open-cv-bow-demo/downloads/detail?name=bowdemo.tar.gz&can=2&q=

Here's something to get an intuitive feel for the process of image classification: http://www.robots.ox.ac.uk/~vgg/share/practical-image-classification.htm

It really helped me clarify a lot of questions. I hope it helps someone. :)

Author by

Mario

I'm Mario, I'm a student, I like programming but I'm not good at it. I'm trying to do my best to be a good programmer.

Updated on June 07, 2022

Comments

  • Mario
    Mario almost 2 years

    I'm implementing Bag of Words in OpenCV using SIFT features in order to classify a specific dataset. So far, I have been able to cluster the descriptors and generate the vocabulary. As far as I know, I have to train an SVM, but I have some questions that really confuse me. The major problem is the concept behind the implementation. These are my questions:

    1- When I extract the features and then create the vocabulary, should I extract the features for all the objects (let's say 5 objects) and put them in one file, so that they all end up in one vocabulary file that has all the words? And how will I separate them later on when I do the classification?

    2- How do I implement the SVM? I know the functions that are used in OpenCV, but how do I use them?

    3- I can do the work in MATLAB, by which I mean the implementation of the SVM training, but is there any code available that can guide me through my work? I have seen the code used by Andrea Vedaldi, here, but he works with only one class at a time, and another issue is that he does not show how to create the .mat file that he uses in his exercises. None of the other implementations that I could find use SVM. So, can you guide me on this point too?

    Thank you

  • Alceu Costa
    Alceu Costa almost 12 years
    The answer is still incomplete. I will try to complete it tomorrow.
  • Mario
    Mario almost 12 years
    I'm waiting for you to finish it
  • Mario
    Mario almost 12 years
    So far, I have done 3 out of 4 steps, and I know that I have to create the histogram. But I really need to know more about the training and the testing part of the classification. What you've written so far is really amazing.
  • Mario
    Mario over 11 years
    I didn't know for a really long time that you had edited this, because I got no notification. It's such a brilliant answer, and it cleared up so many questions that were going through my mind. Anyway, I have a few questions to ask; they might be useful for me and other people.
  • Mario
    Mario over 11 years
    The questions: I have done what you said about the histogram. I have created a file that contains the codebook words, which is a 100*128 matrix built from the SIFT features of 10 images. I made it 100 words because I want to start simple and then update it later with my work. So my question is: these words are, let's say, for the object "car"; can I train on them and treat them as positive samples (as a very simple training with only 2 classes, where the second class will be, let's say, motorbike)? So do I have to make another histogram file and train against the first one?
  • Mario
    Mario over 11 years
    The second question: why do I have to make an ARFF file? I can simply label the data in MATLAB: the first histogram file is class "1" and the second is class "2". Is that possible?