Using the Normal Bayes classifier for image categorization in OpenCV

This article is a follow-up to “The Bag of Words model in OpenCV 2.2”, in which I explained how to use the BoW classes in OpenCV to create BoW representations for images. Here I will explain how to use the Normal Bayes classifier, which is also implemented in OpenCV, to categorize images.

In this article we treat image categorization as a supervised learning task. For those who don’t know what supervised learning is, I will not try to give an introduction here; please have a look at Wikipedia.

The Normal Bayes classifier is a very simple classifier which assumes that the class-conditional distribution of the data is normal. Consequently, all the classifier does is estimate a covariance matrix and a mean per class. To classify an instance, it chooses the class under whose class-conditional density the instance has the highest probability.
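
Written as a formula (my own notation, just to make the decision rule explicit): for a feature vector $x$, the classifier estimates a mean $\mu_c$ and covariance matrix $\Sigma_c$ for every class $c$ and then predicts

\[
\hat{c}(x) = \arg\max_{c} \, \mathcal{N}(x \mid \mu_c, \Sigma_c),
\]

where $\mathcal{N}(x \mid \mu_c, \Sigma_c)$ denotes the multivariate normal density with those parameters.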

Attached you can find a simple demo which you can use to experiment with different descriptors, feature detectors and BoW vector lengths. You need to have OpenCV 2.1 (or higher) installed, as well as a recent Boost version, since the demo program uses the Boost Filesystem library for traversing directory structures. The demo was tested on Ubuntu 11.04. Please let me know if you can’t compile it on your system or have any other difficulties.

The dataset

The dataset used in the demo is a small part of the Caltech-256 dataset (Griffin, G., Holub, A.D., Perona, P.: The Caltech-256, Caltech Technical Report). The BoW method for image classification is not the most cutting-edge technique, and its performance decreases rapidly with an increasing number of classes. Also, the classifier needs a lot of time for calculating the covariance matrices. That’s why the demo uses only six categories from Caltech-256: Bonsai, Buddha, Cartman, Chopsticks, Homer Simpson and Porcupine. In each category, 60 images are used for training and the remaining images for evaluating the classifier.

The feature extraction process

For detecting and extracting features, the various implementations of the “FeatureDetector” and “DescriptorExtractor” interfaces are used. In the demo, the global pointers “detector” and “extractor” point to these implementations. The demo uses the SURF detector and descriptor by default. You can change this by replacing the arguments of the create methods with the string representation of the detector or descriptor you want to try.

Ptr<DescriptorMatcher > matcher = DescriptorMatcher::create("FlannBased");
Ptr<DescriptorExtractor > extractor = DescriptorExtractor::create("SURF");
Ptr<FeatureDetector > detector = FeatureDetector::create("SURF");
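
For example, to try SIFT instead of SURF (the string names are the ones registered in the OpenCV 2.x features2d factory; any other registered detector or descriptor name works the same way), the last two lines would become:

Ptr<DescriptorExtractor > extractor = DescriptorExtractor::create("SIFT");
Ptr<FeatureDetector > detector = FeatureDetector::create("SIFT");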

The BoW parameters are also global variables: the “dictionarySize”, the termination criteria “tc”, the number of “retries” (how often the k-means algorithm should restart to find a good local minimum), and the “flags”, which are set to KMEANS_PP_CENTERS in the demo (see “The Bag of Words model in OpenCV 2.2” for details).

int dictionarySize = 200;
TermCriteria tc(CV_TERMCRIT_ITER, 10, 0.001);
int retries = 1;
int flags = KMEANS_PP_CENTERS;
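
These parameters are simply handed to the BoW classes from the previous article. Roughly, and assuming the vocabulary trainer and the BoW extractor are named bowTrainer and bowDE (the demo may use different names):

// cluster training descriptors into a vocabulary of dictionarySize visual words
BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags);
// compute BoW histograms by matching image descriptors against that vocabulary
BOWImgDescriptorExtractor bowDE(extractor, matcher);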

There are also two helper functions for traversing the folders in which the dataset is located. The function named “extractTrainingVocabulary” extracts descriptors from the training images to create the BoW dictionary, and the function named “extractBOWDescriptor” creates BoW descriptors for each of the images in the training or evaluation partition of the dataset.
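
To give an idea of what these helpers do, here is a minimal sketch of how “extractTrainingVocabulary” could look (my own simplification, assuming the globals shown above plus a BOWKMeansTrainer named bowTrainer; the real demo also handles the evaluation partition and some extra bookkeeping):

// requires <boost/filesystem.hpp>
void extractTrainingVocabulary(const boost::filesystem::path& basepath) {
    namespace fs = boost::filesystem;
    for (fs::directory_iterator it(basepath), end; it != end; ++it) {
        if (fs::is_directory(it->status())) {
            extractTrainingVocabulary(it->path()); // descend into each category folder
        } else {
            Mat img = imread(it->path().string()); // load one training image
            vector<KeyPoint> keypoints;
            detector->detect(img, keypoints); // detect interest points
            Mat features;
            extractor->compute(img, keypoints, features); // compute their descriptors
            bowTrainer.add(features); // feed them to the vocabulary trainer
        }
    }
}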

The classifier

OpenCV has a machine learning module which implements a couple of machine learning algorithms. Most of these algorithms do not have much in common, and therefore the common base interface they are derived from (CvStatModel) has only a few methods, mainly for saving and loading trained classifiers. The two other important methods are “train” and “predict”, which are used for training the classifier and predicting the class labels of new instances. These two methods have very different signatures for most of the classifiers and are therefore not part of the common base interface. The Normal Bayes classifier is one of the easiest classifiers to use, since it does not have any meta-parameters that have to be tuned. Have a look at this code snippet.

Mat trainingData;
Mat trainingLabels;
Mat evalData;
Mat results;

CvNormalBayesClassifier classifier;
//Train classifier...
classifier.train(trainingData, trainingLabels);
// ...
//Evaluate classifier...
classifier.predict(evalData, &results);

As you can see, all you have to do is supply the training data (one instance per row) and the training labels to the “train” method. The “predict” method is equally simple to use. If you want more details on how everything fits together, just have a look at the demo (less than 200 lines of code).
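
To make this concrete, here is a rough sketch of how the matrices could be filled and how the predictions could be scored afterwards (the names bowDescriptor, classIdx and evalLabels are mine and may differ from the demo):

// inside extractBOWDescriptor: one BoW vector per row, one float label per row
Mat bowDescriptor;
bowDE.compute(img, keypoints, bowDescriptor);
trainingData.push_back(bowDescriptor);
trainingLabels.push_back((float)classIdx);

// after classifier.predict(evalData, &results): count the correct predictions
int correct = 0;
for (int i = 0; i < results.rows; i++)
    if (results.at<float>(i, 0) == evalLabels.at<float>(i, 0))
        correct++;
cout << "accuracy: " << (double)correct / results.rows << endl;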


2 comments

  1. Thank you for your wonderful articles and in particular for your simple demo sources. This helped me a lot to get a brief insight into the “bag of words” implementation in OpenCV. Unfortunately, I wasted some time looking into the example from the samples folder in OpenCV (“bagofwords_classification.cpp”), which I think is a much worse example than yours, because it has over 2500 lines of code with a lot of file-handling overhead that is not necessary to understand the basics of the BoW implementation in OpenCV.

    I think it would be of huge value for other people if you could contribute the sources directly to the OpenCV repository (or to the community).

  2. Thanks for your useful topic. After detector->detect(img, keypoint); detects the keypoints, when I want to clear the keypoints using keypoint.clear(); or when the function is about to return, the following error appears:

    “Unhandled exception at 0x011f45bb in BOW.exe: 0xC0000005: Access violation reading location 0x42ebe098.”

    The detected keypoints also have bizarre coordinates like cv::Point_ pt {x=-1.5883997e+038, y=-1.5883997e+038}.

    Part of the code

    Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("FlannBased");
    Ptr<DescriptorExtractor> extractor = new SurfDescriptorExtractor();
    Ptr<FeatureDetector> detector = new SurfFeatureDetector(2000);
    void extractTrainingVocabulary() {
        IplImage *img;
        int i, j;
        CvSeq *imageKeypoints = 0;
        for (j = 1; j <= 60; j++)
            for (i = 1; i <= 60; i++) {
                sprintf(ch, "%d%s%d%s", j, " (", i, ").jpg");
                const char* imageName = ch;
                Mat img = imread(ch);
                vector<KeyPoint> keypoint;
                detector->detect(img, keypoint);
                Mat features;
                extractor->compute(img, keypoint, features);
                bowTrainer.add(features);
                keypoint.clear(); //problem
            }
        return;
    }