KNN algorithm in MATLAB

23,675

Solution 1

Here is illustrative code for k-nearest-neighbor classification (some of the functions used require the Statistics Toolbox):

%# image size
sz = [25,42];

%# training images
numTrain = 200;
trainData = zeros(numTrain,prod(sz));
for i=1:numTrain
    img = imread( sprintf('train/image_%03d.jpg',i) );
    trainData(i,:) = img(:);
end

%# testing images
numTest = 200;
testData = zeros(numTest,prod(sz));
for i=1:numTest
    img = imread( sprintf('test/image_%03d.jpg',i) );
    testData(i,:) = img(:);
end

%# target class (I'm just using random values. Load your actual values instead)
trainClass = randi([1 5], [numTrain 1]);
testClass = randi([1 5], [numTest 1]);

%# compute pairwise distances between each test instance vs. all training data
D = pdist2(testData, trainData, 'euclidean');
[D,idx] = sort(D, 2, 'ascend');

%# K nearest neighbors
K = 5;
D = D(:,1:K);
idx = idx(:,1:K);

%# majority vote
prediction = mode(trainClass(idx),2);

%# performance (confusion matrix and number of misclassified test instances)
C = confusionmat(testClass, prediction);
err = sum(C(:)) - sum(diag(C))   %# off-diagonal entries are the errors
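If the Statistics Toolbox is not available, the pdist2 call can be replaced with plain matrix arithmetic using the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x*y'. This is a minimal sketch, assuming one instance per row and a column vector of labels; the tiny hand-made trainData/testData/trainClass values are hypothetical stand-ins for the real images.

```matlab
% Hypothetical toy data: one instance per row (replace with the real image rows)
trainData  = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
trainClass = [1; 1; 1; 2; 2; 2];
testData   = [0.2 0.3; 5.5 5.2];
K = 3;

% Squared Euclidean distances via ||x||^2 + ||y||^2 - 2*x*y' (rows = test, cols = train)
sqD = sum(testData.^2, 2) + sum(trainData.^2, 2)' - 2 * (testData * trainData');
D = sqrt(max(sqD, 0));   % clamp tiny negative round-off before taking the root

% Sort each row and keep the indices of the K closest training instances
[~, idx] = sort(D, 2, 'ascend');
idx = idx(:, 1:K);

% Majority vote over the neighbors' labels (mode returns the smallest label on ties)
prediction = mode(trainClass(idx), 2);
disp(prediction')
```

Both test points sit inside one of the two clusters, so each gets that cluster's label.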

Solution 2

If you want to compute the Euclidean distance between vectors a and b, just use Pythagoras. In MATLAB:

dist = sqrt(sum((a-b).^2));
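As a quick sanity check of the formula, the classic 3-4-5 triangle; the vectors a and b here are just example values:

```matlab
a = [0 0];
b = [3 4];

% Pythagoras: square the coordinate differences, sum them, take the square root
dist = sqrt(sum((a-b).^2));
fprintf('%g\n', dist)   % prints 5

% The built-in norm of the difference vector gives the same Euclidean distance
distNorm = norm(a - b);
```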

However, you might want to use pdist to compute it for all combinations of vectors in your matrix at once (pdist treats each row as one vector):

dist = squareform(pdist(myVectors, 'euclidean'));
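To see what the square matrix from squareform(pdist(...)) contains, here is a toolbox-free sketch that builds the same layout with an explicit double loop; myVectors holds three made-up example rows. Entry (i,j) is the distance between rows i and j, so the result is symmetric with zeros on the diagonal.

```matlab
% Three example instances, one per row (pdist also treats rows as observations)
myVectors = [0 0; 3 4; 6 8];
n = size(myVectors, 1);

% Explicit double loop: entry (i,j) is the Euclidean distance between rows i and j
dist = zeros(n);
for i = 1:n
    for j = 1:n
        dist(i,j) = sqrt(sum((myVectors(i,:) - myVectors(j,:)).^2));
    end
end
disp(dist)   % symmetric, zeros on the diagonal
```

The loop is only for illustration; pdist computes each pair once and is much faster on large data.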

In the resulting distance matrix, I'm interpreting columns as instances to classify and rows as potential neighbors. This is arbitrary, though (the matrix is symmetric), and you could switch them around.

If you have a separate test set, you can compute the distances from its instances to the instances in the training set with pdist2:

dist = pdist2(trainingSet, testSet, 'euclidean')
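With pdist2(trainingSet, testSet), the result has one row per training instance and one column per test instance, so each column lists the distances from one test instance to every potential neighbor. The sketch below reproduces that layout without the toolbox by broadcasting over a third dimension; trainingSet and testSet are small made-up examples, and the 1-nearest-neighbor lookup is only there to show how a column is read.

```matlab
trainingSet = [0 0; 10 0; 0 10];   % 3 training instances (rows)
testSet     = [1 1; 9 1];          % 2 test instances (rows)

% Differences of every training/test pair: size nTrain x nTest x nFeatures
diffs = permute(trainingSet, [1 3 2]) - permute(testSet, [3 1 2]);
dist = sqrt(sum(diffs.^2, 3));     % nTrain x nTest, same layout as pdist2 above

% Column j holds distances from test instance j to all training instances,
% so the nearest training instance is the row index of each column's minimum
[~, nearest] = min(dist, [], 1);
disp(nearest)
```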

You can use this distance matrix to kNN-classify your vectors as follows. I'll generate some random data to serve as an example, which will result in low (around chance level) accuracy. But of course you should plug in your actual data, and the results will probably be better.

nrOfVectors = 100; nrOfFeatures = 10; nrOfClasses = 4; % example sizes, use your own
m = rand(nrOfVectors,nrOfFeatures); % random example data
classes = randi(nrOfClasses, 1, nrOfVectors); % random true classes
k = 3;  % number of neighbors to consider, 3 is a common value

d = squareform(pdist(m, 'euclidean')); % distance matrix
[neighborvals, neighborindex] = sort(d,1); % get sorted distances

Take a look at the neighborvals and neighborindex matrices and see if they make sense to you. The first is a sorted version of the earlier d matrix, and the latter gives the corresponding instance numbers. Note that the self-distances (on the diagonal in d) have floated to the top. We're not interested in this (always zero), so we'll skip the top row in the next step.

neighborclasses = classes(neighborindex); % class of each neighbor, same shape as neighborindex
assignedClasses = mode(neighborclasses(2:1+k,:),1);

So we assign the most common class among the k nearest neighbors!
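One detail worth knowing about this vote: when several classes are equally common among the neighbors, MATLAB's mode returns the smallest of them, so ties are broken toward the lowest class number. The neighbor labels below are made up to show both a clear majority and a three-way tie with k = 3.

```matlab
% Labels of the k = 3 nearest neighbors for two query instances (one per column)
neighborLabels = [2 3;
                  2 1;
                  1 2];
assigned = mode(neighborLabels, 1);   % most common label in each column
disp(assigned)   % column 1 has a clear majority (2); column 2 is a tie, so mode picks 1
```

If this bias matters, an odd k (for two classes) or a distance-weighted vote avoids relying on that tie-breaking rule.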

You can compare the assigned classes with the actual classes to get an accuracy score:

accuracy = sum(classes == assignedClasses)/length(classes); % fraction correct
fprintf('KNN Classifier Accuracy: %.2f%%\n', 100*accuracy)

Or make a confusion matrix to see the distribution of classifications:

confusionmat(classes, assignedClasses)
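Without the Statistics Toolbox, the same table can be counted with accumarray, assuming the labels are the integers 1..nrOfClasses. The label vectors below are hypothetical; rows are true classes and columns are assigned classes, matching the orientation of confusionmat(classes, assignedClasses).

```matlab
classes         = [1 1 2 2 3 3];   % true labels (example values)
assignedClasses = [1 2 2 2 3 1];   % predicted labels (example values)
nrOfClasses = 3;

% Count each (true, predicted) pair: row = true class, column = predicted class
C = accumarray([classes(:) assignedClasses(:)], 1, [nrOfClasses nrOfClasses]);
disp(C)

% Correct classifications lie on the diagonal
accuracy = sum(diag(C)) / sum(C(:));
```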

Solution 3

Yes, there is a function for kNN: knnclassify. Newer MATLAB releases replace it with fitcknn from the Statistics and Machine Learning Toolbox.

Experiment with the number of neighbors to keep in order to get the best result (use a confusion matrix to compare the settings). The function takes care of the distance computation for you.
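One way to compare values of k without touching the test set is leave-one-out accuracy on the training data: classify each training instance from all the others and count how often the vote is right. This is a toolbox-free sketch of that idea; X, y and candidateK are hypothetical example values, not part of the original answers.

```matlab
% Hypothetical training data: two well-separated clusters (replace with real data)
X = [0 0; 0 1; 1 0; 1 1; 5 5; 5 6; 6 5; 6 6];
y = [1; 1; 1; 1; 2; 2; 2; 2];
n = size(X, 1);

% Pairwise distances; Inf on the diagonal so no point counts as its own neighbor
sqD = sum(X.^2, 2) + sum(X.^2, 2)' - 2 * (X * X');
D = sqrt(max(sqD, 0));
D(1:n+1:end) = Inf;
[~, order] = sort(D, 2);   % row i: other instances from nearest to farthest

% Leave-one-out accuracy for each candidate k
candidateK = [1 3 5];
acc = zeros(size(candidateK));
for t = 1:numel(candidateK)
    k = candidateK(t);
    pred = mode(y(order(:, 1:k)), 2);   % majority vote of the k nearest others
    acc(t) = mean(pred == y);
end
disp([candidateK; acc])
```

On real data the accuracies will differ between k values; picking k this way keeps the test set for the final performance estimate.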

Author: Muaz Usmani

Updated on January 03, 2020

Comments

  • Muaz Usmani, over 4 years

    I am working on a thumb recognition system. I need to implement the KNN algorithm to classify my images. According to this, it has only 2 measurements, from which it calculates the distance to find the nearest neighbour, but in my case I have 400 images of 25 X 42, of which 200 are for training and 200 for testing. I have been searching for a few hours but I can't find a way to compute the distance between the points.

    EDIT: I have reshaped the first 200 images into 1 X 1050 vectors and stored them in a 200 X 1050 matrix, trainingData. Similarly, I made testingData.

    • Gunther Struyf, almost 12 years
      Can't open your link. If you search for 'upload file' you'll find plenty of alternatives for hosting.
    • Muaz Usmani, almost 12 years
      I want to find the distances between the points so I can apply the k-NN algorithm.
  • Muaz Usmani, almost 12 years
    Is there any function for kNN? Actually, I want to train my system.
  • Junuxx, almost 12 years
    You "train" KNN by calculating the distances (training isn't actually necessary, unless you want to know and compare the performance on the training set). You compute all the pairwise distances, then find the K instances nearest (lowest distance) to the instance you want to classify, and assign the most common class among these neighbors to that instance.
  • Junuxx, almost 12 years
    Well, I expanded my answer with an explanation of the entire kNN process. And without any for loops!
  • Amro, almost 12 years
    @Junuxx: when you have separate train/test data, you should use PDIST2 to compute all pairwise distances between points in the test set and points in the training set.
  • Junuxx, almost 12 years
    @Amro: Good suggestion, wasn't aware of pdist2 but I'll update my answer :)
  • Junuxx, almost 12 years
    Doesn't really answer the question of how to find the distances and doesn't clarify how kNN works either, but otherwise a nice and simple solution :)
  • CTZStef, almost 12 years
    KNN is the simplest machine learning algorithm! K stands for "how many of the closest neighbors to keep around the individual you consider"; you keep the class that is most frequent among those neighbors; and the distance is basically the Euclidean distance... Besides, user1420026 explicitly asked for a "function for knn".
  • Junuxx, almost 12 years
    To be honest, the OP didn't clearly ask for a kNN function in the question, only in a later comment. But unless this is homework or some learning project, knnclassify is probably the most convenient thing for the OP to use. So +1 for a useful function and a link with examples :)
  • Muaz Usmani, almost 12 years
    Thank you, Sir. As I said, I have trainingData of size 200 X 1050, which means there are 200 images in total and each image has 1050 dimensions (it is actually 25 X 42). My question is how I can replace trainClass = randi([1 5], [numTrain 1]); with my own data.
  • Amro, almost 12 years
    @user1420026: those are the class targets (the label of each instance), which must be given when performing classification (supervised learning).
  • Muaz Usmani, almost 12 years
    This is my label data: labelData = zeros(200,1); labelData(1:100,:) = 0; labelData(101:200,:) = 1; How do I use it here?
  • Amro, almost 12 years
    @user1420026: those are exactly the labels of the training data: trainClass = labelData; then do the same for the testing data (if you have them; test labels are only required if you want to measure the performance of the classifier, as I did in the last part of the code).
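Following the last two comments, a minimal sketch of plugging the 0/1 labels into the first solution, assuming (as stated in the comment) that the first 100 training images are class 0 and the next 100 are class 1. The concatenation is just a shorter way to build the same vector; ref is a hypothetical helper for the comparison.

```matlab
% First 100 training images are class 0, the next 100 are class 1
labelData = [zeros(100,1); ones(100,1)];

% Same vector as the zeros(...) plus indexed-assignment version from the comment
ref = zeros(200,1);
ref(101:200) = 1;
isSame = isequal(labelData, ref);

% These are the training labels, so they replace the random trainClass;
% mode(trainClass(idx),2) works unchanged with 0/1 labels
trainClass = labelData;
```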