Implementing and plotting a perceptron in MATLAB

Solution 1

You should first understand the meaning of each of the inputs:

  • X is the input matrix of examples, of size M x N, where M is the dimension of the feature vector and N is the number of samples. Since the perceptron model for prediction is Y=w*X+b, you have to supply one extra dimension in X that is constant, usually set to 1, so that the b term is "built in" to X. In the example below, the last entry of X is 1 for all samples (see the sketch after this list).
  • Y is the correct classification for each sample from X (the classification you want the perceptron to learn), so it should be an N-dimensional row vector - one output for each input example. Since the perceptron is a binary classifier, Y should have only 2 distinct possible values. Looking at the code, you see that it checks the sign of the prediction, which tells you that the allowed values of Y are -1 and +1 (not 0 and 1, for example).
  • w is the weight vector you are trying to learn.
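
As a minimal sketch of the bias trick from the first bullet (Xraw is just a hypothetical raw feature matrix), the constant dimension is appended like this:

Xraw = [0 0; 0 1];                    % 2 features x 2 samples
X    = [Xraw; ones(1,size(Xraw,2))];  % append a row of ones, giving the 3 x 2 X used below
% the prediction is then w'*X, with the last entry of w playing the role of b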

So, try to call the function with:

X=[0 0; 0 1; 1 1];
Y=[1 -1];
w=[.5; .5; .5];

EDIT

Use the following code to call the perceptron algorithm and see the results graphically:

% input samples
X1=[rand(1,100);rand(1,100);ones(1,100)];   % class '-1'
X2=[rand(1,100);1+rand(1,100);ones(1,100)]; % class '+1'
X=[X1,X2];

% output class [-1,+1];
Y=[-ones(1,100),ones(1,100)];

% init weight vector
w=[.5 .5 .5]';

% call perceptron
wtag=perceptron(X,Y,w);
% predict
ytag=wtag'*X;


% plot prediction over original data
figure;hold on
plot(X1(1,:),X1(2,:),'b.')
plot(X2(1,:),X2(2,:),'r.')

plot(X(1,ytag<0),X(2,ytag<0),'bo')
plot(X(1,ytag>0),X(2,ytag>0),'ro')
legend('class -1','class +1','pred -1','pred +1')
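
If you also want to draw the separating line the perceptron has learned, here is a minimal sketch (assuming wtag from the call above, with wtag(2) nonzero); the boundary is the set of points where wtag(1)*x1 + wtag(2)*x2 + wtag(3) = 0:

% plot the learned decision boundary by solving for x2 as a function of x1
x1 = linspace(0,1,50);
x2 = -(wtag(1)*x1 + wtag(3)) / wtag(2);
plot(x1, x2, 'k-', 'linewidth', 2)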

Solution 2

If you are interested, here is a little perceptron demo written in a tutorial style:

function perceptronDemo
%PERCEPTRONDEMO
%
%   A simple demonstration of the perceptron algorithm for training
%   a linear classifier, made as readable as possible for tutorial
%   purposes. It is derived from the treatment of linear learning
%   machines presented in Chapter 2 of "An Introduction to Support
%   Vector Machines" by Nello Cristianini and John Shawe-Taylor.
%
%

    Data  = createTrainingData;
    Model = trainPerceptron( Data );

end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Model = trainPerceptron( Data )
%TRAINPERCEPTRON

    DOWN   = 1;
    ACROSS = 2;

    assert( isequal( unique( Data.labels ), [-1; +1] ), ...
        'Labels must be -1 or +1' );

    % ---------------------------------------------------------------------
    % Normalise the data by calculating z-scores
    %
    %   This makes plotting easier, but is not needed by the algorithm.
    %

    sampleMean   = mean( Data.samples );
    sampleStdDev = std(  Data.samples );
    Data.samples = bsxfun( @minus,   Data.samples, sampleMean   );
    Data.samples = bsxfun( @rdivide, Data.samples, sampleStdDev );

    % ---------------------------------------------------------------------
    % Calculate the squared radius of the smallest ball that encloses the
    % data and is centred on the origin. This is used to provide an
    % appropriate range and step size when updating the threshold (bias)
    % parameter.
    %

    sampleSize = size( Data.samples, DOWN );
    maxNorm    = realmin;
    for iObservation = 1:sampleSize
        observationNorm = norm( Data.samples(iObservation,:) );
        if observationNorm > maxNorm
            maxNorm = observationNorm;
        end
    end
    enclosingBallRadius        = maxNorm;
    enclosingBallRadiusSquared = enclosingBallRadius .^ 2;

    % ---------------------------------------------------------------------
    % Define the starting weight vector and bias. These should be zeros,
    % as the algorithm omits a learning rate, and it is suggested in
    % Cristianini & Shawe-Taylor that learning rate may only be omitted
    % safely when the starting weight vector and bias are zero.
    %

    Model.weights = [0.0 0.0];
    Model.bias    = 0.0;

    % ---------------------------------------------------------------------
    % Run the perceptron training algorithm
    %
    %   To prevent program running forever when nonseparable data are
    %   provided, limit the number of steps in the outer loop.
    %

    maxNumSteps = 1000;

    for iStep = 1:maxNumSteps

        isAnyObsMisclassified = false;

        for iObservation = 1:sampleSize

            inputObservation = Data.samples( iObservation, : );
            desiredLabel     = Data.labels(  iObservation    ); % +1 or -1

            perceptronOutput = sum( Model.weights .* inputObservation, ACROSS ) + Model.bias;
            margin           = desiredLabel * perceptronOutput;

            isCorrectLabel   = margin > 0;

            % -------------------------------------------------------------
            % If the model misclassifies the observation, update the
            % weights and the bias.
            %

            if ~isCorrectLabel

                isAnyObsMisclassified = true;

                weightCorrection = desiredLabel  * inputObservation;
                Model.weights    = Model.weights + weightCorrection;

                biasCorrection   = desiredLabel .* enclosingBallRadiusSquared;
                Model.bias       = Model.bias   + biasCorrection;

                displayPerceptronState( Data, Model );

            end % if this observation misclassified.

        end % loop over observations

        if ~isAnyObsMisclassified
            disp( 'Done!' );
            break;
        end

    end % outer loop

end % TRAINPERCEPTRON
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Data = createTrainingData
%CREATETRAININGDATA
%
%   Return a structure containing training data suitable for linear
%   classification.
%

    sampleAsize   = 1024;
    sampleBsize   = 1024;

    sampleAmean   = [ 5.5 5.0 ];
    sampleAstdDev = [ 0.5 1.0 ];

    sampleBmean   = [ 2.5 3.0 ];
    sampleBstdDev = [ 0.3 0.7 ];

    Data.samples  = [ normallyDistributedSample( sampleAsize, sampleAmean, sampleAstdDev ); ...
                      normallyDistributedSample( sampleBsize, sampleBmean, sampleBstdDev ) ];

    Data.labels   = [  ones(sampleAsize,1); ...
                      -ones(sampleBsize,1) ];

    % ---------------------------------------------------------------------
    % Randomly permute samples & class labels.
    %
    %   This is not really necessary, but done to illustrate that the order
    %   in which observations are evaluated does not matter.
    %

    randomOrder   = randperm( sampleAsize + sampleBsize );
    Data.samples  = Data.samples( randomOrder, : );
    Data.labels   = Data.labels(  randomOrder, : );

end % CREATETRAININGDATA
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function samples = normallyDistributedSample( sampleSize, sampleMean, sampleStdDev )
%NORMALLYDISTRIBUTEDSAMPLE
%
%   Draw a sample from a normal distribution with specified mean and
%   standard deviation.
%

    assert(    isequal( size( sampleMean ), size( sampleStdDev ) ) ...
            && 1 == size( sampleMean, 1 ),                         ...
        'Sample mean and standard deviation must be row vectors of equal length.' );

    numFeatures = numel( sampleMean );
    samples     = randn( sampleSize, numFeatures );
    samples     = bsxfun( @times, samples, sampleStdDev );
    samples     = bsxfun( @plus,  samples, sampleMean   );

end % NORMALLYDISTRIBUTEDSAMPLE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function displayPerceptronState( Data, Model )
%DISPLAYPERCEPTRONSTATE

    hFig = figure( 1 );
    clf;
    set( hFig,                        ...
        'NumberTitle', 'off',         ...
        'Name',         mfilename,    ...
        'MenuBar',      'none',       ...
        'Color',        [1.0 1.0 1.0] );

    displayXmin = -4;
    displayXmax =  4;
    displayYmin = -4;
    displayYmax =  4;

    hAx = subplot( 1, 1, 1 );
    axis('equal');
    set( hAx,                                  ...
        'Box',      'on',                      ...
        'NextPlot', 'add',                     ...
        'xgrid',    'on',                      ...
        'ygrid',    'on',                      ...
        'xlim',     [displayXmin displayXmax], ... % Bounds suitable for Z-scored data
        'ylim',     [displayYmin displayYmax]  );
    xlabel( 'x_1' );
    ylabel( 'x_2' );

    % ---------------------------------------------------------------------
    % Plot data points from the two classes
    %

    isPositiveClass = Data.labels >  0;
    isNegativeClass = Data.labels <= 0;

    plot( hAx, Data.samples(isPositiveClass,1), Data.samples(isPositiveClass,2), 'b+' );
    plot( hAx, Data.samples(isNegativeClass,1), Data.samples(isNegativeClass,2), 'rx' );

    % ---------------------------------------------------------------------
    % Display parameters for separating hyperplane in title
    %

    xWeight   = Model.weights(1);
    yWeight   = Model.weights(2);
    bias      = Model.bias;

    szTitle   = sprintf( 'Linear classifier parameters: %0.2f x_1 + %0.2f x_2 + %0.2f = 0', xWeight, yWeight, bias );
    title( szTitle );

    % ---------------------------------------------------------------------
    % Plot separating hyperplane
    %

    y1 = ( (xWeight*displayXmin) + bias ) ./ -yWeight;
    y2 = ( (xWeight*displayXmax) + bias ) ./ -yWeight;

    plot( hAx, [displayXmin; displayXmax], [y1, y2], 'k-', 'linewidth', 2 );

    pause(0.1);

end % DISPLAYPERCEPTRONSTATE
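
To run the demo, save the listing above as perceptronDemo.m somewhere on your MATLAB path and call it from the command window:

perceptronDemo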

Solution 3

try this (note that, per Solution 1, Y must be -1/+1 and X needs a constant row for the bias term):

X = [1 2 1 2; 1 1 1 1];  % one feature per sample, plus a constant bias row
Y = [1 -1 1 -1];         % labels must be -1/+1, not 0/1
perceptron(X, Y, [0.5; 0.5]);

Comments

  • edgarmtze over 1 year

    I'm reviewing some perceptron MATLAB code from Toronto.

    The code is

    function [w] = perceptron(X,Y,w_init)
    
    w = w_init;
    for iteration = 1 : 100  %<- in practice, use some stopping criterion!
      for ii = 1 : size(X,2)         %cycle through training set
        if sign(w'*X(:,ii)) ~= Y(ii) %wrong decision?
          w = w + X(:,ii) * Y(ii);   %then add (or subtract) this point to w
        end
      end
      sum(sign(w'*X)~=Y)/size(X,2)   %show misclassification rate
    end
    

    So I was reading about how to apply this function to a data matrix X and a target Y, but I do not know how to use it. I understand that it returns a vector of weights, so it can classify.

    Could you please give an example and explain it?

    I've tried

    X=[0 0; 0 1; 1 1]
    Y=[1 0; 2 1]
    w=[1 1 1]
    Result = perceptron( X, Y, w )
    
    ??? Error using ==> mtimes
    Inner matrix dimensions must agree.
    
    Error in ==> perceptron at 15
                if sign(w'*X(:,ii)) ~= Y(ii) 
    
        Result = perceptron( X, Y, w' )
    
    ??? Error using ==> ne
    Matrix dimensions must agree.
    
    Error in ==> perceptron at 19
            sum(sign(w'*X)~=Y) / size(X,2);     
    

    Thanks

    Thank you for the answers. I have one more question: if I change Y to [0, 1], what happens to the algorithm?

    So no input data will work with Y = [0, 1] with this perceptron code, right?

    EDIT

    One more question: if I want to plot the line that divides the 2 classes, I know we can get that line by solving a linear equation system involving the weights, but how? What could I do? I'm trying something like:

    % the initial weights
    w_init = [ 1 1 1]';  
    % the weights returned from perceptron    
    wtag   = perceptron(X,Y,w_init,15);
    
    % concatenate both
    Line = [wtag,w_init] 
    
    % solve the linear system, am I correct doing this?
    rref(Line')
    
    % plot???
    
  • Itamar Katz about 13 years
    Your example won't work, since the algorithm assumes output values of [-1,+1], not [0,1]. The w vector won't be updated at all.
  • Itamar Katz about 13 years
    Also, the input should be at least of dimension 2; otherwise you are explicitly assuming that b=0 in y=a*x+b.
  • edgarmtze about 13 years
    Thank you very much, I really understand your example, but I have one more question: what would you do if class 1 has more examples than class 0? In the example you provided there are the same number of examples for both classes, X1 and X2.
  • edgarmtze about 13 years
    Is this correct? I cannot test it right now: X1=[rand(1,100);rand(1,100);ones(1,100)]; % class '-1' X2=[rand(1,300);1+rand(1,300);ones(1,300)]; % class '+1' X=[X1,X2]; % output class [-1,+1]; Y=[-ones(1,100),ones(1,300)]; % init weight vector w=[.5 .5 .5]'; wtag=perceptron(X,Y,w);
  • Itamar Katz about 13 years
    The number of examples is irrelevant; I just chose an equal number for each class for convenience. Nothing has to change. The size of w should be the dimension of the samples (including the constant term), since every prediction is based on the dot product w*x, so w and x must have the same size.
  • Itamar Katz about 13 years
    Yes, this seems right. Note that it's only a synthetic example; in a real-world problem your data is mixed, and you are not concerned with the size of each class - you only need to ensure you have enough examples from each class to cover the sample space.
  • Predictor about 13 years
    You are correct: the balance of the classes should not matter. The perceptron has issues with linear separability and stability. The prior probabilities of the classes matter in the calculation of other linear tools, like linear discriminant analysis.
  • edgarmtze about 13 years
    One question: if you change Y to [0, 1], what happens?
  • Itamar Katz about 13 years
    If Y is [0, 1], the update rule for w won't work: sign(w'*X(:,ii)) is compared against Y(ii), and for a sample with label 0 the correction X(:,ii)*Y(ii) is zero, so w never gets updated for that class (see the remapping sketch after these comments).
  • edgarmtze about 13 years
    So no input data will work with Y = [0, 1] with this perceptron code, right?
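
If your labels happen to be 0/1, a minimal workaround (Y01 below is hypothetical) is to remap them to -1/+1 before calling perceptron:

Y01 = [1 0 1 0];   % hypothetical 0/1 labels
Y   = 2*Y01 - 1;   % remap to -1/+1, which the update rule expects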