Programming a Basic Neural Network from scratch in MATLAB


Your neural network seems all right, although the kind of training you're trying to do is quite inefficient when you're training against labeled data, as you are here. In that case I would suggest looking into back-propagation.

About the error you get when training: the message itself hints at the problem, namely that the dimensions are not consistent.

As the argument x0 to fminsearch, which is the initial guess for the optimizer, you pass [W1, W2], but from what I can see these matrices don't have the same number of rows, so they can't be concatenated like that. I would suggest modifying your cost function to take a single vector as its argument and then forming the weight matrices for the different layers from that one vector.
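For example, a sketch of this idea using the shapes from the question (a 2x2 W1 and a 1x2 W2; the variable names here are illustrative):

```matlab
% Flatten both weight matrices into one column vector for the optimizer
w0 = [W1(:); W2(:)];          % 2x2 -> 4 entries, then 1x2 -> 2 entries

% Inside the cost function, rebuild the matrices from the flat vector
W1 = reshape(w0(1:4), 2, 2);  % first four entries: hidden-layer weights
W2 = reshape(w0(5:6), 1, 2);  % last two entries: output-layer weights
```

Because MATLAB's `(:)` and `reshape` both use column-major order, the round trip recovers the original matrices exactly.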

You are also not supplying the cost function to fminsearch correctly: writing cost(W1, W2, Xtrain, Ytrain) evaluates the function in place instead of passing it to the optimizer.

According to the documentation (it's been years since I used MATLAB), you pass a handle to the cost function, as in fminsearch(@cost, [W1; W2])
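If you would rather avoid globals, an anonymous function can capture the training data instead, so that fminsearch only sees the weight vector it is optimizing (a sketch, assuming a cost function that takes the weight vector first and the data as extra arguments):

```matlab
% fminsearch optimizes only w; Xtrain and Ytrain are captured
% by the anonymous function at the time it is created
wOpt = fminsearch(@(w) cost(w, Xtrain, Ytrain), w0);
```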

EDIT: You could express your weights and modify your code as follows:

global Xtrain
global Ytrain
W = [W1; W2];
fminsearch(@cost, W)

The cost function must be modified so that it doesn't take Xtrain and Ytrain as inputs, because fminsearch would otherwise try to optimize those too. Modify your cost function like this:

function [ C ] = cost( W )
   W1 = W(1:2,:);
   W2 = W(3,:);
   global Xtrain
   global Ytrain
   ...
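Filled in, the modified cost function might look like this (a sketch that keeps the [W1; W2] stacking from above, so rows 1 and 2 hold W1 and row 3 holds W2, and that relies on feedforward2 returning one output per training row, as the question's own cost function already assumes):

```matlab
function [ C ] = cost( W )
    global Xtrain
    global Ytrain
    W1 = W(1:2,:);                          % 2x2 hidden-layer weights
    W2 = W(3,:);                            % 1x2 output-layer weights
    M  = size(Xtrain,1);                    % number of training examples
    H  = feedforward2(Xtrain, W1, W2);      % M x 1 network outputs
    C  = (1/(2*M)) * sum((H - Ytrain).^2);  % squared-error cost
end
```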
Author: Blue7. Updated on July 09, 2022.

Comments

  • Blue7
    Blue7 almost 2 years

    I have asked a few questions about neural networks on this website in the past and have gotten great answers, but I am still struggling to implement one for myself. This is quite a long question, but I am hoping that it will serve as a guide for other people creating their own basic neural networks in MATLAB, so it should be worth it.

    What I have done so far could be completely wrong. I am following the online Stanford machine learning course by Professor Andrew Y. Ng and have tried to implement what he has taught to the best of my ability.

    Can you please tell me if the feed forward and cost function parts of my code are correct, and where I am going wrong in the minimization (optimization) part?

    I have a two-layer feedforward neural network.

    The MATLAB code for the feedforward part is:

    function [ Y ] = feedforward2( X,W1,W2)
    %This takes a row vector of inputs into the neural net with weight matrices W1 and W2 and returns a row vector of the outputs from the neural net
    
    %Remember X, Y, and A can be vectors, and W1 and W2 Matrices 
    
    X=transpose(X);            %X needs to be a column vector
    A = sigmf(W1*X,[1 0]);     %Values of the first hidden layer  
    Y = sigmf(W2*A,[1 0]);     %Output Values of the network
    Y = transpose(Y);          %Y needs to be a row vector
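Side note: sigmf comes from the Fuzzy Logic Toolbox; with the parameters [1 0] it is just the standard logistic sigmoid, so a toolbox-free equivalent would be:

```matlab
% sigmf(x, [1 0]) computes 1./(1 + exp(-1*(x - 0))),
% i.e. the ordinary logistic sigmoid
sigmoid = @(x) 1 ./ (1 + exp(-x));
A = sigmoid(W1*X);   % same values as sigmf(W1*X, [1 0])
```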
    

    So for example a two layer neural net with two inputs and two outputs would look a bit like this:

          a1
    x1 o--o--o y1      (all weights equal 1)
        \/ \/
        /\ /\
    x2 o--o--o y2
          a2
    

    if we put in:

    X=[2,3];
    W1=ones(2,2);
    W2=ones(2,2);
    
    Y = feedforward2(X,W1,W2)
    

    we get the output:

    Y = [0.8794,0.8794]
    

    This represents the y1 and y2 values shown in the drawing of the neural net.

    The MATLAB code for the squared error cost function is:

    function [ C ] = cost( W1,W2,Xtrain,Ytrain )
    %This gives a value seeing how close W1 and W2 are to giving a network that represents the Xtrain and Ytrain data
    %It uses the squared error cost function
    %The closer the cost is to zero, the better these particular weights are at giving a network that represents the training data
    %If the cost is zero, the weights give a network that when the Xtrain data is put in, The Ytrain data comes out
    
    M = size(Xtrain,1);  %Number of training examples
    
    oldsum = 0;
    
    for i = 1:M,
            H = feedforward2(Xtrain,W1,W2); 
            temp = ( H(i) - Ytrain(i) )^2;
            Sum = temp + oldsum;
            oldsum = Sum;
    end
    
    C = (1/2*M) * Sum;
    
    end
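The loop above calls feedforward2 on the full training set once per iteration; since H already contains all M outputs, the same cost can be computed in one vectorized pass (a sketch, using the conventional 1/(2M) scaling of the squared-error cost):

```matlab
H = feedforward2(Xtrain, W1, W2);        % all M predictions in one call
C = (1/(2*M)) * sum((H - Ytrain).^2);    % squared-error cost
```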
    

    Example

    So for example if the training data is:

    Xtrain = [0,0;        Ytrain = [0/57;
              1,2;                  3/57;
              4,1;                  5/57;
              5,2;                  7/57;
              3,4;                  7/57;
              5,3;                  8/57;
              1,5;                  6/57;
              6,2;                  8/57;
              2,1;                  3/57;
              5,5];                10/57];
    

    This will be for a two-input, one-output network:

          a1
    x1 o--o
        \/ \
        /\  o y1
    x2 o--o/
          a2
    

    We start with initial random weights

    W1=[2,3;     W2=[3,2]
        4,1]
    

    If we put in:

    Y= feedforward2([6,2],W1,W2)
    

    We get

    Y = 0.9933 
    

    Which is far from what the training data says it should be (8/57 = 0.1404), so the initial random weights W1 and W2 were a bad guess.

    To measure exactly how good or bad a guess the random weights are, we use the cost function:

    C= cost(W1,W2,Xtrain,Ytrain)
    

    This gives the value:

    C = 6.6031e+003
    

    Minimizing the cost function

    If we minimize the cost function by searching over all possible values of W1 and W2 and picking the ones that give the lowest cost, we get the network that best approximates the training data.

    But when I Use the code:

     [W1,W2]=fminsearch(cost(W1,W2,Xtrain,Ytrain),[W1,W2])
    

    It gives an error message: "Error using horzcat. CAT arguments dimensions are not consistent." Why am I getting this error, and what can I do to fix it?


    Can you please tell me if the feed forward and cost function parts of my code are correct, and where I am going wrong in the minimization (optimization) part?

    Thank you!!!

  • Blue7
    Blue7 about 10 years
    Thank you for your answer. "modifying your cost-function to take a vector as argument and then form your weight-vectors for different layers from that one vector." How would I do this? I don't know how to change from a sum to a vector equation, and I do not understand what you mean by "form your weight-vectors for different layers from that one vector."