MATLAB - Neural network training
Solution 1
This is normal. Your output layer is using a log-sigmoid transfer function, and that will always give you some intermediate output between 0 and 1.
What you would usually do would be to look for the output with the largest value -- in other words, the most likely character.
This means that, for every column in y2, you're looking for the index of the row that contains the largest value in that column. You can compute this as follows:
[dummy, I] = max(y2);
I is then a vector containing, for each column of y2, the row index of that column's largest value.
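As a quick sketch of how the winning index maps back to a letter (assuming y2 is a 26-by-N matrix with rows ordered A through Z; rand stands in for real network output here):

```matlab
% Hypothetical 26x3 output matrix: three input characters, 26 scores each
y2 = rand(26, 3);

% For each column, find the row index of the largest output value
[dummy, I] = max(y2);

% Map row indices 1..26 to the letters 'A'..'Z'
letters = char('A' + I - 1);
```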
Solution 2
You can think of y2 as an output probability distribution for each input being one of the 26 alphabet characters, for example if one column of y2 says:
.2
.5
.15
.15
then there is a 50% probability that this character is B (if we assume only 4 possible outputs).
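Picking the most likely character and its score from one column might look like this (a sketch using the hypothetical 4-letter alphabet from the example above):

```matlab
col = [0.2; 0.5; 0.15; 0.15];    % one column of y2 (hypothetical values)
[score, idx] = max(col);         % score = 0.5, idx = 2
letter = char('A' + idx - 1);    % idx 2 maps to 'B'
```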
==REMARK==
Recall the desired output encoding: the NN has 26 outputs, and for each input it should produce a 1x26 vector that is zero everywhere except for a 1 in the cell corresponding to the input letter, e.g. [1 0 0 ... 0] for A and [0 0 0 ... 1] for Z.
It is preferable to avoid using target values of 0,1 to encode the output of the network.
The reason for avoiding target values of 0 and 1 is that the 'logsig' sigmoid transfer function cannot produce these output values given finite weights. If you attempt to train the network to fit targets of exactly 0 and 1, gradient descent will force the weights to grow without bound.
So instead of 0 and 1 values, try using values of 0.04 and 0.9 for example, so that [0.9,0.04,...,0.04] is the target output vector for the letter A.
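Building on the question's code, a sketch of how the target matrix t could be filled with these soft values instead of hard 0/1 (0.04 and 0.9 are the example values suggested above; data is the 20001x17 matrix from the question):

```matlab
% Soft target encoding: 0.9 for the correct letter, 0.04 everywhere else
t = 0.04 * ones(26, 20001);
for i = 2:20001
    t(data(i,17), i-1) = 0.9;    % column i-1 encodes row i of the dataset
end
```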
Reference:
Thomas M. Mitchell, Machine Learning, McGraw-Hill Higher Education, 1997, pp. 114-115.
Solution 3
- Use the hardlim transfer function in the output layer.
- Use trainlm or trainrp for training the network.
- To train your network, use a for loop and a condition that compares the output and target; when the best match is reached, break to exit the learning loop.
- Use another pre-processing method for the data set instead of mapminmax.
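The retrain-until-good-enough loop suggested above could be sketched like this (a hypothetical outline, assuming p, t, and net are set up as in the question, and using classification error as the comparison between output and target):

```matlab
bestErr = Inf;
for attempt = 1:10
    net = init(net);                     % re-initialize the weights
    [net, tr] = train(net, p, t);
    y = sim(net, p);
    [dummy, pred]   = max(y);            % predicted letter per column
    [dummy, target] = max(t);            % true letter per column
    err = mean(pred ~= target);          % fraction misclassified
    if err < bestErr
        bestErr = err;
        bestNet = net;                   % keep the best network so far
    end
    if bestErr == 0
        break                            % perfect fit: exit the learning loop
    end
end
```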
Updated on July 01, 2022

Comments
-
sp. almost 2 years
I'm working on creating a 2 layer neural network with back-propagation. The NN is supposed to get its data from a 20001x17 vector that holds following information in each row:
-The first 16 cells hold integers ranging from 0 to 15 which act as variables to help us determine which one of the 26 letters of the alphabet we mean to express when seeing those variables. For example a series of 16 values as follows are meant to represent the letter A: [2 8 4 5 2 7 5 3 1 6 0 8 2 7 2 7].
-The 17th cell holds a number ranging from 1 to 26 representing the letter of the alphabet we want. 1 stands for A, 2 stands for B etc.
The output layer of the NN consists of 26 outputs. Every time the NN is fed an input like the one described above, it's supposed to output a 1x26 vector containing zeros in all but the one cell that corresponds to the letter that the input values were meant to represent. For example, the output [1 0 0 ... 0] would be letter A, whereas [0 0 0 ... 1] would be the letter Z.
Some things that are important before I present the code: I need to use the traingdm function, and the hidden layer size is fixed (for now) at 21.
Trying to create the above concept, I wrote the following MATLAB code:
%%%%%%%%
% Start of code
%%%%%%%%
%
% Initialize the input and target vectors
%
p = zeros(16,20001);
t = zeros(26,20001);
%
% Fill the input and target vectors from the dataset provided
%
for i = 2:20001
    for k = 1:16
        p(k,i-1) = data(i,k);
    end
    t(data(i,17),i-1) = 1;
end

net = newff(minmax(p),[21 26],{'logsig' 'logsig'},'traingdm');
y1 = sim(net,p);

net.trainParam.epochs = 200;
net.trainParam.show = 1;
net.trainParam.goal = 0.1;
net.trainParam.lr = 0.8;
net.trainParam.mc = 0.2;
net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 0.7;
net.divideParam.testRatio = 0.2;
net.divideParam.valRatio = 0.1;

%[pn,ps] = mapminmax(p);
%[tn,ts] = mapminmax(t);

net = init(net);
[net,tr] = train(net,p,t);
y2 = sim(net,p);   % note: was sim(net,pn), but pn is only defined if the mapminmax lines are uncommented

%%%%%%%%
% End of code
%%%%%%%%
Now to my problem: I want my outputs to be as described, namely each column of y2 should be a representation of a letter. My code doesn't do that, though. Instead it produces results that vary between 0 and 1, with values from 0.1 to 0.9.
My question is: is there some conversion I need to be doing that I am not? Meaning, do I have to convert my input and/or output data to a form by which I can actually see if my NN is learning correctly?
Any input would be appreciated.
-
sp. over 14 years: Martin, thanks for the response. Using max(y2) I can now at least get some information on how many times the network was right in identifying the letters. What I did do, however, before feeding the network the data was scale it down so that 0<=p(x)<=1. Seeing as the minimum value of p was 0 and the maximum was 15, I made a new input vector scaledp = p/15.
-
sp. over 14 years: I don't think that's correct. Each element of the output vector will have a value between 0.00 and 1.00, but the sum of any column (or any element in that column, for that matter) will never actually represent a percentage.
-
Amro over 14 years: Alternatively, you can use the difference between the highest value in y2 and the second-highest value as a measure of confidence in the prediction.
-
Roberto about 13 years: It's definitely not a probability distribution unless you're using a probabilistic neural network. It's not really even a confidence value, depending on the algorithm you're using and how you trained it.
-
Ben Allison about 11 years: You shouldn't use max as the activation function, because your error function should be defined over the activity, not the activation, and max is non-differentiable, which means you can't use back-prop. You need softmax; see my answer below.