Backpropagation algorithm (Matlab): output values are saturating to 1


The sigmoid function is limited to the range (0, 1), so it will never reach your target values (since they are all greater than 1). You should scale your target values so they also lie in the sigmoid's range. Since you know your target values are constrained to the range (0, 100), just divide them all by 100 (and multiply the network's output by 100 when you read off a prediction).
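
In code, that scaling might look like this (a sketch reusing the variable names from the question; only these lines change):

    % Before training: scale targets from (0, 100) into the sigmoid's (0, 1)
    targetValues = targetValues / 100;

    % ... run the training loop exactly as before ...

    % When reading off a prediction, map the output back to the original scale:
    predictedScore = OutputLayer * 100;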

Author: JDS

Updated on June 05, 2022

Comments

  • JDS almost 2 years

    I have coded up a backpropagation algorithm in Matlab based on these notes: http://dl.dropbox.com/u/7412214/BackPropagation.pdf

    My network takes input/feature vectors of length 43, has 20 nodes in the hidden layer (an arbitrary parameter choice I can change), and has a single output node. I want to train my network to take the 43 features and output a single value between 0 and 100. The input data was normalized to zero mean and unit standard deviation (via z = (x - mean) / std), and then I appended a "1" term to each input vector to represent a bias. My targetValues are just single numbers between 0 and 100.
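
    For reference, that kind of column-wise normalization might be written as follows (a sketch; `inputFeatures` holds one training example per row, as in the code below):

        % Z-score each feature (column) of the design matrix:
        mu    = mean(inputFeatures, 1);    % 1 x 43 row of column means
        sigma = std(inputFeatures, 0, 1);  % 1 x 43 row of column std devs
        n     = size(inputFeatures, 1);    % number of training examples
        inputFeatures = (inputFeatures - repmat(mu, n, 1)) ./ repmat(sigma, n, 1);
        % Append the bias term as an extra column of ones:
        inputFeatures = [inputFeatures, ones(n, 1)];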

    Here are the relevant parts of my code:

    (By my convention, layer I (i) refers to the input layer, J (j) refers to the hidden layer, and K (k) refers to the output layer, which is a single node in this case.)

    for train=1:numItrs
            for iterator=1:numTrainingSets
    
                %%%%%%%% FORWARD PROPAGATION %%%%%%%%
    
                % Grab the inputs, which are rows of the inputFeatures matrix
                InputLayer = inputFeatures(iterator, :)'; %don't forget to turn into column 
                % Calculate the hidden layer outputs: 
                HiddenLayer = sigmoidVector(WeightMatrixIJ' * InputLayer); 
                % Now the output layer outputs:
                OutputLayer = sigmoidVector(WeightMatrixJK' * HiddenLayer);
    
                %%%%%%% Debug stuff %%%%%%%% (for single valued output)
                if (mod(train+iterator, 100) == 0)
                   str = strcat('Output value: ', num2str(OutputLayer), ' | Test value: ', num2str(targetValues(iterator, :)')); 
                   disp(str);
                end 
    
    
    
    
                %%%%%%%% BACKWARDS PROPAGATION %%%%%%%%
    
                % Propagate backwards for the hidden-output weights
                currentTargets = targetValues(iterator, :)'; %strip off the row, make it a column for easy subtraction
                OutputDelta = (OutputLayer - currentTargets) .* OutputLayer .* (1 - OutputLayer); 
                EnergyWeightDwJK = HiddenLayer * OutputDelta'; %outer product
                % Update this layer's weight matrix:
                WeightMatrixJK = WeightMatrixJK - epsilon*EnergyWeightDwJK; %does it element by element
    
                % Propagate backwards for the input-hidden weights
                HiddenDelta = HiddenLayer .* (1 - HiddenLayer) .* WeightMatrixJK*OutputDelta; 
                EnergyWeightDwIJ = InputLayer * HiddenDelta'; 
                WeightMatrixIJ = WeightMatrixIJ - epsilon*EnergyWeightDwIJ; 
    
            end
    
        end
    

    And the weight matrices are initialized as follows:

    WeightMatrixIJ = rand(numInputNeurons, numHiddenNeurons) - 0.5; 
    WeightMatrixJK = rand(numHiddenNeurons, numOutputNeurons) - 0.5; 
    %randoms b/w (-0.5, 0.5)
    

    The "sigmoidVector" function takes every element in a vector and applies y = 1 / (1 + exp(-x)).
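
    For completeness, such a helper might look like the following (a sketch, since the actual implementation isn't shown; `exp()` already operates element-wise in Matlab, so no loop is needed):

        function y = sigmoidVector(x)
            % Element-wise logistic sigmoid
            y = 1 ./ (1 + exp(-x));
        end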

    Here's what the debug messages look like, from the start of the code:

    Output value:0.99939 | Test value:20
    Output value:0.99976 | Test value:20
    Output value:0.99985 | Test value:20
    Output value:0.99989 | Test value:55
    Output value:0.99991 | Test value:65
    Output value:0.99993 | Test value:62
    Output value:0.99994 | Test value:20
    Output value:0.99995 | Test value:20
    Output value:0.99995 | Test value:20
    Output value:0.99996 | Test value:20
    Output value:0.99996 | Test value:20
    Output value:0.99997 | Test value:92
    Output value:0.99997 | Test value:20
    Output value:0.99997 | Test value:20
    Output value:0.99997 | Test value:20
    Output value:0.99997 | Test value:20
    Output value:0.99998 | Test value:20
    Output value:0.99998 | Test value:20
    Output value:0.99999 | Test value:20
    Output value:0.99999 | Test value:20
    Output value:1 | Test value:20
    Output value:1 | Test value:62
    Output value:1 | Test value:70
    Output value:1 | Test value:77
    Output value:1 | Test value:20
    ** stays saturated at 1 **
    

    Obviously I'd like to train the network so its output values land between 0 and 100 and match those target values!

    Thank you for any help, if you need more information I'll provide all I can.