Things to try when a Neural Network is not Converging


Solution 1

If you are using ReLU activations, you may have a "dying ReLU" problem. In short, under certain conditions, a neuron with a ReLU activation can receive a (bias) adjustment that causes it never to activate again. The problem can be fixed with a "Leaky ReLU" activation.

For example, I built a simple three-layer MLP with ReLU activations, and it failed. I gave it data it could not possibly fail on, and it still failed. I turned the learning rate way down, and it just failed more slowly, always converging to predicting each class with equal probability. Switching from standard ReLU to Leaky ReLU fixed everything.
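As a minimal sketch of the fix, assuming PyTorch (the layer sizes here are made up), it is a one-line swap per layer:

```python
import torch.nn as nn

# A toy 3-layer MLP. Swapping nn.ReLU() for nn.LeakyReLU() gives
# negative pre-activations a small slope (0.01 here), so a "dead"
# unit still receives gradient and can recover.
model = nn.Sequential(
    nn.Linear(64, 32),
    nn.LeakyReLU(0.01),  # instead of nn.ReLU()
    nn.Linear(32, 32),
    nn.LeakyReLU(0.01),
    nn.Linear(32, 10),
)
```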

Solution 2

If we are talking about classification tasks, then you should shuffle the examples before training your net. That is, don't feed your net thousands of examples of class #1 followed by thousands of examples of class #2, and so on. If you do, your net most probably won't converge and will instead tend to predict whichever class it was trained on last.
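A minimal sketch of the idea in NumPy (the toy arrays are made up), applying one permutation to features and labels together:

```python
import numpy as np

# Toy data sorted by class: all of class #1 first, then all of class #2.
X = np.concatenate([np.zeros((1000, 8)), np.ones((1000, 8))])
y = np.concatenate([np.zeros(1000, dtype=int), np.ones(1000, dtype=int)])

# Shuffle features and labels with the same permutation,
# ideally re-shuffling before every epoch.
rng = np.random.default_rng(0)
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]
```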

Solution 3

I faced this problem while implementing my own backprop neural network. I tried the following:

  • Implemented momentum (and kept the value at 0.5)
  • Kept the learning rate at 0.1
  • Charted the error, the weights, and the input as well as the output of each and every neuron; seeing the data as a graph is more helpful in figuring out what is going wrong
  • Tried out different activation functions (all sigmoidal), but this did not help me much
  • Initialized all weights to random values between -0.5 and 0.5 (my network's output was in the range -1 to 1)
  • I did not try this myself, but gradient checking can be helpful as well (see the sketch after this list)
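To illustrate that last point, here is a minimal finite-difference gradient check (a sketch; `numerical_grad` is a made-up helper name, and the quadratic loss is just a toy whose true gradient is known):

```python
import numpy as np

def numerical_grad(loss, w, eps=1e-5):
    """Central-difference estimate of d(loss)/d(w) at weights w."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus.flat[i] += eps
        w_minus.flat[i] -= eps
        grad.flat[i] = (loss(w_plus) - loss(w_minus)) / (2 * eps)
    return grad

# Toy check on a quadratic loss whose true gradient is 2*w.
# In practice, compare against the gradient your backprop produces.
w = np.array([0.3, -0.7, 1.2])
loss = lambda w: np.sum(w ** 2)
assert np.allclose(numerical_grad(loss, w), 2 * w, atol=1e-4)
```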

Solution 4

If the problem is only convergence (not the actual "well-trained network", which is way too broad a problem for SO), then once the code is OK, the only thing that can be the problem is the training method's parameters. If one uses naive backpropagation, these parameters are the learning rate and the momentum. Nothing else matters, because for any initialization and any architecture, a correctly implemented neural network should converge for a good choice of these two parameters (in fact, for momentum = 0 it should converge to some solution too, for a small enough learning rate).
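For reference, a sketch of the naive update with momentum (the function name and the toy usage below are made up):

```python
import numpy as np

def sgd_momentum_step(w, velocity, grad, lr=0.1, momentum=0.5):
    """One weight update: velocity accumulates a decaying sum of past gradients."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy usage on a quadratic bowl, whose gradient is 2*w.
w, v = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(50):
    w, v = sgd_momentum_step(w, v, 2 * w)
print(w)  # approaches [0, 0]
```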

In particular, there is a good heuristic approach called "resilient backpropagation" (Rprop), which is in fact a parameter-free approach and should (almost) always converge (assuming a correct implementation).
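Many libraries provide it out of the box; for example, a minimal sketch assuming PyTorch (the one-layer model is a stand-in for your network):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # stand-in for your network

# Rprop adapts a per-weight step size from the sign of the gradient,
# so there is no global learning rate to hand-tune.
optimizer = torch.optim.Rprop(model.parameters())
```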

Solution 5

After you've tried different meta-parameters (optimization / architecture), the most probable place to look is THE DATA.

As for myself, to minimize fiddling with meta-parameters, I keep my optimizer automated: Adam is my optimizer of choice.
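A minimal sketch of that setup, again assuming PyTorch (the model is a stand-in):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # stand-in for your network

# Adam keeps per-parameter adaptive step sizes, so the default
# lr=1e-3 works reasonably well across many problems.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```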

There are some rules of thumb regarding application vs. architecture... but it's really best to crunch those on your own.

To the point: in my experience, after you've debugged the net (the easy debugging) and it still doesn't converge, or gets to an undesired local minimum, the usual suspect is the data. Whether you have contradictory samples or just incorrect ones (outliers), a small amount can make the difference from, say, 0.6 accuracy to (after cleaning) 0.9 accuracy.

A smaller but golden (clean) dataset is much better than a big, slightly dirty one... with augmentation you can tweak results even further.
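One cheap sanity check along these lines (a sketch with made-up toy arrays) is to look for identical inputs that carry different labels:

```python
import numpy as np

# Toy data: the first two rows are identical inputs with different labels.
X = np.array([[0, 1], [0, 1], [1, 0]])
y = np.array([0, 1, 1])

# Group identical input rows and flag any group with conflicting labels.
_, inverse = np.unique(X, axis=0, return_inverse=True)
inverse = inverse.ravel()  # guard against shape differences across NumPy versions
for group in np.unique(inverse):
    labels = np.unique(y[inverse == group])
    if len(labels) > 1:
        print("Contradictory labels", labels, "for input", X[inverse == group][0])
```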


Comments

  • Shayan RC (almost 4 years ago):

One of the most popular questions regarding neural networks seems to be:

    Help!! My Neural Network is not converging!!

    See here, here, here, here and here.

So, after eliminating any errors in the implementation of the network, what are the most common things one should try?

I know that the things to try will vary widely depending on the network architecture. But by tweaking which parameters (learning rate, momentum, initial weights, etc.) and implementing which new features (windowed momentum?) were you able to overcome similar problems while building your own neural net?

Please give answers which are language-agnostic if possible. This question is intended to give some pointers to people stuck with neural nets that are not converging.