Is there a momentum option for Adam optimizer in Keras?


Short answer: no, neither in Keras nor in Tensorflow [EDIT: see UPDATE at the end]

Long answer: as mentioned in the comments, Adam already incorporates something like momentum. Here is some relevant corroboration:

From the highly recommended An overview of gradient descent optimization algorithms (also available as a paper):

In addition to storing an exponentially decaying average of past squared gradients v_t like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients m_t, similar to momentum
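
To make the momentum-like role of m_t concrete, here is a minimal numpy sketch of the Adam update from the original paper, run on a toy quadratic loss; this is illustrative only, not the Keras implementation:

```python
import numpy as np

# Hyperparameter names follow the Adam paper (and the Keras defaults).
lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8

w = np.array([1.0, -2.0])          # parameters
m = np.zeros_like(w)               # first moment  (the momentum-like term)
v = np.zeros_like(w)               # second moment (the RMSprop-like term)

for t in range(1, 101):
    grad = 2 * w                   # gradient of the toy loss ||w||^2
    m = beta1 * m + (1 - beta1) * grad        # decaying average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)   # bias correction
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
```

The beta1-weighted running average m is exactly the "momentum" part; beta1 plays the role that the momentum coefficient plays in plain SGD with momentum.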

From Stanford CS231n: CNNs for Visual Recognition:

Adam is a recently proposed update that looks a bit like RMSProp with momentum

Note that some frameworks do include a momentum parameter for Adam, but this is actually the beta1 parameter; here is CNTK:

momentum (float, list, output of momentum_schedule()) – momentum schedule. Note that this is the beta1 parameter in the Adam paper. For additional information, please refer to this CNTK Wiki article.
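
In Keras itself, the same knob is exposed as Adam's beta_1 argument; here is a minimal sketch (the toy model is just a placeholder so there is something to compile):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# Toy model, placeholder only.
model = Sequential([Dense(10, activation='relu', input_dim=20),
                    Dense(1)])

# beta_1 is the knob that plays the role of momentum in Adam
# (Keras default: 0.9).
model.compile(loss='mse',
              optimizer=Adam(lr=0.001, beta_1=0.9, beta_2=0.999))
```

So "momentum tuning for Adam" effectively amounts to tuning beta_1.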

That said, there is an ICLR 2016 paper titled Incorporating Nesterov momentum into Adam, along with a Tensorflow implementation skeleton by its author; I cannot offer any opinion on it, though.

UPDATE: Keras now indeed includes an optimizer called Nadam, based on the ICLR 2016 paper mentioned above; from the docs:

Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.

It is also included in Tensorflow as the contrib module tf.contrib.opt.NadamOptimizer.
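
A quick sketch of using it in Keras (the toy model is again a placeholder; the values shown are the Keras defaults, including the Nadam-specific schedule_decay):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Nadam

# Toy model, placeholder only.
model = Sequential([Dense(10, activation='relu', input_dim=20),
                    Dense(1)])

# Drop-in replacement for Adam; beta_1 and beta_2 mean the same thing,
# schedule_decay controls Nadam's momentum schedule.
model.compile(loss='mse',
              optimizer=Nadam(lr=0.002, beta_1=0.9, beta_2=0.999,
                              schedule_decay=0.004))
```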

Comments

  • Tuan Do, almost 2 years ago:

    The question says it all. Since Adam performs well on most datasets, I want to try momentum tuning for the Adam optimizer. So far, I have only found a momentum option for SGD in Keras.