Is there a momentum option for Adam optimizer in Keras?
Short answer: no, neither in Keras nor in TensorFlow [EDIT: see the UPDATE at the end].
Long answer: as already mentioned in the comments, Adam already incorporates something like momentum. Here is some relevant corroboration:
From the highly recommended An overview of gradient descent optimization algorithms (available also as a paper):
In addition to storing an exponentially decaying average of past squared gradients v[t] like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients m[t], similar to momentum.
From Stanford CS231n: CNNs for Visual Recognition:
Adam is a recently proposed update that looks a bit like RMSProp with momentum
Notice that some frameworks actually include a momentum parameter for Adam, but this is actually the beta1 parameter; here is how CNTK documents it:
momentum (float, list, output of momentum_schedule()) – momentum schedule. Note that this is the beta1 parameter in the Adam paper. For additional information, please refer to this CNTK Wiki article.
That said, there is an ICLR 2016 paper titled Incorporating Nesterov momentum into Adam, along with an implementation skeleton in TensorFlow by the author - I cannot offer any opinion on this, though.
UPDATE: Keras indeed now includes an optimizer called Nadam, based on the ICLR 2016 paper mentioned above; from the docs:
Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.
It is also included in TensorFlow as a contributed module, NadamOptimizer.
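The idea behind Nadam can be sketched in the same style: it replaces Adam's momentum term with a Nesterov-style look-ahead. The following is a simplified sketch that omits the momentum schedule used in the ICLR 2016 paper; names and defaults are illustrative, not the Keras implementation:

```python
def nadam_step(theta, grad, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative Nadam step (Adam with Nesterov momentum), scalar case, t >= 1."""
    # Same moment estimates as Adam:
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Nesterov-style look-ahead: blend the bias-corrected momentum estimate
    # with the (bias-corrected) current gradient before taking the step.
    nesterov_m = beta1 * m_hat + (1 - beta1) * grad / (1 - beta1 ** t)
    theta = theta - lr * nesterov_m / (v_hat ** 0.5 + eps)
    return theta, m, v
```

Compared with the plain Adam step, the only change is the `nesterov_m` line: the update "looks ahead" along the momentum direction instead of using the momentum estimate alone.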
Tuan Do
Updated on June 17, 2022

Comments
- Tuan Do, almost 2 years ago:
The question says it all. Since Adam performs well on most datasets, I want to try momentum tuning for the Adam optimizer. So far I have only found a momentum option for SGD in Keras.