What is the difference between register_parameter and register_buffer in PyTorch?
Solution 1
The PyTorch documentation for the register_buffer() method reads:
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm's running_mean is not a parameter, but is part of the persistent state.
As you already observed, model parameters are learned and updated using SGD during the training process.
However, sometimes there are other quantities that are part of a model's "state" and should be
- saved as part of state_dict.
- moved to cuda() or cpu() with the rest of the model's parameters.
- cast to float/half/double with the rest of the model's parameters.
Registering these quantities as the model's buffers allows PyTorch to track them and save them like regular parameters, but keeps them out of the parameters handed to the optimizer, so SGD does not update them.
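The points above can be checked with a minimal sketch. The Scaler module and its offset buffer are hypothetical names made up for this illustration; only the dtype cast is exercised, on the assumption that a CUDA device may not be available (moving with cuda() or cpu() follows the same path).

```python
import torch
import torch.nn as nn


class Scaler(nn.Module):
    """Hypothetical module: one learned weight, one non-learned buffer."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3))        # learned by the optimizer
        self.register_buffer("offset", torch.zeros(3))   # state, but not learned

    def forward(self, x):
        return x * self.weight + self.offset


model = Scaler()

# Both tensors are saved in the state_dict ...
state_keys = sorted(model.state_dict().keys())

# ... but only the parameter is visible to an optimizer.
param_names = [name for name, _ in model.named_parameters()]
buffer_names = [name for name, _ in model.named_buffers()]

# Casting the module casts the buffer along with the parameters.
model.double()
buffer_dtype = model.offset.dtype

print(state_keys, param_names, buffer_names, buffer_dtype)
```

Had offset been stored as a plain attribute (self.offset = torch.zeros(3)), it would not appear in state_dict and would not follow the module when it is cast or moved.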
An example of a buffer can be found in the _BatchNorm module, where running_mean, running_var and num_batches_tracked are registered as buffers and updated by accumulating statistics of the data forwarded through the layer. This is in contrast to the weight and bias parameters, which learn an affine transformation of the data using regular SGD optimization.
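As a rough illustration of the BatchNorm case, the sketch below uses nn.BatchNorm1d (a public subclass of _BatchNorm) and runs a single forward pass in training mode. No backward pass or optimizer is involved, yet the running statistics change. The input shape and the shift of 5.0 are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)

# weight and bias are parameters; the running statistics are buffers.
param_names = sorted(name for name, _ in bn.named_parameters())
buffer_names = sorted(name for name, _ in bn.named_buffers())

mean_before = bn.running_mean.clone()

bn.train()
x = torch.randn(16, 4) + 5.0   # batch whose mean is far from the initial running_mean of 0
bn(x)                          # forward pass only: no backward(), no optimizer step

# The buffers were updated by the forward pass itself.
mean_changed = not torch.equal(mean_before, bn.running_mean)
batches_seen = int(bn.num_batches_tracked)

print(param_names, buffer_names, mean_changed, batches_seen)
```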
Solution 2
Both parameters and buffers are tensors you create for a module (nn.Module).
Say you have a linear layer, nn.Linear. You already have the weight and bias parameters. But if you need a new parameter, you use register_parameter() to register a new named parameter, which must be an nn.Parameter (a tensor subclass).
When you register a new parameter, it will appear in the module.parameters() iterator, but when you register a buffer, it will not.
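A short sketch of this on an nn.Linear layer follows. The names scale and offset are hypothetical and chosen only for the example.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 2)

# "scale" and "offset" are made-up names for this illustration.
layer.register_parameter("scale", nn.Parameter(torch.ones(1)))
layer.register_buffer("offset", torch.zeros(1))

param_names = [name for name, _ in layer.named_parameters()]
buffer_names = [name for name, _ in layer.named_buffers()]

# The new parameter shows up in parameters(); the buffer does not.
scale_in_params = any(p is layer.scale for p in layer.parameters())
offset_in_params = any(p is layer.offset for p in layer.parameters())

print(param_names, buffer_names, scale_in_params, offset_in_params)
```

Assigning self.scale = nn.Parameter(...) inside a module registers the parameter in the same way; register_parameter() is mainly useful when the name is only known as a string.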
The difference:
Buffers are named tensors that, unlike parameters, are not updated by the optimizer at every step. For buffers, the update logic is custom and fully up to you.
The good thing is that when you save the model, all parameters and buffers are saved, and when you move the model to or off a CUDA device, the parameters and buffers move with it.
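To make the "custom logic" and the saving behaviour concrete, here is a minimal sketch under stated assumptions: TrackedLinear and its calls buffer are hypothetical, the buffer simply counts forward passes, and the state_dict is written to an in-memory file rather than to disk.

```python
import io

import torch
import torch.nn as nn


class TrackedLinear(nn.Module):
    """Hypothetical layer that counts how many times forward() was called."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)
        self.register_buffer("calls", torch.zeros(1, dtype=torch.long))

    def forward(self, x):
        self.calls.add_(1)   # custom update logic, independent of gradients
        return self.linear(x)


torch.manual_seed(0)
model = TrackedLinear()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One training step: the optimizer changes the parameters only.
weight_before = model.linear.weight.detach().clone()
loss = model(torch.randn(8, 3)).pow(2).mean()
loss.backward()
optimizer.step()

weight_changed = not torch.equal(weight_before, model.linear.weight)
calls_after_step = int(model.calls)

# Save and reload: the buffer travels with the parameters in the state_dict.
saved = io.BytesIO()
torch.save(model.state_dict(), saved)
saved.seek(0)

restored = TrackedLinear()
restored.load_state_dict(torch.load(saved))
restored_calls = int(restored.calls)

print(weight_changed, calls_after_step, restored_calls)
```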
Comments
- apostofes almost 2 years
Module's parameters get changed during training, that is, they are what is learnt during training of a neural network, but what is a buffer? And is it learnt during neural network training?