How do unpooling and deconvolution work in DeConvNet?
Solution 1
1. Unpooling
In the original paper on unpooling, the remaining (non-maximal) activations are simply set to zero.
2. Deconvolution
A deconvolutional layer is simply the transpose of its corresponding conv layer. E.g. if a conv layer's weights have shape [height, width, previous_layer_fms, next_layer_fms], then the deconv layer will have shape [height, width, next_layer_fms, previous_layer_fms]. The weights of the conv and deconv layers are shared! (see this paper for instance)
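One way to see the transpose relationship is to write a convolution as a matrix multiplication: the deconvolution then uses the same weight matrix, transposed, which swaps the input and output dimensions. Below is a minimal 1-D, single-channel NumPy sketch (illustrative only; `conv_matrix` is a name made up here, not from the paper):

```python
import numpy as np

def conv_matrix(kernel, input_len):
    """Build the matrix W such that W @ x equals a 'valid' 1-D convolution of x."""
    k = len(kernel)
    out_len = input_len - k + 1
    W = np.zeros((out_len, input_len))
    for i in range(out_len):
        W[i, i:i + k] = kernel
    return W

kernel = np.array([1., 2., 1.])
W = conv_matrix(kernel, input_len=5)  # shape (3, 5): 5 inputs -> 3 outputs
x = np.arange(5.)
y = W @ x          # forward convolution: [4, 8, 12]
x_back = W.T @ y   # "deconvolution": same weights, transposed, maps 3 -> 5
print(W.shape, W.T.shape)  # (3, 5) (5, 3) -- input/output dims are swapped
```

Note that `W.T @ y` restores the original spatial size (5), not the original values; a deconv layer is a transpose, not a true inverse.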
Solution 2
Unpooling
As etoropov wrote, you can read about unpooling in Visualizing and Understanding Convolutional Networks by Zeiler and Fergus:
Unpooling: In the convnet, the max pooling operation is non-invertible, however we can obtain an approximate inverse by recording the locations of the maxima within each pooling region in a set of switch variables. In the deconvnet, the unpooling operation uses these switches to place the reconstructions from the layer above into appropriate locations, preserving the structure of the stimulus. See Fig. 1(bottom) for an illustration of the procedure.
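The switch mechanism described in the quote can be sketched in plain NumPy (a minimal 2x2 illustration, not code from the paper): max pooling records where each maximum came from, and unpooling places each value back at that recorded location while all other positions stay zero.

```python
import numpy as np

def max_pool_with_switches(x, size=2):
    """Max pooling that also records the argmax 'switch' location in each window."""
    h, w = x.shape
    pooled = np.zeros((h // size, w // size))
    switches = np.zeros_like(pooled, dtype=int)  # flat index within each window
    for i in range(0, h, size):
        for j in range(0, w, size):
            window = x[i:i + size, j:j + size]
            switches[i // size, j // size] = window.argmax()
            pooled[i // size, j // size] = window.max()
    return pooled, switches

def unpool(pooled, switches, size=2):
    """Place each pooled value back at its recorded location; the rest stay zero."""
    h, w = pooled.shape
    out = np.zeros((h * size, w * size))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(switches[i, j], size)
            out[i * size + di, j * size + dj] = pooled[i, j]
    return out

x = np.array([[1., 2., 0., 4.],
              [3., 0., 1., 0.],
              [0., 5., 2., 1.],
              [6., 0., 0., 3.]])
pooled, switches = max_pool_with_switches(x)
restored = unpool(pooled, switches)
# Each maximum returns to its original position; everything else is zero.
```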
Deconvolution
Deconvolution works like this:
- You add padding around each pixel
- You apply a convolution
For example, in the following illustration the original blue image is padded with zeros (white), the gray convolution filter is applied to get the green output.
Source: What are deconvolutional layers?
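The pad-then-convolve recipe above can be sketched in 1-D with NumPy (an illustrative stride-1 example; `conv1d_valid` and `deconv1d` are names invented here): padding the input with k-1 zeros on each side and convolving with the flipped kernel produces exactly the transposed convolution.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Plain 'valid' 1-D cross-correlation."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def deconv1d(y, kernel):
    """Transposed convolution as: zero-pad by k-1 on each side, then convolve
    with the flipped kernel. Equivalent to multiplying by the transposed
    convolution matrix."""
    k = len(kernel)
    padded = np.pad(y, k - 1)
    return conv1d_valid(padded, kernel[::-1])

kernel = np.array([1., 2., 1.])
y = np.array([4., 8., 12.])      # a length-3 feature map
x_rec = deconv1d(y, kernel)      # back to length 5
print(x_rec)  # [ 4. 16. 32. 32. 12.]
```

For strided deconvolution one would additionally insert zeros between the input pixels before convolving, which matches the "padding around each pixel" description above.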
VM_AI
Updated on July 26, 2022

Comments
-
VM_AI almost 2 years
I have been trying to understand how unpooling and deconvolution work in DeConvNets.
Unpooling
During the unpooling stage, the activations are restored to the locations of the maximum activations, which makes sense. But what about the remaining activations? Do they need to be restored as well, interpolated in some way, or simply filled with zeros in the unpooled map?
Deconvolution
After the convolution section (i.e., convolution layer, ReLU, pooling), it is common to have more than one feature map as output, and these are treated as input channels to the successive (deconv) layers. How can these feature maps be combined to produce an activation map with the same resolution as the original input?
-
Alan about 7 years: anyone interested in this should take a look at this post (for reducing the confusion of different names)
-
John about 7 years: I understood deconvolution, but what happens if I use average pooling instead of max pooling before the deconvolution? Something like Input->Conv(stride=1,3x3)->Pooling(stride=2,AVE)->Deconv(kernel=2,stride=2). In my testing, the error rate increases when I use AVE pooling.
-
jlh almost 7 years: Sounds good, but what if there has never been a pooling operation to begin with (and never a convolution operation)? Like when going from a noise vector to an image output, where there's only the transposed convolution operation. Any ideas on this?
-
Martin Thoma almost 7 years: If you don't have pooling, you can't do unpooling. Transposed convolution (aka deconvolution) doesn't care whether you had a convolution before or not.
-
jlh almost 7 years: Keeping the remaining activations at zero often causes checkerboard patterns in the generated image. There's an interesting article about this with a potential solution here: distill.pub/2016/deconv-checkerboard