How do unpooling and deconvolution work in DeConvNet?
Solution 1
1. Unpooling
In the original paper on unpooling, the remaining (non-maximal) activations are simply set to zero.
2. Deconvolution
A deconvolutional layer is simply the transpose of its corresponding conv layer. E.g. if a conv layer's weights have shape [height, width, previous_layer_fms, next_layer_fms], then the deconv layer will have shape [height, width, next_layer_fms, previous_layer_fms]. The weights of the conv and deconv layers are shared! (see this paper for instance)
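One way to see the transpose relationship is to write a convolution as a matrix multiplication: the deconvolution then uses the same weight matrix, transposed, which swaps the input and output dimensions. Below is a minimal 1-D, single-channel NumPy sketch (illustrative only; `conv_matrix` is a name made up here, not from the paper):

```python
import numpy as np

def conv_matrix(kernel, input_len):
    """Build the matrix W such that W @ x equals a 'valid' 1-D convolution of x."""
    k = len(kernel)
    out_len = input_len - k + 1
    W = np.zeros((out_len, input_len))
    for i in range(out_len):
        W[i, i:i + k] = kernel
    return W

kernel = np.array([1., 2., 1.])
W = conv_matrix(kernel, input_len=5)  # shape (3, 5): 5 inputs -> 3 outputs
x = np.arange(5.)
y = W @ x          # forward convolution: [4, 8, 12]
x_back = W.T @ y   # "deconvolution": same weights, transposed, maps 3 -> 5
print(W.shape, W.T.shape)  # (3, 5) (5, 3) -- input/output dims are swapped
```

Note that `W.T @ y` restores the original spatial size (5), not the original values; a deconv layer is a transpose, not a true inverse.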
Solution 2
Unpooling
As etoropov wrote, you can read about unpooling in Visualizing and Understanding Convolutional Networks by Zeiler and Fergus:
Unpooling: In the convnet, the max pooling operation is non-invertible, however we can obtain an approximate inverse by recording the locations of the maxima within each pooling region in a set of switch variables. In the deconvnet, the unpooling operation uses these switches to place the reconstructions from the layer above into appropriate locations, preserving the structure of the stimulus. See Fig. 1(bottom) for an illustration of the procedure.
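The switch mechanism described in the quote can be sketched in plain NumPy (a minimal 2x2 illustration, not code from the paper): max pooling records where each maximum came from, and unpooling places each value back at that recorded location while all other positions stay zero.

```python
import numpy as np

def max_pool_with_switches(x, size=2):
    """Max pooling that also records the argmax 'switch' location in each window."""
    h, w = x.shape
    pooled = np.zeros((h // size, w // size))
    switches = np.zeros_like(pooled, dtype=int)  # flat index within each window
    for i in range(0, h, size):
        for j in range(0, w, size):
            window = x[i:i + size, j:j + size]
            switches[i // size, j // size] = window.argmax()
            pooled[i // size, j // size] = window.max()
    return pooled, switches

def unpool(pooled, switches, size=2):
    """Place each pooled value back at its recorded location; the rest stay zero."""
    h, w = pooled.shape
    out = np.zeros((h * size, w * size))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(switches[i, j], size)
            out[i * size + di, j * size + dj] = pooled[i, j]
    return out

x = np.array([[1., 2., 0., 4.],
              [3., 0., 1., 0.],
              [0., 5., 2., 1.],
              [6., 0., 0., 3.]])
pooled, switches = max_pool_with_switches(x)
restored = unpool(pooled, switches)
# Each maximum returns to its original position; everything else is zero.
```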
Deconvolution
Deconvolution works like this:
- You add padding around each pixel
- You apply a convolution
For example, in the following illustration the original blue image is padded with zeros (white), the gray convolution filter is applied to get the green output.
Source: What are deconvolutional layers?
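The pad-then-convolve recipe above can be sketched in 1-D with NumPy (an illustrative stride-1 example; `conv1d_valid` and `deconv1d` are names invented here): padding the input with k-1 zeros on each side and convolving with the flipped kernel produces exactly the transposed convolution.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Plain 'valid' 1-D cross-correlation."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def deconv1d(y, kernel):
    """Transposed convolution as: zero-pad by k-1 on each side, then convolve
    with the flipped kernel. Equivalent to multiplying by the transposed
    convolution matrix."""
    k = len(kernel)
    padded = np.pad(y, k - 1)
    return conv1d_valid(padded, kernel[::-1])

kernel = np.array([1., 2., 1.])
y = np.array([4., 8., 12.])      # a length-3 feature map
x_rec = deconv1d(y, kernel)      # back to length 5
print(x_rec)  # [ 4. 16. 32. 32. 12.]
```

For strided deconvolution one would additionally insert zeros between the input pixels before convolving, which matches the "padding around each pixel" description above.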
VM_AI
Updated on July 26, 2022

Comments
-
VM_AI almost 2 years
I have been trying to understand how unpooling and deconvolution work in DeConvNets.
Unpooling
During the unpooling stage, the activations are restored to the locations of the maximum activations, which makes sense. But what about the remaining activations? Do they need to be restored as well, interpolated in some way, or simply filled with zeros in the unpooled map?
Deconvolution
After the convolution section (i.e., convolution layer, ReLU, pooling), it is common to have more than one feature map as output, and these are treated as input channels to the successive (deconv) layers. How can these feature maps be combined to produce an activation map with the same resolution as the original input?
-
Alan about 7 years: anyone interested in this should take a look at this post (for reducing the confusion of different names)
-
John about 7 years: I understood deconvolution, but what happens if I use average pooling instead of max pooling before the deconvolution? Something like Input->Conv(stride=1,3x3)->Pooling(stride=2,AVE)->Deconv(kernel=2,stride=2). In my testing, the error rate increases when I use AVE pooling.
-
jlh almost 7 years: Sounds good, but what if there has never been a pooling operation to begin with (and never a convolution operation)? Like when going from a noise vector to an image output, where there's only the transposed convolution operation. Any ideas on this?
-
Martin Thoma almost 7 years: If you don't have pooling, you can't do unpooling. Transposed convolution (aka deconvolution) doesn't care whether you had a convolution before or not.
-
jlh almost 7 years: Keeping the remaining activations at zero often causes checkerboard patterns in the generated image. There's an interesting article about this with a potential solution here: distill.pub/2016/deconv-checkerboard