Max-pooling VS Sum-pooling

image image-processing machine-learning neural-network deep-learning

16,506

Convolutional Neural Networks do a great job of dealing with high-dimensional data. Restricting the weights to the kernel weights alone makes learning easier, because it exploits the invariance properties of images and sound. But if you look carefully at what's going on, you may notice that after the first convolutional layer the dimension of your data can increase severely (each kernel produces its own feature map) unless you use tricks like pooling.
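To make the growth in dimension concrete, here is a rough back-of-the-envelope sketch. The layer sizes (a 28x28 single-channel input, 5x5 kernels, 20 filters, stride 1 without padding, 2x2 pooling) are assumptions chosen only for illustration, and `conv_output_size` is a hypothetical helper, not part of any library.

```python
def conv_output_size(height, width, kernel, n_filters):
    """Shape of the output of a stride-1 convolution without padding."""
    out_h = height - kernel + 1
    out_w = width - kernel + 1
    return out_h, out_w, n_filters

# Assumed example layer: 28x28 input, 5x5 kernels, 20 filters.
input_values = 28 * 28
h, w, c = conv_output_size(28, 28, kernel=5, n_filters=20)

before_pooling = h * w * c                # 24 * 24 * 20 activations
after_pooling = (h // 2) * (w // 2) * c   # after 2x2 non-overlapping pooling

print(input_values, before_pooling, after_pooling)  # 784 11520 2880
```

Under these assumptions the convolution turns 784 input values into 11520 activations, and 2x2 pooling cuts that by a factor of four.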

Max pooling decreases the dimension of your data simply by keeping only the maximum input from each fixed region of your convolutional layer. Sum pooling works in a similar manner, by taking the sum of the inputs instead of their maximum.
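A minimal sketch of both operations, assuming non-overlapping square regions and a small made-up feature map. The helper `pool2d` is a hypothetical name written for this illustration; real frameworks provide their own pooling layers.

```python
def pool2d(grid, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size block of grid."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values of one pooling region.
            block = [grid[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(reduce_fn(block))
        pooled.append(pooled_row)
    return pooled

# Made-up 4x4 feature map, pooled with 2x2 regions.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]

max_pooled = pool2d(feature_map, 2, max)  # [[4, 2], [2, 8]]
sum_pooled = pool2d(feature_map, 2, sum)  # [[10, 4], [3, 26]]
print(max_pooled, sum_pooled)
```

The only thing that changes between the two calls is the reduction function; the regions are identical, and both outputs are 2x2 instead of 4x4.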

The conceptual difference between these approaches lies in the sort of invariance they are able to capture. Max pooling is sensitive to the existence of a pattern anywhere in the pooled region: it reports the strongest response and ignores the rest. Sum pooling (which, for a fixed region size, is proportional to mean pooling) measures the average presence of a pattern across the region.
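A small sketch of that difference, using two made-up pooling regions: one where the pattern fires strongly at a single position, and one where it fires moderately everywhere. The values are assumptions picked for illustration.

```python
single_peak = [0, 0, 0, 9]   # strong response at one position only
spread_out = [3, 3, 3, 3]    # moderate response at every position

for name, region in (("single_peak", single_peak), ("spread_out", spread_out)):
    region_max = max(region)
    region_sum = sum(region)
    region_mean = region_sum / len(region)
    # Sum pooling is mean pooling scaled by the (fixed) region size.
    assert region_sum == region_mean * len(region)
    print(name, region_max, region_sum, region_mean)
```

Max pooling ranks `single_peak` higher (9 versus 3), while sum and mean pooling rank `spread_out` higher (12 versus 9, or 3.0 versus 2.25).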

UPDATE:

The subregions for sum pooling / mean pooling are chosen exactly as for max pooling, but instead of the max function you use sum / mean. You can read about it here, in the paragraph about pooling.


gsamaras


Updated on July 17, 2022

Comments

gsamaras almost 2 years

I have partially understood Max-pooling, after reading Convolutional Neural Networks (LeNet):

Another important concept of CNNs is max-pooling, which is a form of non-linear down-sampling. Max-pooling partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum value.

What about Sum-pooling? I couldn't find any easy-to-understand article.