Keras Conv2D: filters vs kernel_size
Each convolution layer consists of several convolution filters (the count is also called the depth or the number of channels). In practice, filters is a number such as 64, 128, 256, 512, etc., and it equals the number of channels in the output of the convolutional layer. kernel_size, on the other hand, is the spatial size of these convolution filters. In practice it takes values such as 3x3, 1x1, or 5x5. Since the filters are almost always square, it can be abbreviated to a single integer such as 1, 3, or 5.
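As a minimal sketch (NumPy only, with made-up layer sizes: filters=64, kernel_size=3, an RGB input), note that Keras stores a Conv2D weight tensor with shape (kernel_h, kernel_w, in_channels, filters), so both arguments show up directly in the weight shape:

```python
import numpy as np

# Hypothetical layer configuration, mirroring Keras Conv2D arguments.
filters = 64       # number of output channels
kernel_size = 3    # 3x3 spatial window (square, so one integer suffices)
in_channels = 3    # e.g. an RGB input

# Keras stores the Conv2D kernel as (kernel_h, kernel_w, in_channels, filters).
kernel = np.zeros((kernel_size, kernel_size, in_channels, filters))

print(kernel.shape)  # (3, 3, 3, 64)
```

The last axis is the filters argument; the first two axes are the kernel_size.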
Edit
The following quote should make it clearer.

Suppose X is an input with size W x H x D x N (where N is the size of the batch) to a convolutional layer containing filters F (with size FW x FH x FD x K) in a network.

The number of feature channels D is the third dimension of the input X here (for example, this is typically 3 at the first input to the network if the input consists of colour images). The number of filters K is the fourth dimension of F. The two concepts are closely linked: if the number of filters in a layer is K, it produces an output with K feature channels, so the input to the next layer will have K feature channels.

The FW x FH above is the filter size you are looking for.
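To make the link between K and the next layer's input concrete, here is a small sketch (plain Python, hypothetical layer sizes) of how one layer's filter count K becomes the filter depth FD of the next layer:

```python
# Hypothetical two-layer stack, channels-last as in Keras.
D0 = 3    # input feature channels (e.g. an RGB image)
K1 = 32   # number of filters in layer 1
K2 = 64   # number of filters in layer 2

# Filter shapes follow the FW x FH x FD x K convention from the quote.
layer1_filters = (3, 3, D0, K1)  # FD must equal the input's channel count D
layer2_filters = (3, 3, K1, K2)  # layer 1 emits K1 channels, so FD = K1 here

# Layer 1 produces K1 feature channels, which is exactly the depth
# that layer 2's filters must have.
print(layer1_filters[3] == layer2_filters[2])  # True
```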
Added
You should be familiar with filters. You can think of each filter as being responsible for extracting some type of feature from the raw image. A CNN learns such filters during training: the filters are its trainable parameters. Each filter in a Conv2D layer is applied across all input channels, and the per-channel results are combined to produce one output channel. So the number of filters and the number of output channels are the same.
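The "combine these to get output channels" step can be sketched with a naive NumPy loop (toy sizes, "valid" padding, stride 1): each of the K filters multiplies an FH x FW window across all D input channels and sums the result into a single output value, so the input channels do mix.

```python
import numpy as np

# Toy sizes matching the quote: input W x H x D x N, filters FW x FH x FD x K.
W, H, D, N = 8, 8, 3, 2   # width, height, channels, batch size
FW, FH, K = 3, 3, 5       # filter width/height and filter count
FD = D                    # filter depth must match the input channel count

X = np.random.rand(N, H, W, D)   # channels-last layout, as in Keras
F = np.random.rand(FH, FW, FD, K)

# "Valid" convolution by explicit loops: each of the K filters spans all
# D input channels and produces exactly one output channel.
out_h, out_w = H - FH + 1, W - FW + 1
Y = np.zeros((N, out_h, out_w, K))
for n in range(N):
    for i in range(out_h):
        for j in range(out_w):
            patch = X[n, i:i + FH, j:j + FW, :]      # FH x FW x D window
            for k in range(K):
                # Elementwise product over the whole window, summed over
                # height, width, AND channels -> channels are mixed.
                Y[n, i, j, k] = np.sum(patch * F[:, :, :, k])

print(Y.shape)  # (2, 6, 6, 5): K = 5 feature channels out
```

Note the output channel count equals K, independent of the input's D.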
Baron Yugovich
Updated on June 06, 2022

Comments
- Baron Yugovich (almost 2 years): What's the difference between those two? It would also help to explain them in the more general context of convolutional networks. Also, as a side note, what are channels? In other words, please break down the three terms for me: channels vs filters vs kernel.
- Baron Yugovich (almost 6 years): So channels and filters and depth all mean the same thing?
- Baron Yugovich (almost 6 years): Just to make sure, FW above is a single new variable, it's not like F*W, right?
- Dinesh (almost 6 years): Yeah, FW is the width of the filter and FH is the height of the filter.
- Baron Yugovich (almost 6 years): So, in the input, the 3 channels have clear meanings: R, G, B values. But in the hidden layers, we sometimes have more than 3 channels; what are their meanings? Also, can a value in a single hidden neuron incorporate values from multiple input channels (e.g. both R and B), or is there no "mixing" between the input channels through the convolution process?
- Dinesh (almost 6 years): I modified the answer above to cover the interpretation of filters.