Keras: rescale=1./255 vs preprocessing_function=preprocess_input - which one to use?

10,507

Solution 1

I had similar questions, and after running the small experiments below, I think you should always use preprocess_input when using pre-trained models, and use rescale when training from scratch.

Obviously, when you directly use a pre-trained model for inference, you have to use preprocess_input. For example, I tried to use ResNet50 on the Kaggle dogs-vs-cats dataset: with rescale=1./255 it returns index 111 (nematode, nematode worm, roundworm) as the most likely class for all images, whereas with preprocess_input it mostly returns indices corresponding to dogs and cats, as expected.
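A rough way to see why the two options are not interchangeable for a pre-trained ResNet50 is to apply both to a single pixel. The sketch below is a pure-Python re-implementation for illustration, not the Keras code: it assumes that keras.applications.resnet50.preprocess_input uses the "caffe" mode (convert RGB to BGR, then subtract the ImageNet channel means, with no scaling), and the helper names rescale_pixel and caffe_pixel are hypothetical.

```python
# ImageNet channel means in (B, G, R) order, as assumed for Keras' "caffe" mode
CAFFE_MEAN_BGR = (103.939, 116.779, 123.68)


def rescale_pixel(rgb):
    """What ImageDataGenerator(rescale=1./255) does: map [0, 255] onto [0, 1]."""
    return tuple(c / 255.0 for c in rgb)


def caffe_pixel(rgb):
    """Hypothetical one-pixel version of the "caffe" mode:
    reorder RGB -> BGR, then zero-center each channel (no division)."""
    r, g, b = rgb
    return tuple(c - m for c, m in zip((b, g, r), CAFFE_MEAN_BGR))


if __name__ == "__main__":
    pixel = (200, 150, 100)  # an arbitrary RGB pixel
    print(rescale_pixel(pixel))  # roughly (0.784, 0.588, 0.392)
    print(caffe_pixel(pixel))    # roughly (-3.94, 33.22, 76.32)
```

Under that assumption, every rescaled value lies in [0, 1], which in the "caffe" input space corresponds to a nearly uniform image at roughly the ImageNet mean color with at most one intensity level of contrast. That would be consistent with the same prediction (index 111) being returned for every image.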

Then I tried ResNet50 with include_top=False, frozen ImageNet weights, one GlobalAveragePooling2D layer and one final dense sigmoid layer. I trained it with Adam on 2000 images of the Kaggle dogs-vs-cats dataset (and used 1000 images for validation). Using rescale, it does not manage to fit anything after 5 epochs; it always predicts the first class (strangely, the reported training accuracy is 97%, but when I run evaluate_generator on the training images, the accuracy is **50%**). With preprocess_input, however, it achieves **98%** accuracy on the validation set. Also note that the images do not really need to have the same dimensions as those the model was trained on: for example, if I use 150 instead of 224, I still get **97.5%** accuracy. Without any rescaling or preprocess_input, I got 95% validation accuracy.

I tried the same thing with VGG16: with rescaling it does manage to fit, but only to 87%, versus 97% with preprocess_input and 95% without any preprocessing.

Then I trained a small conv network from scratch for 10 epochs: without any preprocessing, or using ResNet50's preprocess_input, it does not fit at all, but with rescaling I got 70% validation accuracy.

Solution 2

At first I thought using rescale=1./255 only works when dealing with a pre-trained VGG16 model, but I keep seeing examples where it is being used with pre-trained ResNet50, Inception, etc. as well.

The reason that is done is that you need to NORMALIZE your input. Normally the formula for min-max normalization is

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

For 8-bit images, x_min = 0 and x_max = 255, so this is the equivalent of doing

1./255

after which the pixel values of the image will be between 0 and 1.
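The min-max formula with the fixed bounds of an 8-bit image can be checked in a few lines. This is a minimal sketch; the function name min_max_normalize and its default bounds are illustrative assumptions, not part of Keras.

```python
def min_max_normalize(x, x_min=0.0, x_max=255.0):
    """Map a value in [x_min, x_max] linearly onto [0, 1]."""
    return (x - x_min) / (x_max - x_min)


if __name__ == "__main__":
    # With the 8-bit bounds 0 and 255 the formula reduces to x / 255,
    # which is what ImageDataGenerator(rescale=1./255) computes.
    for value in (0, 51, 128, 255):
        print(value, min_max_normalize(value), value * (1.0 / 255))
```

Note that this uses the fixed range of the pixel format rather than the minimum and maximum of each individual image, which is what makes it identical to a constant rescale factor.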

The reason for normalizing the input has to do with numerical stability and convergence (technically you do not need it, but with it the neural network has a higher chance of converging, and gradient descent/Adam is way more likely to be stable).

As for "Does this only make a difference when using pre-trained models, i.e. with loaded weights vs training from scratch?": no, it is not linked to pre-trained models only; it is a common technique when using certain machine learning algorithms (neural networks being one of them).

If you are interested in REALLY understanding what goes on behind all this and why it is so important to normalize, I strongly recommend Andrew Ng's course on machine learning.

Solution 3

If you are using transfer learning in a way where you are only leveraging the network structure but retraining the entire network (maybe starting from the pre-trained weights), you may choose to set up your own preprocessing strategy. This means you can scale by dividing by 255.0, use preprocess_input, or even use a custom preprocessing implementation.

If you are using transfer learning where you are not retraining the entire network but replacing the last layer with a few fully connected dense layers, then it is strongly recommended to use the preprocess_input associated with the network you are training. This is because the weights of the layers you are not training are accustomed to a specific preprocessing step. For example, if you look at preprocess_input for InceptionResNetV2 and follow the code path to _preprocess_numpy_input, you will see that it doesn't scale the image in every case, only when mode is "tf" or "torch" (InceptionResNetV2 uses "tf", which maps pixel values to [-1, 1]). So if you trained an InceptionResNetV2-based classifier on images normalized by dividing by 255, it might not train the way you intended.
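To make the difference between the preprocessing modes concrete, the sketch below re-implements, for a single RGB pixel, the three modes that keras.applications.imagenet_utils.preprocess_input supports (Keras calls them "caffe", "tf" and "torch"). It is an illustration under assumptions, not the library code: the constants are the ImageNet statistics as I understand Keras to use them, the function names are hypothetical, and the MODE_BY_MODEL mapping reflects my understanding of which mode a few keras.applications models use, so check it against your installed version.

```python
# Hypothetical one-pixel re-implementations of the three Keras preprocessing modes.
CAFFE_MEAN_BGR = (103.939, 116.779, 123.68)
TORCH_MEAN_RGB = (0.485, 0.456, 0.406)
TORCH_STD_RGB = (0.229, 0.224, 0.225)


def caffe_mode(rgb):
    """RGB -> BGR, subtract the ImageNet channel means, no scaling."""
    r, g, b = rgb
    return tuple(c - m for c, m in zip((b, g, r), CAFFE_MEAN_BGR))


def tf_mode(rgb):
    """Scale [0, 255] linearly onto [-1, 1]."""
    return tuple(c / 127.5 - 1.0 for c in rgb)


def torch_mode(rgb):
    """Scale to [0, 1], then standardize each channel with ImageNet mean/std."""
    return tuple(
        (c / 255.0 - m) / s for c, m, s in zip(rgb, TORCH_MEAN_RGB, TORCH_STD_RGB)
    )


# Assumed mode per model (a partial list, not exhaustive).
MODE_BY_MODEL = {
    "VGG16": caffe_mode,
    "VGG19": caffe_mode,
    "ResNet50": caffe_mode,
    "InceptionV3": tf_mode,
    "InceptionResNetV2": tf_mode,
    "Xception": tf_mode,
    "MobileNet": tf_mode,
    "DenseNet121": torch_mode,
}


if __name__ == "__main__":
    for pixel in ((0, 0, 0), (255, 255, 255)):
        print("rescale=1./255   :", tuple(c / 255.0 for c in pixel))
        for name in ("ResNet50", "InceptionResNetV2", "DenseNet121"):
            print(f"{name:17s}:", MODE_BY_MODEL[name](pixel))
```

With these definitions, dividing by 255 maps a black pixel to 0.0 and a white one to 1.0, whereas the "tf" mode assumed for InceptionResNetV2 maps them to -1.0 and 1.0. The frozen layers would therefore only ever see the upper half of the input range they were trained on.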


AaronDT


Updated on June 28, 2022

Comments

AaronDT about 2 years
Background

I find quite a lot of code examples where people preprocess their image data within the ImageDataGenerator either with rescale=1./255 or with the preprocessing_function set to the preprocess_input of the respective model they are using. At first I thought using rescale=1./255 only works when dealing with a pre-trained VGG16 model, but I keep seeing examples where it is being used with pre-trained ResNet50, Inception, etc. as well.

While the keras-blog (https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html) uses this approach...
```
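from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rescale=1./255)  # ... plus other augmentation arguments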
```
... the Keras docs (https://keras.io/applications/) use this approach:
```
from keras.applications.vgg19 import preprocess_input
from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)  # ... plus other arguments
```
I thought using the preprocess_input of the model I want to train is always superior to the rescale=1./255 approach, since it exactly reflects the preprocessing that was used when training the pre-trained model.

Question

I need some clarification on when to use rescale=1./255 vs Keras's built-in preprocess_input of the respective model when preprocessing images for transfer learning. Does this only make a difference when using pre-trained models, i.e. with loaded weights vs training from scratch?
AaronDT over 5 years

Thank you, Juan, for your answer, but I am aware of why you need to normalize your input data. However, I assumed that the weights of the different models were trained with different preprocessing applied to the data. Hence, to my understanding, you want to mimic the same preprocessing when using transfer learning on pre-trained weights. So why would you choose 1./255 when you could use the exact preprocessing procedure that was used during training of the model? Sorry if that was unclear in my question.
Juan Antonio Gomez Moriano over 5 years

The only "mandatory" preprocessing to apply to the image is the normalization; apart from that, you need the images to have the same dimensions (size), so if the network was trained with 224x224x3, you will need your new images to use that size too. As for whether the training images were rotated or not (or augmented with any other technique), it is inconsequential.
AaronDT over 5 years

So you mean there is basically no difference between using 1./255 and using the respective preprocess_input of the model at hand? Looking, for example, at the inception_resnet_v2 implementation (github.com/fchollet/deep-learning-models/blob/master/…), the normalization looks different from 1./255 - there has to be a reason for that, right? Remember, I am also considering cases where I want to use transfer learning and fine-tuning. I would assume that the pre-trained weights work best when one uses the same normalization for fine-tuning as during training.
Juan Antonio Gomez Moriano over 5 years

I have used transfer learning with VGG using 1./255, although notice that there are other normalization techniques... Maybe ResNet uses a different normalization method, that is all. However, min-max normalization is quite common in image processing.
AaronDT over 5 years

I highly appreciate your effort, but my question remains unanswered. I am only interested in knowing WHY you would choose 1./255 over the preprocess_input variant. I do know that you NEED to use normalization, and I know that you CAN use any normalization for it to just "work". But does that mean you SHOULD? To get the best possible results during fine-tuning, there has to be a preferred method (rescale vs preprocess_input). If both normalization procedures yielded exactly the same results, then all the preprocess_input functions for the different models within Keras would be superfluous.
Vidya over 2 years

Why does accuracy fall when the images are passed through the preprocess_input function of a pre-trained model such as ResNet50, compared to no preprocessing? I am baffled.