How should "BatchNorm" layer be used in caffe?

Solution 1

If you follow the original paper, batch normalization should be followed by Scale and Bias layers (the bias can be included via the Scale layer, although this makes the bias parameters inaccessible). use_global_stats should also be changed from training (false) to testing/deployment (true) - which is the default behavior. Note that the first example you give is a prototxt for deployment, so it is correct for it to be set to true.
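
As a minimal sketch of that pattern (layer and blob names here are illustrative, not taken from any particular model), the pair of layers could look like this; leaving use_global_stats unset lets "BatchNorm" fall back to its phase-dependent default:

    layer {
      name: "bn1"              # illustrative name
      type: "BatchNorm"
      bottom: "conv1"
      top: "conv1"
      # use_global_stats left unset: batch statistics are used in TRAIN,
      # the accumulated global statistics in TEST/deployment
    }
    layer {
      name: "scale1"
      type: "Scale"
      bottom: "conv1"
      top: "conv1"
      scale_param {
        bias_term: true        # learn the bias (shift) together with the scale
      }
    }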

I'm not sure about the shared parameters.

I made a pull request to improve the documentation on batch normalization, but then closed it because I wanted to modify it, and then I never got back to it.

Note that I think lr_mult: 0 for "BatchNorm" is no longer required (perhaps not allowed?), although I'm not finding the corresponding PR now.

Solution 2

After each "BatchNorm", we have to add a "Scale" layer in Caffe. The reason is that the Caffe BatchNorm layer only subtracts the mean from the input data and divides by its standard deviation, but does not include the γ and β parameters that respectively scale and shift the normalized distribution. Conversely, the Keras BatchNormalization layer includes and applies all of the parameters mentioned above. Using a Scale layer with the parameter "bias_term" set to true in Caffe provides a safe trick to reproduce the exact behavior of the Keras version. https://www.deepvisionconsulting.com/from-keras-to-caffe/
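
As a rough sketch (names are illustrative), the Caffe counterpart of a Keras BatchNormalization layer would then be the following pair:

    # "BatchNorm" only normalizes: (x - mean) / sqrt(var + eps);
    # at test time it uses the stored running statistics
    # (Keras' moving_mean / moving_variance).
    layer {
      name: "bn1"
      type: "BatchNorm"
      bottom: "conv1"
      top: "conv1"
    }
    # "Scale" with bias_term: true supplies the learned gamma and beta:
    # y = gamma * x_norm + beta
    layer {
      name: "scale1"
      type: "Scale"
      bottom: "conv1"
      top: "conv1"
      scale_param {
        bias_term: true
      }
    }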


Comments

  • Shai
    Shai over 4 years

    I am a little confused about how I should use/insert the "BatchNorm" layer in my models.
    I see several different approaches, for instance:

    ResNets: "BatchNorm"+"Scale" (no parameter sharing)

    "BatchNorm" layer is followed immediately with "Scale" layer:

    layer {
        bottom: "res2a_branch1"
        top: "res2a_branch1"
        name: "bn2a_branch1"
        type: "BatchNorm"
        batch_norm_param {
            use_global_stats: true
        }
    }
    
    layer {
        bottom: "res2a_branch1"
        top: "res2a_branch1"
        name: "scale2a_branch1"
        type: "Scale"
        scale_param {
            bias_term: true
        }
    }
    

    cifar10 example: only "BatchNorm"

    In the cifar10 example provided with caffe, "BatchNorm" is used without any "Scale" following it:

    layer {
      name: "bn1"
      type: "BatchNorm"
      bottom: "pool1"
      top: "bn1"
      param {
        lr_mult: 0
      }
      param {
        lr_mult: 0
      }
      param {
        lr_mult: 0
      }
    }
    

    cifar10 Different batch_norm_param for TRAIN and TEST

    batch_norm_param: use_global_stats is changed between the TRAIN and TEST phases:

    layer {
      name: "bn1"
      type: "BatchNorm"
      bottom: "pool1"
      top: "bn1"
      batch_norm_param {
        use_global_stats: false
      }
      param {
        lr_mult: 0
      }
      param {
        lr_mult: 0
      }
      param {
        lr_mult: 0
      }
      include {
        phase: TRAIN
      }
    }
    layer {
      name: "bn1"
      type: "BatchNorm"
      bottom: "pool1"
      top: "bn1"
      batch_norm_param {
        use_global_stats: true
      }
      param {
        lr_mult: 0
      }
      param {
        lr_mult: 0
      }
      param {
        lr_mult: 0
      }
      include {
        phase: TEST
      }
    }
    

    So what should it be?

    How should one use the "BatchNorm" layer in caffe?

    • user3051460
      user3051460 about 7 years
      You mean the default value can be checked at github.com/BVLC/caffe/blob/…? Because I want to check whether my current caffe is set to zero or not.
  • Shai
    Shai over 7 years
    (1) Why oh why didn't you get back to documenting "BatchNorm"?? (2) PR #4704 was meant to simplify the lr_mult params in the "BatchNorm" definition. IMHO, this only created a mess.
  • Jonathan
    Jonathan over 7 years
    Thanks for the encouragement to get back to it :-). On the surface, I liked not specifying lr_mult (which I found confusing), but as you point out, it did cause a mess.
  • Shai
    Shai over 7 years
    Just found your caffe.help webpage - awesome!! thanks!
