norm parameters in sklearn.preprocessing.normalize


Informally speaking, the norm is a generalization of the concept of (vector) length; from the Wikipedia entry:

In linear algebra, functional analysis, and related areas of mathematics, a norm is a function that assigns a strictly positive length or size to each vector in a vector space.

The L2-norm is the usual Euclidean length, i.e. the square root of the sum of the squared vector elements.

The L1-norm is the sum of the absolute values of the vector elements.

The max-norm (sometimes also called infinity norm) is simply the maximum absolute vector element.
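The three definitions above can be checked on a single vector. This is only a minimal sketch using NumPy; the vector `x` and the variable names are chosen here for illustration and are not part of the sklearn docs.

```python
import numpy as np

# Illustrative vector (same values as the first row of the example matrix X)
x = np.array([1., -1., 2.])

l1 = np.sum(np.abs(x))        # L1-norm: sum of absolute values
l2 = np.sqrt(np.sum(x ** 2))  # L2-norm: Euclidean length
l_max = np.max(np.abs(x))     # max-norm: largest absolute value

print(l1, l2, l_max)

# np.linalg.norm gives the same values with ord=1, ord=2 and ord=np.inf
print(np.linalg.norm(x, 1), np.linalg.norm(x, 2), np.linalg.norm(x, np.inf))
```

For this vector the L1-norm is 4, the L2-norm is the square root of 6, and the max-norm is 2.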

As the docs say, normalization here means giving our vectors (i.e. data samples) unit length, so we also need to specify which length (i.e. which norm) is meant.
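In other words, each sample is divided by its own norm. A rough hand-rolled sketch of that idea for the L2 case, using plain NumPy rather than sklearn, and assuming no row is all zeros (the names `row_norms` and `X_unit` are illustrative only):

```python
import numpy as np

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# L2-norm of each row, kept as a column so it broadcasts across that row.
# Assumption: no all-zero rows, otherwise this would divide by zero.
row_norms = np.linalg.norm(X, ord=2, axis=1, keepdims=True)

X_unit = X / row_norms

# Every row should now have L2-norm equal to 1 (up to floating-point error)
print(np.linalg.norm(X_unit, axis=1))
```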

You can easily verify the above by adapting the examples from the docs:

from sklearn import preprocessing 
import numpy as np

X = [[ 1., -1.,  2.],
     [ 2.,  0.,  0.],
     [ 0.,  1., -1.]]

X_l1 = preprocessing.normalize(X, norm='l1')
X_l1
# array([[ 0.25, -0.25,  0.5 ],
#        [ 1.  ,  0.  ,  0.  ],
#        [ 0.  ,  0.5 , -0.5 ]])

You can verify by simple visual inspection that the absolute values of the elements in each row of X_l1 sum to 1.
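If you prefer a programmatic check over visual inspection, something along these lines should work; `row_sums` is just an illustrative name.

```python
import numpy as np
from sklearn import preprocessing

X = [[1., -1., 2.],
     [2., 0., 0.],
     [0., 1., -1.]]

X_l1 = preprocessing.normalize(X, norm='l1')

# Sum of absolute values per row, i.e. the L1-norm of each sample
row_sums = np.abs(X_l1).sum(axis=1)
print(row_sums)
```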

X_l2 = preprocessing.normalize(X, norm='l2')
X_l2
# array([[ 0.40824829, -0.40824829,  0.81649658],
#        [ 1.        ,  0.        ,  0.        ],
#        [ 0.        ,  0.70710678, -0.70710678]])

np.sqrt(np.sum(X_l2**2, axis=1)) # verify that L2-norm is indeed 1
# array([ 1.,  1.,  1.])
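
The same kind of check can be sketched for the remaining option, norm='max'. The variable names `X_max` and `row_max` are illustrative; after normalization, the largest absolute element of each row should be 1.

```python
import numpy as np
from sklearn import preprocessing

X = [[1., -1., 2.],
     [2., 0., 0.],
     [0., 1., -1.]]

X_max = preprocessing.normalize(X, norm='max')
print(X_max)

# Largest absolute value per row, i.e. the max-norm of each sample
row_max = np.max(np.abs(X_max), axis=1)
print(row_max)
```

For this X, each row ends up divided by 2, 2 and 1 respectively.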

Author: aerin

Updated on June 04, 2022

Comments

  • aerin almost 2 years

    The sklearn documentation says "norm" can be one of

    norm : ‘l1’, ‘l2’, or ‘max’, optional (‘l2’ by default)
    The norm to use to normalize each non zero sample (or each non-zero feature if axis is 0).

    The documentation about normalization doesn't clearly state how ‘l1’, ‘l2’, or ‘max’ are calculated.

    Can anyone clarify these?

    • sascha over 6 years
      Do you know what a norm is? Did you consider creating a simple dataset and trying those 3 candidates? Where is the problem? This kind of analysis probably would have been faster than asking for it.