Two methods to normalise an array so that it sums to 1.0



Both methods transform values into an array whose sum is 1, but they do it differently, so they produce different results.
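For a side-by-side view, here is a minimal sketch that runs both methods from the question on the small example array [2, 4, 6, 8]. It only illustrates that both results sum to 1 while the individual weights differ; the variable names are illustrative.

```python
import numpy as np

values = np.array([2, 4, 6, 8])

# Method 1 (as in the question): divide by the minimum, then by the sum.
step1 = values / values.min()
method1 = step1 / step1.sum()

# Method 2 (as in the question): min-max scale, then divide by the sum.
step2 = (values - values.min()) / (values.max() - values.min())
method2 = step2 / step2.sum()

# Both results sum to 1 (up to floating-point rounding) ...
print(method1.sum(), method2.sum())
# ... but the individual weights are different.
print(method1)
print(method2)
```

Method 1 keeps the proportions of the original values, while method 2 always gives the smallest element a weight of 0.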

1st method: scaling only

The first step of method 1 scales the array so that its minimum value becomes 1. This step isn't needed, and it fails when the minimum of values is 0 (for example, a non-negative array containing a 0), because it divides by the minimum.

>>> import numpy as np
>>> values = np.array([2, 4, 6, 8])
>>> arr1 = values / values.min()
>>> arr1
array([ 1.,  2.,  3.,  4.])
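To see the zero-minimum failure concretely, here is a minimal sketch with a hypothetical input whose minimum is 0. np.errstate is used only to silence NumPy's divide-by-zero warnings so the result can be inspected.

```python
import numpy as np

values = np.array([0, 2, 4, 6])  # hypothetical input whose minimum is 0

# Step 1 of method 1 divides by the minimum, which is 0 here.
with np.errstate(divide="ignore", invalid="ignore"):
    arr1 = values / values.min()  # 0/0 -> nan, x/0 -> inf
    method1 = arr1 / arr1.sum()   # the sum is nan, so every entry becomes nan

# Dividing by the sum directly has no such problem.
direct = values / values.sum()

print(method1)
print(direct)
```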

The second step of method 1 scales the array so that its sum becomes 1. Dividing by the sum cancels any constant factor, so this undoes the scaling from the first step, and you don't need arr1 at all:

>>> arr1 / arr1.sum()
array([ 0.1,  0.2,  0.3,  0.4])
>>> values / values.sum()
array([ 0.1,  0.2,  0.3,  0.4])
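The reason the first step has no effect is that for any nonzero constant c, (c * v) / sum(c * v) = v / sum(v): the constant appears in both numerator and denominator. A minimal sketch checking this numerically, assuming some hypothetical positive random data (1 / min is just one possible choice of c):

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(0.5, 1.5, size=11)  # hypothetical positive data

direct = values / values.sum()

# Pre-scaling by any nonzero constant changes nothing,
# because the constant cancels between numerator and denominator.
for c in (1 / values.min(), 3.7, -2.0):
    scaled = values * c
    assert np.allclose(scaled / scaled.sum(), direct)

print(direct.sum())
```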

2nd method: offset + scaling

The first step of method 2 offsets and scales the array so that the minimum becomes 0 and the maximum becomes 1:

>>> arr2 = (values - values.min()) / (values.max() - values.min())
>>> arr2
array([ 0.        ,  0.33333333,  0.66666667,  1.        ])
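Step 1 of method 2 divides by max - min, so it has its own edge case: if every element is equal, the denominator is 0. A minimal sketch, assuming you would rather raise an error than propagate nan; min_max_scale is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def min_max_scale(values):
    """Hypothetical helper: map values linearly onto [0, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All elements are equal: (values - min) / 0 would give 0/0 = nan.
        raise ValueError("min-max scaling is undefined when all values are equal")
    return (values - values.min()) / span

print(min_max_scale([2, 4, 6, 8]))

try:
    min_max_scale([5, 5, 5])
except ValueError as err:
    print("error:", err)
```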

The second step of method 2 scales the array so that its sum becomes 1. The offset from step 1 is kept, but the scaling from step 1 cancels out. Note that the minimum element is now 0:

>>> arr2 / arr2.sum()
array([ 0.        ,  0.16666667,  0.33333333,  0.5       ])

You can get the same result directly from values:

>>> (values - values.min()) / (values - values.min()).sum()
array([ 0.        ,  0.16666667,  0.33333333,  0.5       ])
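As a minimal sketch, the one-step form can be wrapped in a small function and checked against the two-step version of method 2. shift_then_normalise is a hypothetical name; the guard assumes you want an error when all values are equal, since the shifted array then sums to 0.

```python
import numpy as np

def shift_then_normalise(values):
    """Hypothetical helper: subtract the minimum, then scale so the sum is 1."""
    shifted = np.asarray(values, dtype=float) - np.min(values)
    total = shifted.sum()
    if total == 0:
        raise ValueError("all values are equal; the shifted array sums to 0")
    return shifted / total

values = np.array([2, 4, 6, 8])

# Two-step version from method 2: min-max scale, then divide by the sum.
arr2 = (values - values.min()) / (values.max() - values.min())
two_step = arr2 / arr2.sum()

# The division by (max - min) cancels, so both routes agree.
print(np.allclose(shift_then_normalise(values), two_step))
```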


Author: artDeco

Interests: Quant. Data. Code. Music. Film. Design. Architecture.

Updated on June 04, 2022

Comments

artDeco

I am confused by two methods of normalising an array so that it sums to 1.0:

Array to be normalised:

>>> import numpy as np
>>> values = np.array([1.17091033, 1.13843561, 1.240346, 1.05438719, 1.05386014,
...                    1.15475574, 1.16127814, 1.07070739, 0.93670444, 1.20450255,
...                    1.25644135])

Method 1:

>>> arr = np.array(values / min(values))
>>> arr
array([ 1.25003179,  1.21536267,  1.32415941,  1.12563488,  1.12507221,
        1.23278559,  1.23974873,  1.14305788,  1.00000000,  1.28589392,
        1.34134236])
>>> arr1 = arr / sum(arr)  # Sum total to 1.0
>>> arr1
array([ 0.09410701,  0.09149699,  0.09968761,  0.08474195,  0.08469959,
        0.09280865,  0.09333286,  0.08605362,  0.07528369,  0.09680684,
        0.1009812 ])

Method 2:

>>> arr = np.array((values - min(values)) / (max(values) - min(values)))
>>> arr
array([ 0.73249564,  0.63092863,  0.94966065,  0.3680612,  0.3664128 ,
        0.68197101,  0.70237028,  0.41910379,  0.0000000,  0.83755771,
        1.00000000])
>>> arr2 = arr / sum(arr)  # Sum total to 1.0
>>> arr2
array([ 0.10951467,  0.09432949,  0.14198279,  0.05502845,  0.054782  ,
        0.10196079,  0.10501066,  0.06265978,  0.00000000,  0.12522239,
        0.14950897])

Which method is correct? And why?