Two methods to normalise an array so that it sums to 1.0

Both methods modify values into an array whose sum is 1, but they do it differently.

1st method : scaling only

The first step of method 1 scales the array so that the minimum value becomes 1. This step isn't needed, and it breaks down if the minimum of values is 0, because it then divides by zero.

>>> import numpy as np
>>> values = np.array([2, 4, 6, 8])
>>> arr1 = values / values.min()
>>> arr1
array([ 1.,  2.,  3.,  4.])
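
To illustrate the zero-minimum problem, here is a small sketch using a hypothetical array `with_zero` (an assumption for illustration, not data from the question). By default NumPy does not raise on floating-point division by zero; it emits a RuntimeWarning and returns `inf` or `nan`, which the sketch silences with `np.errstate`:

```python
import numpy as np

# Hypothetical input whose minimum is 0 (not taken from the question).
with_zero = np.array([0.0, 2.0, 4.0, 6.0])

# Silence the RuntimeWarning NumPy would otherwise emit for x/0 and 0/0.
with np.errstate(divide="ignore", invalid="ignore"):
    scaled = with_zero / with_zero.min()

# 0/0 yields nan and a positive number divided by 0 yields inf,
# so the intermediate array is unusable.
print(scaled)

# Dividing by the sum directly avoids the problem for this input.
direct = with_zero / with_zero.sum()
print(direct)
```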

The second step of method 1 scales the array so that its sum becomes 1. In doing so, it cancels out the scaling from the first step, so you don't need arr1:

>>> arr1 / arr1.sum()
array([ 0.1,  0.2,  0.3,  0.4])
>>> values / values.sum()
array([ 0.1,  0.2,  0.3,  0.4])
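
The reason is that a constant factor cancels: (x / c) / sum(x / c) equals x / sum(x) for any non-zero c. A minimal sketch that checks this numerically, where `normalise_sum` is a hypothetical helper name introduced only for this example:

```python
import numpy as np

values = np.array([2, 4, 6, 8])

def normalise_sum(a):
    """Divide by the sum so the result totals 1 (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    return a / a.sum()

# Pre-scaling by any non-zero constant cancels in the final division,
# so dividing by values.min() first is just one instance of this.
for c in (values.min(), 10.0, 0.5):
    assert np.allclose(normalise_sum(values / c), normalise_sum(values))
```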

2nd method : offset + scaling

The first step of method 2 offsets and scales the array so that the minimum becomes 0 and the maximum becomes 1:

>>> arr2 = (values - values.min()) / (values.max() - values.min())
>>> arr2
array([ 0.        ,  0.33333333,  0.66666667,  1.        ])

The second step of method 2 scales the array so that the sum becomes 1. The offset from step 1 remains, but the scaling from step 1 cancels out. Note that the minimum element is still 0:

>>> arr2 / arr2.sum()
array([ 0.        ,  0.16666667,  0.33333333,  0.5       ])

You could get this result directly from values with:

>>> (values - values.min()) / (values - values.min()).sum()
array([ 0.        ,  0.16666667,  0.33333333,  0.5       ])
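
As a rough sketch, the two simplified forms can be wrapped in functions and applied to the array from the question. The function names below are hypothetical, chosen only for this example. Neither method is "the correct one" in general: method 1 keeps the ratios between elements, while method 2 always maps the smallest element to 0.

```python
import numpy as np

def scale_to_unit_sum(values):
    """Method 1 simplified: scaling only (hypothetical name)."""
    values = np.asarray(values, dtype=float)
    return values / values.sum()

def shift_and_scale_to_unit_sum(values):
    """Method 2 simplified: subtract the minimum, then scale (hypothetical name)."""
    values = np.asarray(values, dtype=float)
    shifted = values - values.min()
    # Assumes the elements are not all equal; otherwise shifted.sum() is 0.
    return shifted / shifted.sum()

# The array from the question.
data = np.array([1.17091033, 1.13843561, 1.240346, 1.05438719, 1.05386014,
                 1.15475574, 1.16127814, 1.07070739, 0.93670444, 1.20450255,
                 1.25644135])

m1 = scale_to_unit_sum(data)
m2 = shift_and_scale_to_unit_sum(data)

# Both results sum to 1 (up to floating-point rounding).
print(m1.sum(), m2.sum())

# Method 1 preserves the ratio between any two elements.
print(m1[0] / m1[1], data[0] / data[1])

# Method 2 sends the smallest element to exactly 0.
print(m2.min())
```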
Author by

artDeco

Updated on June 04, 2022

Comments

  • artDeco, almost 2 years ago

    I am confused by two methods of normalising an array so that it sums to 1.0:

    Array to be normalised:

    array([ 1.17091033,  1.13843561,  1.240346  ,  1.05438719,  1.05386014,
            1.15475574,  1.16127814,  1.07070739,  0.93670444,  1.20450255,
            1.25644135])
    

    Method 1:

    arr = np.array(values / min(values))
    array([ 1.25003179,  1.21536267,  1.32415941,  1.12563488,  1.12507221,
            1.23278559,  1.23974873,  1.14305788,  1.00000000,  1.28589392,
            1.34134236])
    
    arr1 = arr / sum(arr) # Sum total to 1.0
    array([ 0.09410701,  0.09149699,  0.09968761,  0.08474195,  0.08469959,
            0.09280865,  0.09333286,  0.08605362,  0.07528369,  0.09680684,
            0.1009812 ])
    

    Method 2:

    arr = np.array((values - min(values)) / (max(values) - min(values)))
    array([ 0.73249564,  0.63092863,  0.94966065,  0.3680612,  0.3664128 ,
            0.68197101,  0.70237028,  0.41910379,  0.0000000,  0.83755771,
            1.00000000])
    
    arr2 = arr / sum(arr) # Sum total to 1.0
    array([ 0.10951467,  0.09432949,  0.14198279,  0.05502845,  0.054782  ,
            0.10196079,  0.10501066,  0.06265978,  0.00000000,  0.12522239,
            0.14950897])
    

    Which method is correct? And why?