Two methods to normalise an array so that it sums to 1.0
Both methods turn values into an array whose sum is 1, but they do it differently.
1st method: scaling only
The first step of method 1 scales the array so that its minimum value becomes 1. This step isn't needed, and it wouldn't work if values contained a 0 element, because it divides by the minimum.
>>> import numpy as np
>>> values = np.array([2, 4, 6, 8])
>>> arr1 = values / values.min()
>>> arr1
array([ 1., 2., 3., 4.])
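To see why the first step breaks when values contains a 0, here is a minimal sketch; the array `with_zero` is made up for illustration. NumPy does not raise on floating-point division by zero: it returns inf or nan and emits a RuntimeWarning, which `np.errstate` silences here.

```python
import numpy as np

# Hypothetical input whose minimum is 0
with_zero = np.array([0.0, 2.0, 4.0])

# Step 1 of method 1 divides by the minimum, i.e. by 0:
# 0/0 gives nan and x/0 (x > 0) gives inf
with np.errstate(divide="ignore", invalid="ignore"):
    arr1 = with_zero / with_zero.min()
print(arr1)  # [nan inf inf]

# Dividing by the sum directly never touches the minimum and stays finite
print(with_zero / with_zero.sum())
```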
The second step of method 1 scales the array so that its sum becomes 1. In doing so, it cancels any scaling done by the first step, so you don't need arr1 at all:
>>> arr1 / arr1.sum()
array([ 0.1, 0.2, 0.3, 0.4])
>>> values / values.sum()
array([ 0.1, 0.2, 0.3, 0.4])
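The first step changes nothing because dividing by the sum cancels any nonzero scale factor c: (c·x) / Σ(c·x) = x / Σx. A small sketch checking this numerically; the helper name `normalize_sum` and the factors tried are arbitrary choices for illustration.

```python
import numpy as np

def normalize_sum(x):
    """Scale x so that its elements sum to 1 (assumes x.sum() != 0)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

values = np.array([2, 4, 6, 8])
direct = normalize_sum(values)

# Pre-scaling by any nonzero constant, including 1 / values.min()
# as in method 1, gives the same final result
for c in (1 / values.min(), 0.5, 3.0, -2.0, 1e6):
    assert np.allclose(normalize_sum(c * values), direct)

print(direct)  # [0.1 0.2 0.3 0.4]
```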
2nd method: offset + scaling
The first step of method 2 offsets and scales the array so that the minimum becomes 0 and the maximum becomes 1:
>>> arr2 = (values - values.min()) / (values.max() - values.min())
>>> arr2
array([ 0. , 0.33333333, 0.66666667, 1. ])
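The min-max step has its own failure mode: if every element is equal, max - min is 0 and the division yields nan. A minimal sketch of a guarded version; the name `min_max_scale` and the choice to raise a ValueError are assumptions for illustration, not part of either method as posted.

```python
import numpy as np

def min_max_scale(x):
    """Map x linearly so that its minimum becomes 0 and its maximum becomes 1."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # Constant input: (x - min) / (max - min) would be 0/0 = nan
        raise ValueError("min-max scaling is undefined for a constant array")
    return (x - x.min()) / span

print(min_max_scale([2, 4, 6, 8]))
```

Calling `min_max_scale([5, 5, 5])` raises the ValueError instead of silently returning an array of nan.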
The second step of method 2 scales the array so that its sum becomes 1. The offset from step 1 still applies, but the scaling from step 1 is cancelled out. Note that the minimum element is now 0:
>>> arr2 / arr2.sum()
array([ 0. , 0.16666667, 0.33333333, 0.5 ])
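Applied to the 11-element array from the question, a short sketch confirms both points: the step-1 division by max - min cancels out, and the element at the minimum always ends up with weight 0.

```python
import numpy as np

# Array from the question
values = np.array([1.17091033, 1.13843561, 1.240346, 1.05438719, 1.05386014,
                   1.15475574, 1.16127814, 1.07070739, 0.93670444, 1.20450255,
                   1.25644135])

# Method 2 exactly as described: min-max scale, then normalise to sum 1
arr2 = (values - values.min()) / (values.max() - values.min())
method2 = arr2 / arr2.sum()

# Skipping the division by (max - min) gives the same result
shifted = values - values.min()
assert np.allclose(method2, shifted / shifted.sum())

# The smallest input always receives weight 0
i_min = np.argmin(values)
print(i_min, method2[i_min])
```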
You could get this result directly from values with:
>>> (values - values.min()) / (values - values.min()).sum()
array([ 0. , 0.16666667, 0.33333333, 0.5 ])
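The two methods also respond differently to a constant offset in the input. In the sketch below, method 1 keeps the weights proportional to the raw numbers, so adding a constant changes its output, while method 2 subtracts the minimum first and therefore ignores the offset. The function names and the offset of 10 are arbitrary choices for illustration.

```python
import numpy as np

def method1(x):
    # Proportional weights: x / sum(x)
    return x / x.sum()

def method2(x):
    # Subtract the minimum, then normalise to sum 1
    shifted = x - x.min()
    return shifted / shifted.sum()

values = np.array([2.0, 4.0, 6.0, 8.0])
offset = 10.0  # arbitrary constant added to every element

print(method1(values), method1(values + offset))  # differ
print(method2(values), method2(values + offset))  # identical
```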
artDeco
Updated on June 04, 2022
I am confused by two methods for normalising an array so that it sums to 1.0:
Array to be normalised:
array([ 1.17091033, 1.13843561, 1.240346 , 1.05438719, 1.05386014, 1.15475574, 1.16127814, 1.07070739, 0.93670444, 1.20450255, 1.25644135])
Method 1:
arr = np.array(values / min(values))
array([ 1.25003179, 1.21536267, 1.32415941, 1.12563488, 1.12507221, 1.23278559, 1.23974873, 1.14305788, 1.00000000, 1.28589392, 1.34134236])
arr1 = arr / sum(arr) # Sum total to 1.0
array([ 0.09410701, 0.09149699, 0.09968761, 0.08474195, 0.08469959, 0.09280865, 0.09333286, 0.08605362, 0.07528369, 0.09680684, 0.1009812 ])
Method 2:
arr = np.array((values - min(values)) / (max(values) - min(values)))
array([ 0.73249564, 0.63092863, 0.94966065, 0.3680612, 0.3664128 , 0.68197101, 0.70237028, 0.41910379, 0.0000000, 0.83755771, 1.00000000])
arr2 = arr / sum(arr) # Sum total to 1.0
array([ 0.10951467, 0.09432949, 0.14198279, 0.05502845, 0.054782 , 0.10196079, 0.10501066, 0.06265978, 0.00000000, 0.12522239, 0.14950897])
Which method is correct? And why?