Compute z-score with the function in SciPy and NumPy

Since nobody has added an answer and it seems to be correct, I will post Alex Riley's answer here.

Try this to get the same result for the 2D array.

from scipy import stats

stats.zscore(a, axis=None)
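
As a quick check of the suggestion above, the following sketch compares `stats.zscore(a, axis=None)` with the manual NumPy computation from the question. The array `a` is the one from the question. The names `z_all` and `z_manual` are only illustrative.

```python
import numpy as np
from scipy import stats

# The 2D array from the question.
a = np.array([[1.0, 2.0], [3.0, 4.0]])

# axis=None standardizes with the mean and std of the whole array.
z_all = stats.zscore(a, axis=None)

# The manual computation from the question: np.mean and np.std
# reduce over all elements when no axis is given.
z_manual = (a - np.mean(a)) / np.std(a)

print(np.allclose(z_all, z_manual))
```

Both `stats.zscore` and `np.std` default to `ddof=0`, so no extra argument should be needed for the two to agree.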

Author: iTS

Updated on September 15, 2022

Comments

  • iTS
    iTS over 1 year

    I am trying to use stats.zscore() from SciPy and get the following results, which confuse me.

    Suppose I have an array and compute the z-score in two different ways:

    >>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
    >>> a
    array([[ 1.,  2.],
           [ 3.,  4.]])
    

    First result:

    >>> stats.zscore(a)               
    array([[-1., -1.],
           [ 1.,  1.]])
    

    Second result:

    >>> mean = np.mean(a)
    >>> mean
    2.5
    >>> std = np.std(a)
    >>> std
    1.1180339887498949
    >>> b = (a-mean)/std
    >>> b
    array([[-1.34164079, -0.4472136 ],
           [ 0.4472136 ,  1.34164079]])
    

    The two results above differ. But if I use another array:

    >>> c = np.array([ 0.7972,  0.0767,  0.4383,  0.7866,  0.8091,  0.1954, 0.6307, 0.6599,  0.1065,  0.0508])
    >>> c
    array([ 0.7972,  0.0767,  0.4383,  0.7866,  0.8091,  0.1954,  0.6307, 0.6599,  0.1065,  0.0508])
    

    First result:

    >>> stats.zscore(c)
    array([ 1.12724554, -1.2469956 , -0.05542642,  1.09231569,  1.16645923, -0.8558472 ,  0.57858329,  0.67480514, -1.14879659, -1.33234306])
    

    Second result:

    >>> mean = np.mean(c)
    >>> mean
    0.45511999999999986
    >>> std = np.std(c)
    >>> std
    0.30346538451691657
    >>> b = (c-mean)/std
    >>> b
    array([ 1.12724554, -1.2469956 , -0.05542642,  1.09231569,  1.16645923, -0.8558472 ,  0.57858329,  0.67480514, -1.14879659, -1.33234306])
    
        
    

    So with this other array, the two results are the same. Can someone help me understand what I did wrong? Thanks!

    • Warren Weckesser
      Warren Weckesser
      @Alex: That looks like an answer. :)
    • Alex Riley
      Alex Riley
      stats.zscore works along axis 0 by default (it does not flatten the entire array as np.mean and np.std do). Its behaviour is essentially (a - a.mean(axis=0)) / a.std(axis=0).
    • Alex Riley
      Alex Riley
      Try stats.zscore(a, axis=None) to get the same result as NumPy for the 2D array.
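
The formula in the comments above can be sketched with NumPy alone. This is a minimal illustration, not SciPy's actual implementation: `zscore_sketch` is a hypothetical helper, and it assumes the population standard deviation (`ddof=0`), which is NumPy's default.

```python
import numpy as np

def zscore_sketch(x, axis=0):
    """Hypothetical helper: (x - mean) / std along `axis`, with ddof=0."""
    # keepdims=True lets the reduced mean/std broadcast back against x.
    mean = np.mean(x, axis=axis, keepdims=True)
    std = np.std(x, axis=axis, keepdims=True)
    return (x - mean) / std

a = np.array([[1.0, 2.0], [3.0, 4.0]])

# axis=0: each column is standardized separately (the "first result").
per_column = zscore_sketch(a, axis=0)

# axis=None: one mean and std for the whole array (the "second result").
whole = zscore_sketch(a, axis=None)

# For a 1D array there is only one axis, so both choices agree.
c = np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
same_for_1d = np.allclose(zscore_sketch(c, axis=0), zscore_sketch(c, axis=None))

print(per_column)
print(whole)
print(same_for_1d)
```

This is why the two approaches in the question match for `c` but not for `a`: for a 1D array, reducing along axis 0 and reducing over everything are the same operation.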