Get mean of 2D slice of a 3D array in numpy
Solution 1
Use a tuple for axis:
>>> a = np.arange(11*5*5).reshape(11,5,5)
>>> a.mean(axis=(1,2))
array([ 12., 37., 62., 87., 112., 137., 162., 187., 212., 237., 262.])
Edit: This works only with numpy version 1.7+.
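On 1.7+ a quick sanity check against an explicit per-slice loop (the approach the question mentions wanting to avoid) confirms the tuple-axis result:

```python
import numpy as np

a = np.arange(11 * 5 * 5).reshape(11, 5, 5)

# Mean over each 5x5 slice in one call (requires NumPy 1.7+).
means = a.mean(axis=(1, 2))

# Sanity check against an explicit per-slice loop.
expected = np.array([a[i].mean() for i in range(a.shape[0])])
assert np.allclose(means, expected)
print(means[0], means[1], means[2])  # first three slice means: 12.0 37.0 62.0
```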
Solution 2
You can reshape to (11, 25) and then call mean only once (faster):
a.reshape(11, 25).mean(axis=1)
Alternatively, you can call np.mean twice (about 2x slower on my computer):
a.mean(axis=2).mean(axis=1)
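A shape-agnostic sketch of the same reshape trick avoids hard-coding the 11 and 25, using -1 to let NumPy infer the flattened size:

```python
import numpy as np

a = np.arange(11 * 5 * 5).reshape(11, 5, 5)

# -1 tells reshape to infer the size of the flattened trailing axes,
# so this works for any (n, ...) array, not just (11, 5, 5).
means = a.reshape(a.shape[0], -1).mean(axis=1)
print(means.shape)  # (11,)
```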
Solution 3
You can always use np.einsum:
>>> a = np.arange(11*5*5).reshape(11,5,5)
>>> np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])
array([ 12, 37, 62, 87, 112, 137, 162, 187, 212, 237, 262])
It works on higher-dimensional arrays too (all of these methods would, if their axis arguments are changed):
>>> a = np.arange(10*11*5*5).reshape(10,11,5,5)
>>> (np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])).shape
(10, 11)
Faster to boot:
a = np.arange(11*5*5).reshape(11,5,5)
%timeit a.reshape(11, 25).mean(axis=1)
10000 loops, best of 3: 21.4 us per loop
%timeit a.mean(axis=(1,2))
10000 loops, best of 3: 19.4 us per loop
%timeit np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])
100000 loops, best of 3: 8.26 us per loop
It scales slightly better than the other methods as array size increases.
Using dtype=np.float64 does not change the above timings appreciably, so just to double-check:
a = np.arange(110*50*50,dtype=np.float64).reshape(110,50,50)
%timeit a.reshape(110,2500).mean(axis=1)
1000 loops, best of 3: 307 us per loop
%timeit a.mean(axis=(1,2))
1000 loops, best of 3: 308 us per loop
%timeit np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])
10000 loops, best of 3: 145 us per loop
Also interesting: the same gap shows up for a plain full sum:
%timeit np.sum(a) #37812362500.0
100000 loops, best of 3: 293 us per loop
%timeit np.einsum('ijk->',a) #37812362500.0
100000 loops, best of 3: 144 us per loop
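One caveat with the einsum timings, echoed in the comments below: on an integer array, einsum sums in an integer accumulator, which can overflow for large inputs, while np.mean accumulates in float64 by default. A sketch of a safer pattern is to cast before reducing:

```python
import numpy as np

a = np.arange(11 * 5 * 5).reshape(11, 5, 5)

# Cast to float64 first so the einsum sum cannot overflow an
# integer accumulator; then divide by the slice size for the mean.
n = a.shape[-1] * a.shape[-2]
means = np.einsum('...ijk->...i', a.astype(np.float64)) / n

# Matches the tuple-axis mean exactly.
assert np.allclose(means, a.mean(axis=(1, 2)))
```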
Updated on August 22, 2020

Comments
-
danjarvis over 3 years
I have a numpy array with a shape of (11L, 5L, 5L). I want to calculate the mean over the 25 elements of each 'slice' of the array ([0, :, :], [1, :, :], etc.), returning 11 values.
It seems silly, but I can't work out how to do this. I thought the mean(axis=x) function would do it, but I've tried every possible combination of axis and none of them give me the result I want. I can obviously do this using a for loop and slicing, but surely there is a better way?
-
Jaime over 10 years
It works? One would think so for 1.7 and afterwards, but the docs still say only one axis.
-
J. Martinot-Lagarde over 10 years
I didn't think about the numpy version; I have 1.7.1 and it works. It's not in the documentation, but the changelog talks about ufuncs: softpedia.com/progChangelog/Numpy-Changelog-103892.html
-
lmjohns3 over 10 years
Cool, didn't know this had been added!
-
lmjohns3 over 10 years
I think this is the most straightforward answer, though the einsum does seem to be faster.
-
Jaime over 10 years
I think the speed is coming from your call to np.einsum using an int accumulator, instead of the float or double (I'm not sure which) that np.mean uses. That is a risky thing to do when computing statistics, as you can overflow the accumulator and get very wrong results. Giving np.einsum a dtype=np.float or dtype=np.double would both make the calculation more robust and (I am guessing here) more similar in performance to the standard functions. But np.einsum is still a super cool function, so you get your +1...
-
Daniel over 10 years
@Jaime: That was my thought as well, but my initial testing showed that einsum was actually faster for any size and dtype. I have updated the post with np.double timings.
-
Saullo G. P. Castro over 10 years
@Ophion... it is weird that sum() does not give the same speed as einsum()... very well observed. Actually, the second fastest method to calculate the mean would be: timeit a.sum(axis=(1,2))/a.shape[-1]/a.shape[-2]
-
Saullo G. P. Castro over 10 years
@Ophion I think you should post a question like "why is np.einsum() faster than np.sum()?" to open this topic for discussion in more detail...
-
Daniel over 10 years
@SaulloCastro I was writing a question to that effect just now. Using a.sum(axis=(1,2)) is equivalent in time to the a.mean(axis=(1,2)) function.
-
Saullo G. P. Castro over 10 years
@Ophion I just saw it! Great question!