How to bin a 2D array in numpy?
Solution 1
You can reshape the array to a four-dimensional array that reflects the desired block structure, and then sum over the two axes that run within each block. Example:
>>> a = np.arange(24).reshape(4, 6)
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> a.reshape(2, 2, 2, 3).sum(3).sum(1)
array([[ 24,  42],
       [ 96, 114]])
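If a has the shape (m, n), the reshape should have the form
a.reshape(m_bins, m // m_bins, n_bins, n // n_bins)
which requires m_bins to divide m and n_bins to divide n. Summing over axes 3 and 1, as above, then leaves an array of shape (m_bins, n_bins).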
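The same trick can be wrapped in a small helper. This is only a sketch: the name block_sum and its parameters are hypothetical, it assumes the bin counts divide the array shape exactly, and it sums over axes 1 and 3 in a single call, which gives the same result as .sum(3).sum(1).

```python
import numpy as np

def block_sum(a, m_bins, n_bins):
    """Sum non-overlapping blocks of a 2D array (hypothetical helper).

    Assumes a.shape[0] is divisible by m_bins and a.shape[1] by n_bins.
    """
    m, n = a.shape
    if m % m_bins or n % n_bins:
        raise ValueError("bin counts must divide the array shape exactly")
    # Axis 0: block row, axis 1: row inside the block,
    # axis 2: block column, axis 3: column inside the block.
    blocks = a.reshape(m_bins, m // m_bins, n_bins, n // n_bins)
    # Reducing the two within-block axes leaves one value per block.
    return blocks.sum(axis=(1, 3))

a = np.arange(24).reshape(4, 6)
print(block_sum(a, 2, 2))
# [[ 24  42]
#  [ 96 114]]
```

Replacing .sum with .mean in the last line of the helper gives block averages instead of block sums.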
Solution 2
At first I was also going to suggest np.histogram2d rather than reinventing the wheel, but then I realized that it would be overkill and would still need some hacking.
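For comparison, the np.histogram2d route could look roughly like this. It is a sketch under the assumption that each cell is treated as a point at its (row, column) centre, weighted by the cell's value, with bin edges falling on the block boundaries; it works, but it clearly does more work than a plain block sum.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)
m_bins, n_bins = 2, 2

# Every cell becomes a point at its centre (row + 0.5, col + 0.5),
# so no coordinate sits exactly on a bin edge.
rows, cols = np.indices(a.shape)
hist, row_edges, col_edges = np.histogram2d(
    rows.ravel() + 0.5,
    cols.ravel() + 0.5,
    bins=[m_bins, n_bins],
    # Edges at 0, 2, 4 (rows) and 0, 3, 6 (columns) match the 2x3 blocks.
    range=[[0, a.shape[0]], [0, a.shape[1]]],
    weights=a.ravel(),
)
print(hist)  # block sums, returned as a float array
```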
If I understand correctly, you just want to sum over submatrices of your input. That's pretty easy to brute force: loop over the elements of your output matrix and sum up the corresponding sub-block of your input:
import numpy as np
def submatsum(data, n, m):
    # return a matrix of shape (n, m) whose entries are the sums of the
    # corresponding non-overlapping blocks of data
    bs = data.shape[0] // n, data.shape[1] // m  # block size summed over
    return np.reshape(np.array([np.sum(data[k1*bs[0]:(k1+1)*bs[0], k2*bs[1]:(k2+1)*bs[1]])
                                for k1 in range(n) for k2 in range(m)]), (n, m))
# set up dummy data
N, M = 4, 6
data_matrix = np.reshape(np.arange(N*M), (N, M))
# set up size of 2x3-reduced matrix, assuming N and M are divisible by 2 and 3
n, m = N // 2, M // 3
reduced_matrix = submatsum(data_matrix, n, m)
# check output
print(data_matrix)
print(reduced_matrix)
This prints
print(data_matrix)
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
print(reduced_matrix)
[[ 24  42]
 [ 96 114]]
which is indeed the result of summing up submatrices of shape (2, 3).
Note that I'm using // for integer division so that the code works in Python 3. In Python 2, / between two integers already performs floor division, so it would give the same result there; // works in both versions.
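A quick illustration of why // matters in Python 3 (the variable names are only illustrative): / produces a float even when both operands are integers, and a float is rejected as a slice bound.

```python
# In Python 3, / on two ints always returns a float; // returns an int.
N, n = 4, 2
print(N / n)   # 2.0
print(N // n)  # 2
data = list(range(6))
try:
    data[0:N / n]  # slice bounds must be integers
except TypeError as exc:
    print("TypeError:", exc)
print(data[0:N // n])  # [0, 1]
```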
Solution 3
Another solution is the binArray function given in the comments of the question "Binning a numpy array".
Applied to your example:
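import numpy as np
# binArray comes from the linked "Binning a numpy array" post; it is not part of NumPy
data_matrix = np.ndarray((500, 500), dtype=float)  # uninitialized; fill with your data
binned_data = binArray(data_matrix, 0, 10, 10, np.sum)  # bin along axis 0
binned_data = binArray(binned_data, 1, 10, 10, np.sum)  # then along axis 1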
The result sums each 10x10 square of data_matrix (of size 500x500) into a single value, so binned_data has size 50x50.
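If you would rather not copy binArray, the two passes above can be approximated for non-overlapping bins (step equal to bin size) by reshaping one axis at a time. This is a sketch: bin_axis and its signature are hypothetical and are not binArray's actual API, it assumes each axis length is a multiple of the bin size, and the random input merely stands in for your data.

```python
import numpy as np

def bin_axis(data, axis, binsize, func=np.sum):
    """Reduce non-overlapping windows of `binsize` along one axis (hypothetical helper)."""
    data = np.moveaxis(data, axis, -1)  # bring the target axis to the end
    length = data.shape[-1]
    if length % binsize:
        raise ValueError("axis length must be a multiple of binsize")
    # Split the last axis into (number of bins, binsize) and reduce each window.
    windows = data.reshape(data.shape[:-1] + (length // binsize, binsize))
    reduced = func(windows, axis=-1)
    return np.moveaxis(reduced, -1, axis)  # restore the original axis order

rng = np.random.default_rng(0)
data_matrix = rng.random((500, 500))  # stand-in for the real data
binned_data = bin_axis(data_matrix, 0, 10)  # 500x500 -> 50x500
binned_data = bin_axis(binned_data, 1, 10)  # 50x500 -> 50x50
print(binned_data.shape)  # (50, 50)
```

Passing func=np.mean instead of the default np.sum gives per-square averages.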
Hope this helps!
Mike T
Updated on June 19, 2022
Comments
-
Mike T almost 2 years
I'm new to numpy and I have a 2D array of objects that I need to bin into a smaller matrix and then get a count of the number of objects in each bin to make a heatmap. I followed the answer on this thread to create the bins and do the counts for a simple array but I'm not sure how to extend it to 2 dimensions. Here's what I have so far:
data_matrix = numpy.ndarray((500,500),dtype=float)
# fill array with values.
bins = numpy.linspace(0,50,50)
digitized = numpy.digitize(data_matrix, bins)
binned_data = numpy.ndarray((50,50))
for i in range(0,len(bins)):
    for j in range(0,len(bins)):
        k = len(data_matrix[digitized == i:digitized == j])  # <- this does not work
        binned_data[i:j] = k
P.S. The [digitized == i] notation on an array returns an array of boolean values. I cannot find documentation on this notation anywhere. A link would be appreciated.
-
Sven Marnach about 8 years
If the number of bins along each axis is a divisor of the dimension along the respective axis, you can do this without any Python loops (which will be much faster). The trick is to reshape the array into a four-dimensional array, and then sum along the right axes.
-
Andras Deak -- Слава Україні about 8 years
@SvenMarnach you're right, I keep forgetting that trick :) Do you wish to add that as an answer?
-
Sven Marnach about 8 years
Done. I didn't have time to post an answer yesterday, so I hoped someone else might do.
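Regarding the P.S. in the question: digitized == i is an elementwise comparison that yields a boolean array (a mask), and indexing an array with a mask of the same shape, as in data_matrix[digitized == i], is what the NumPy documentation calls boolean array indexing; it returns a 1D array of the selected elements. A small sketch with made-up data and bin edges:

```python
import numpy as np

data_matrix = np.array([[0.5, 3.2, 7.9],
                        [1.1, 4.4, 9.0]])
bins = np.array([0.0, 2.0, 5.0, 10.0])

# np.digitize returns, for each value, the index i with bins[i-1] <= x < bins[i].
digitized = np.digitize(data_matrix, bins)
mask = digitized == 2          # elementwise comparison -> boolean array
print(mask)
print(data_matrix[mask])       # boolean indexing -> 1D array of the selected values
print(np.count_nonzero(mask))  # number of elements that fell into bin 2
```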