Resize with averaging or rebin a NumPy 2D array

Solution 1

Here's an example based on the answer you've linked (for clarity):

>>> import numpy as np
>>> a = np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> a.reshape((2,a.shape[0]//2,3,-1)).mean(axis=3).mean(1)
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])

As a function:

def rebin(a, shape):
    sh = shape[0],a.shape[0]//shape[0],shape[1],a.shape[1]//shape[1]
    return a.reshape(sh).mean(-1).mean(1)
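
One way to check the reshape trick is to compare it against explicit block means. The sketch below assumes both dimensions divide evenly. `block_mean_loop` is a hypothetical helper written only for the comparison and is not part of the original answer.

```python
import numpy as np

def rebin(a, shape):
    # Split each axis into (number of bins, bin size), then average
    # over the two bin-size axes.
    sh = shape[0], a.shape[0] // shape[0], shape[1], a.shape[1] // shape[1]
    return a.reshape(sh).mean(-1).mean(1)

def block_mean_loop(a, shape):
    # Hypothetical reference implementation: average each block explicitly.
    fr, fc = a.shape[0] // shape[0], a.shape[1] // shape[1]
    out = np.empty(shape)
    for i in range(shape[0]):
        for j in range(shape[1]):
            out[i, j] = a[i * fr:(i + 1) * fr, j * fc:(j + 1) * fc].mean()
    return out

a = np.arange(24).reshape((4, 6))
fast = rebin(a, (2, 3))
slow = block_mean_loop(a, (2, 3))
print(np.allclose(fast, slow))  # True
```

Taking the mean of means is valid here only because every block has the same number of elements.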

Solution 2

J.F. Sebastian has a great answer for 2D binning. Here is a version of his "rebin" function that works for N dimensions:

import numpy as np


def bin_ndarray(ndarray, new_shape, operation='sum'):
    """
    Bins an ndarray in all axes based on the target shape, by summing or
        averaging.

    Number of output dimensions must match number of input dimensions and 
        new axes must divide old ones.

    Example
    -------
    >>> m = np.arange(0,100,1).reshape((10,10))
    >>> n = bin_ndarray(m, new_shape=(5,5), operation='sum')
    >>> print(n)
    [[ 22  30  38  46  54]
     [102 110 118 126 134]
     [182 190 198 206 214]
     [262 270 278 286 294]
     [342 350 358 366 374]]

    """
    operation = operation.lower()
    if operation not in ['sum', 'mean']:
        raise ValueError("Operation not supported.")
    if ndarray.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(ndarray.shape,
                                                           new_shape))
    compression_pairs = [(d, c//d) for d,c in zip(new_shape,
                                                  ndarray.shape)]
    flattened = [l for p in compression_pairs for l in p]
    ndarray = ndarray.reshape(flattened)
    for i in range(len(new_shape)):
        op = getattr(ndarray, operation)
        ndarray = op(-1*(i+1))
    return ndarray
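
For the mean case, the same reshape idea can be written more compactly by reducing over all the bin-size axes in a single call. This is only a sketch, not the answer's code. `bin_mean_nd` is a hypothetical name, and it assumes every new axis length divides the old one.

```python
import numpy as np

def bin_mean_nd(a, new_shape):
    # Hypothetical compact variant of the N-dimensional binning above.
    if a.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(a.shape, new_shape))
    # Interleave (number of bins, bin size) for every axis.
    pairs = [(d, c // d) for d, c in zip(new_shape, a.shape)]
    flattened = [n for p in pairs for n in p]
    # The bin-size axes sit at the odd positions 1, 3, 5, ...
    factor_axes = tuple(range(1, 2 * a.ndim, 2))
    return a.reshape(flattened).mean(axis=factor_axes)

cube = np.arange(4 * 6 * 8).reshape((4, 6, 8))
small = bin_mean_nd(cube, (2, 3, 4))
print(small.shape)  # (2, 3, 4)
```

Each output element is the mean of one 2 x 2 x 2 block of `cube`.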

Solution 3

Here's a way of doing what you ask using matrix multiplication that doesn't require the new array dimensions to divide the old.

First we generate a row compressor matrix and a column compressor matrix (I'm sure there's a cleaner way of doing this, maybe even using numpy operations alone):

import numpy as np


def get_row_compressor(old_dimension, new_dimension):
    dim_compressor = np.zeros((new_dimension, old_dimension))
    bin_size = float(old_dimension) / new_dimension
    next_bin_break = bin_size
    which_row = 0
    which_column = 0
    while which_row < dim_compressor.shape[0] and which_column < dim_compressor.shape[1]:
        if round(next_bin_break - which_column, 10) >= 1:
            dim_compressor[which_row, which_column] = 1
            which_column += 1
        elif next_bin_break == which_column:
            # The bin ends exactly on a cell boundary: move on to the next bin.
            which_row += 1
            next_bin_break += bin_size
        else:
            partial_credit = next_bin_break - which_column
            dim_compressor[which_row, which_column] = partial_credit
            which_row += 1
            dim_compressor[which_row, which_column] = 1 - partial_credit
            which_column += 1
            next_bin_break += bin_size
    dim_compressor /= bin_size
    return dim_compressor


def get_column_compressor(old_dimension, new_dimension):
    return get_row_compressor(old_dimension, new_dimension).transpose()

... so, for instance, get_row_compressor(5, 3) gives you:

[[ 0.6  0.4  0.   0.   0. ]
 [ 0.   0.2  0.6  0.2  0. ]
 [ 0.   0.   0.   0.4  0.6]]

and get_column_compressor(3, 2) gives you:

[[ 0.66666667  0.        ]
 [ 0.33333333  0.33333333]
 [ 0.          0.66666667]]

Then simply premultiply by the row compressor and postmultiply by the column compressor to get the compressed matrix:

def compress_and_average(array, new_shape):
    # Note: new shape should be smaller in both dimensions than old shape.
    # The @ operator replaces np.mat, which was removed in NumPy 2.0;
    # the result is a plain ndarray rather than a matrix.
    return (get_row_compressor(array.shape[0], new_shape[0])
            @ np.asarray(array)
            @ get_column_compressor(array.shape[1], new_shape[1]))

Using this technique,

compress_and_average(np.array([[50, 7, 2, 0, 1],
                               [0, 0, 2, 8, 4],
                               [4, 1, 1, 0, 0]]), (2, 3))

yields:

[[ 21.86666667   2.66666667   2.26666667]
 [  1.86666667   1.46666667   1.86666667]]
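
The answer wonders whether the compressor matrix could be built with NumPy operations alone. One possible sketch, which is not the author's code, computes how much each new bin overlaps each old cell and normalizes the rows. `overlap_compressor` and `compress_overlap` are hypothetical names, and the sketch assumes the new dimension is no larger than the old one.

```python
import numpy as np

def overlap_compressor(old_dimension, new_dimension):
    # Bin edges expressed in units of old cells: bin k covers [edges[k], edges[k+1]).
    edges = np.linspace(0.0, old_dimension, new_dimension + 1)
    lo = edges[:-1, None]
    hi = edges[1:, None]
    # Old cell c covers the interval [c, c + 1).
    cells = np.arange(old_dimension)[None, :]
    # Length of the intersection of each bin with each cell (negative means none).
    overlap = np.minimum(hi, cells + 1) - np.maximum(lo, cells)
    weights = np.clip(overlap, 0.0, None)
    # Normalize so that every row of weights sums to 1.
    return weights / weights.sum(axis=1, keepdims=True)

def compress_overlap(array, new_shape):
    rows = overlap_compressor(array.shape[0], new_shape[0])
    cols = overlap_compressor(array.shape[1], new_shape[1]).T
    return rows @ np.asarray(array) @ cols

example = np.array([[50, 7, 2, 0, 1],
                    [0, 0, 2, 8, 4],
                    [4, 1, 1, 0, 0]])
result = compress_overlap(example, (2, 3))
print(result)
```

For the 3 by 5 example above, this should agree with the loop-based compressor up to floating-point rounding.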

Solution 4

I was trying to downscale a raster: take a roughly 6000 by 2000 raster and turn it into an arbitrarily sized smaller raster that averages the values properly across the previous bin sizes. I found a solution using SciPy, but I couldn't get SciPy to install on the shared hosting service I was using, so I wrote this function instead. There are likely better ways to do this that don't involve looping through the rows and columns, but this does seem to work.

The nice part about this is that the old number of rows and columns don't have to be divisible by the new number of rows and columns.

import numpy as np


def resize_array(a, new_rows, new_cols):
    '''
    This function takes a 2D numpy array a and produces a smaller array
    of size new_rows, new_cols. new_rows and new_cols must be less than
    or equal to the number of rows and columns in a.
    '''
    rows = len(a)
    cols = len(a[0])
    yscale = float(rows) / new_rows 
    xscale = float(cols) / new_cols

    # first average across the cols to shorten rows    
    new_a = np.zeros((rows, new_cols)) 
    for j in range(new_cols):
        # get the indices of the original array we are going to average across
        the_x_range = (j*xscale, (j+1)*xscale)
        firstx = int(the_x_range[0])
        lastx = int(the_x_range[1])
        # figure out the portion of the first and last index that overlap
        # with the new index, and thus the portion of those cells that 
        # we need to include in our average
        x0_scale = 1 - (the_x_range[0]-int(the_x_range[0]))
        xEnd_scale =  (the_x_range[1]-int(the_x_range[1]))
        # scale_line is a 1d array that corresponds to the portion of each old
        # index in the_x_range that should be included in the new average
        scale_line = np.ones((lastx-firstx+1))
        scale_line[0] = x0_scale
        scale_line[-1] = xEnd_scale
        # Drop the last index when it carries no weight, or when floating-point
        # rounding has pushed it one past the end of the array.
        if scale_line[-1] == 0 or lastx >= cols:
            scale_line = scale_line[:-1]
            lastx = lastx - 1
        # Now it's linear algebra time. Take the dot product of a slice of
        # the original array and the scale_line
        new_a[:,j] = np.dot(a[:,firstx:lastx+1], scale_line)/scale_line.sum()

    # Then average across the rows to shorten the cols. Same method as above.
    # It is probably possible to simplify this code, as this is more or less
    # the same procedure as the block of code above, but transposed.
    # Here I'm reusing the variable a. Sorry if that's confusing.
    a = np.zeros((new_rows, new_cols))
    for i in range(new_rows):
        the_y_range = (i*yscale, (i+1)*yscale)
        firsty = int(the_y_range[0])
        lasty = int(the_y_range[1])
        y0_scale = 1 - (the_y_range[0]-int(the_y_range[0]))
        yEnd_scale =  (the_y_range[1]-int(the_y_range[1]))
        scale_line = np.ones((lasty-firsty+1))
        scale_line[0] = y0_scale
        scale_line[-1] = yEnd_scale
        if scale_line[-1] == 0 or lasty >= rows:
            scale_line = scale_line[:-1]
            lasty = lasty - 1
        a[i, :] = np.dot(scale_line, new_a[firsty:lasty+1, :])/scale_line.sum()

    return a 

Author: Andrea Zonca

Updated on July 23, 2022

Comments

  • Andrea Zonca almost 2 years

    I am trying to reimplement in python an IDL function:

    http://star.pst.qub.ac.uk/idl/REBIN.html

    which downsizes a 2D array by an integer factor by averaging.

    For example:

    >>> a=np.arange(24).reshape((4,6))
    >>> a
    array([[ 0,  1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10, 11],
           [12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23]])
    

    I would like to resize it to (2,3) by taking the mean of the relevant samples, the expected output would be:

    >>> b = rebin(a, (2, 3))
    >>> b
    array([[  3.5,   5.5,   7.5],
           [ 15.5,  17.5,  19.5]])
    

    i.e. b[0,0] = np.mean(a[:2,:2]), b[0,1] = np.mean(a[:2,2:4]) and so on.

    I believe I should reshape to a 4 dimensional array and then take the mean on the correct slice, but could not figure out the algorithm. Would you have any hint?

  • Andrea Zonca over 12 years
    Thanks, I have created a gist on GitHub with the implementation of this function, in case somebody else needs it: gist.github.com/1348792. I also suggested on numpy-discussion that it be added to NumPy, but the answer was negative.
  • K.-Michael Aye over 11 years
    Did they give a reason for the negative answer?
  • Admin over 11 years
    I think this is the discussion. It doesn't seem too negative, more a lack of time or not enough interest.
  • GajananB almost 10 years
    Bear in mind that doing a mean over data which has a NaN will return a NaN. So if you want a mean that ignores any NaN values you will need nanmean() instead. Still a great answer.
  • ru111 almost 4 years
    This is brilliant, it works even if the new shape isn't a multiple of the original shape (the problem I had with other solutions).
  • Ferus over 2 years
    Doesn't always work, for example: resize_array(np.random.uniform(size=(12961, 1)), 50, 1) (gives error)
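
Following up on the NaN comment above: a NaN-aware variant of the 2D rebin can be sketched by swapping the mean for `np.nanmean`. `nan_rebin` is a hypothetical name, and the sketch still assumes the new shape divides the old one evenly.

```python
import numpy as np

def nan_rebin(a, shape):
    # Same reshape as the 2D rebin, but NaN values are ignored when averaging.
    sh = shape[0], a.shape[0] // shape[0], shape[1], a.shape[1] // shape[1]
    return np.nanmean(a.reshape(sh), axis=(1, 3))

a = np.arange(24, dtype=float).reshape((4, 6))
a[0, 0] = np.nan  # one missing sample in the top-left block
out = nan_rebin(a, (2, 3))
print(out)
```

Here the top-left output is the mean of the three remaining values in that block (1, 6 and 7). If an entire block is NaN, `np.nanmean` returns NaN for it and emits a RuntimeWarning.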