Numpy stack with unequal shapes

10,683

Solution 1

Numpy arrays have to be rectangular, so what you are trying to get is not possible with a numpy array.

You need a different data structure. Which one is suitable depends on what you want to do with that data.
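As a rough illustration of what a different structure might look like (an assumption on my part, since the right choice depends on your use case): a plain Python list keeps each array at its own shape, and a 1-D object array simply holds references to them. The names `ragged` and `obj` are made up for this sketch.

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)

# Option A: a plain Python list of arrays, each keeping its own shape.
ragged = [arr[:2], arr[:3]]
lengths = [a.shape[0] for a in ragged]

# Option B: a 1-D object array whose elements are the individual arrays.
# Filling it element by element avoids NumPy trying to broadcast the inputs.
obj = np.empty(len(ragged), dtype=object)
for i, a in enumerate(ragged):
    obj[i] = a
```

Neither option gives you a true 3-D array, so vectorised operations across the first axis won't work; you would loop over the elements instead.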

Solution 2

The function np.stack joins multiple arrays along a new axis, not an existing one. See:

>>> import numpy as np
>>> arr = np.arange(10).reshape((5, 2))
>>> print(arr)
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
>>> t1 = np.stack([arr[2:4], arr[3:5]])
>>> print(t1.shape)
(2, 2, 2)

It's not creating a new array of shape (4,2) which I think you're intending. Look at np.concatenate for that.
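To make the difference concrete, here is a small sketch comparing the two calls on the same slices. The variable names are just for illustration.

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)

# np.stack adds a new leading axis: two (2, 2) blocks become (2, 2, 2).
stacked = np.stack([arr[2:4], arr[3:5]])

# np.concatenate joins along an existing axis (axis 0 by default): (4, 2).
joined = np.concatenate([arr[2:4], arr[3:5]])

# concatenate only needs the non-joined dimensions to match,
# so slices with different row counts are fine here: (5, 2).
ragged_joined = np.concatenate([arr[:2], arr[:3]])
```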

Note that if you really want to use stack, the docs require all input arrays to have the same shape:

Parameters: arrays : sequence of array_like. Each array must have the same shape.

So passing arrays of unequal shapes to np.stack raises a ValueError instead of producing a ragged result.
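A quick way to check what np.stack does with unequal shapes is to call it inside a try block. This is only a sketch; `error_message` is a name I made up to capture the result.

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)

error_message = None
try:
    # Shapes (2, 2) and (3, 2) differ, so stack refuses to combine them.
    np.stack([arr[:2], arr[:3]])
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```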

EDIT: I read too quickly. You are trying to add an axis. Still, you can't pass uneven shapes to stack. You would have to pad them all to the same shape. Example:

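import numpy as np

arr = np.arange(10).reshape((5, 2))
print(arr)
# Zero-filled target with the shape of the larger slice (float dtype by default)
arr_p1 = np.zeros(arr[0:3].shape)
arr_p1_src = arr[0:2]
# Copy the smaller slice into the top-left corner of the padded array
arr_p1[:arr_p1_src.shape[0], :arr_p1_src.shape[1]] = arr_p1_src
t2 = np.array([arr_p1, arr[0:3]])
print(t2)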

Output:

[[[ 0.  1.]
  [ 2.  3.]
  [ 0.  0.]]

 [[ 0.  1.]
  [ 2.  3.]
  [ 4.  5.]]]
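The same padding idea can also be written with np.pad, which keeps the integer dtype and extends to more than two arrays. This is a sketch under the assumption that only the first dimension differs; `pad_rows` is a hypothetical helper, not part of NumPy.

```python
import numpy as np

def pad_rows(a, n_rows, fill_value=0):
    # Hypothetical helper: append rows of fill_value until a has n_rows rows.
    missing = n_rows - a.shape[0]
    return np.pad(a, ((0, missing), (0, 0)), mode="constant",
                  constant_values=fill_value)

arr = np.arange(10).reshape(5, 2)
pieces = [arr[:2], arr[:3]]

# Pad every piece to the largest row count, then stack along a new axis.
n_rows = max(p.shape[0] for p in pieces)
padded = np.stack([pad_rows(p, n_rows) for p in pieces])
print(padded.shape)
```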

Solution 3

I've written a function for this problem, assuming you are willing to pad the arrays to make the result rectangular. It handles arrays of arbitrary dimensionality. It could probably be optimised further, but it's not too bad.

import numpy as np

def stack_uneven(arrays, fill_value=0.):
    '''
    Fits arrays into a single numpy array, even if they are
    different sizes. `fill_value` is the default value.

    Args:
        arrays: list of np arrays of various sizes
            (must be same rank, but not necessarily same size)
        fill_value (float, optional): value used to pad the
            positions not covered by an input array. Defaults to 0.

    Returns:
        np.ndarray
    '''
    sizes = [a.shape for a in arrays]
    # Largest extent along each dimension across all arrays
    max_sizes = np.max(list(zip(*sizes)), -1)
    # The resultant array has stacked on the first dimension
    result = np.full((len(arrays),) + tuple(max_sizes), fill_value)
    for i, a in enumerate(arrays):
        # The shape of this array `a`, turned into slices
        slices = tuple(slice(0, s) for s in sizes[i])
        # Overwrite a block slice of `result` with this array `a`
        result[i][slices] = a
    return result

The only caveat is that the input must be a sequence of numpy arrays that all have the same number of dimensions. So for your example:

arr = np.array([[0, 1],
                [2, 3],
                [4, 5],
                [6, 7],
                [8, 9]])
stack_uneven([arr[:2], arr[:3]], 0)

This would give you

array([[[0, 1],
        [2, 3],
        [0, 0]],

       [[0, 1],
        [2, 3],
        [4, 5]]])

This works equally well for higher-dimensional arrays, like:

arr = [np.ones([3, 2, 2]), np.ones([2, 3, 2]), np.ones([2, 2, 3])]
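As a sketch of what to expect for that input, the block below repeats the function in condensed form so it runs on its own, then checks the resulting shape. Each dimension of the result is padded up to the largest size seen along that dimension.

```python
import numpy as np

def stack_uneven(arrays, fill_value=0.):
    # Condensed copy of the function above, repeated so this block is standalone.
    sizes = [a.shape for a in arrays]
    max_sizes = np.max(list(zip(*sizes)), -1)
    result = np.full((len(arrays),) + tuple(max_sizes), fill_value)
    for i, a in enumerate(arrays):
        slices = tuple(slice(0, s) for s in sizes[i])
        result[i][slices] = a
    return result

arr = [np.ones([3, 2, 2]), np.ones([2, 3, 2]), np.ones([2, 2, 3])]
result = stack_uneven(arr)

# Three inputs, each padded to the per-dimension maximum of 3.
print(result.shape)
# Only the original ones survive; every padded position holds fill_value.
print(result.sum())
```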
Author by

Brad Solomon


Updated on June 14, 2022

Comments

  • Brad Solomon
    Brad Solomon almost 2 years

    I've noticed that combining 2D arrays into a 3D array through np.stack, np.dstack, or simply passing a list of arrays to np.array only works when the arrays have the same .shape[0].

    For instance, say I have:

    print(arr)
    [[0 1]
     [2 3]
     [4 5]
     [6 7]
     [8 9]]
    

    it is easy to get to:

    print(np.array([arr[2:4], arr[3:5]])) # same shape
    [[[4 5]
      [6 7]]
    
     [[6 7]
      [8 9]]]
    

    However, if I pass a list of arrays of unequal length, I get:

    print(np.array([arr[:2], arr[:3]]))
    [array([[0, 1],
           [2, 3]])
     array([[0, 1],
           [2, 3],
           [4, 5]])]
    

    How can I get to simply:

    [[[0, 1]
      [2, 3]]
     [[0, 1]
      [2, 3]
      [4, 5]]]
    

    What I've tried: a number of other Array manipulation routines.

    Note: I ultimately want to do this for more than 2 arrays, so np.append is probably not ideal.