Partition array into N chunks with Numpy
Solution 1
Try numpy.array_split.
From the documentation:
>>> x = np.arange(8.0)
>>> np.array_split(x, 3)
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
It is identical to numpy.split, but won't raise an exception if the groups aren't of equal length.
If the number of chunks is greater than len(array), you get empty arrays nested inside. To address that, if your split result is saved in a, you can remove the empty arrays with:
[x for x in a if x.size > 0]
Just save that back in a if you wish.
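A minimal, self-contained sketch of the two steps above (split, then drop the empties); the variable names mirror the snippet:

```python
import numpy as np

x = np.arange(3.0)
# Ask for more chunks than elements: the trailing chunks come back empty.
a = np.array_split(x, 5)
# Filter out the empty arrays.
a = [chunk for chunk in a if chunk.size > 0]
print(a)  # three one-element arrays
```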
Solution 2
Just some examples on usage of array_split
, split
, hsplit
and vsplit
:
In [9]: a = np.random.randint(0,10,[4,4])
In [10]: a
Out[10]:
array([[2, 2, 7, 1],
[5, 0, 3, 1],
[2, 9, 8, 8],
[5, 7, 7, 6]])
Some examples of using array_split:
If you give an array or list as the second argument, you specify the indices before which the array is 'cut':
# split rows into 0|1 2|3
In [4]: np.array_split(a, [1,3])
Out[4]:
[array([[2, 2, 7, 1]]),
array([[5, 0, 3, 1],
[2, 9, 8, 8]]),
array([[5, 7, 7, 6]])]
# split columns into 0| 1 2 3
In [5]: np.array_split(a, [1], axis=1)
Out[5]:
[array([[2],
[5],
[2],
[5]]),
array([[2, 7, 1],
[0, 3, 1],
[9, 8, 8],
[7, 7, 6]])]
An integer as the second argument specifies the number of (nearly) equal chunks:
In [6]: np.array_split(a, 2, axis=1)
Out[6]:
[array([[2, 2],
[5, 0],
[2, 9],
[5, 7]]),
array([[7, 1],
[3, 1],
[8, 8],
[7, 6]])]
split works the same way but raises an exception if an equal split is not possible.
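To see the difference, here is a sketch using a 4-column array like the one above; 4 columns cannot be divided into 3 equal parts:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# np.split refuses the unequal division...
try:
    np.split(a, 3, axis=1)
except ValueError:
    print("np.split: unequal division not allowed")

# ...while np.array_split hands back unequal chunks.
chunks = np.array_split(a, 3, axis=1)
print([c.shape for c in chunks])  # [(4, 2), (4, 1), (4, 1)]
```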
In addition to array_split, you can use the shortcuts vsplit and hsplit.
vsplit and hsplit are pretty much self-explanatory:
In [11]: np.vsplit(a, 2)
Out[11]:
[array([[2, 2, 7, 1],
[5, 0, 3, 1]]),
array([[2, 9, 8, 8],
[5, 7, 7, 6]])]
In [12]: np.hsplit(a, 2)
Out[12]:
[array([[2, 2],
[5, 0],
[2, 9],
[5, 7]]),
array([[7, 1],
[3, 1],
[8, 8],
[7, 6]])]
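As a quick sanity check (a sketch using a fresh example array), vsplit and hsplit are just split along axis 0 and axis 1 respectively:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# vsplit cuts along axis 0 (rows), hsplit along axis 1 (columns).
top, bottom = np.vsplit(a, 2)
left, right = np.hsplit(a, 2)

assert np.array_equal(top, np.split(a, 2, axis=0)[0])
assert np.array_equal(left, np.split(a, 2, axis=1)[0])
```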
Solution 3
I believe that you're looking for numpy.split, or possibly numpy.array_split if the number of sections doesn't need to divide the size of the array evenly.
Solution 4
Not quite an answer, but a long comment with nicely formatted code on the other (correct) answers. If you try the following, you will see that what you are getting are views of the original array, not copies, which was not the case for the accepted answer in the question you link. Be aware of the possible side effects!
>>> x = np.arange(9.0)
>>> a,b,c = np.split(x, 3)
>>> a
array([ 0., 1., 2.])
>>> a[1] = 8
>>> a
array([ 0., 8., 2.])
>>> x
array([ 0., 8., 2., 3., 4., 5., 6., 7., 8.])
>>> def chunks(l, n):
...     """ Yield successive n-sized chunks from l.
...     """
...     for i in range(0, len(l), n):
...         yield l[i:i+n]
...
>>> l = list(range(9))
>>> a,b,c = chunks(l, 3)
>>> a
[0, 1, 2]
>>> a[1] = 8
>>> a
[0, 8, 2]
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8]
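To make the view-vs-copy behavior explicit, np.shares_memory reports whether a chunk aliases the original array, and .copy() breaks the link; a sketch of that check:

```python
import numpy as np

x = np.arange(9.0)
a, b, c = np.split(x, 3)
print(np.shares_memory(a, x))  # True: the chunks are views into x

# Copy each chunk to get independent arrays.
a2, b2, c2 = (chunk.copy() for chunk in np.split(x, 3))
a2[1] = 8.0
print(x[1])  # the original array is unchanged
```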
Updated on July 04, 2021

Comments
-
Eiyrioü von Kauyf almost 3 years: There is this How do you split a list into evenly sized chunks? for splitting an array into chunks. Is there any way to do this more efficiently for giant arrays using Numpy?
-
Eiyrioü von Kauyf over 10 years: Sorry, I'm still looking for an efficient answer ;). Right now I'm thinking ctypes is the only efficient way.
-
Prashant Kumar over 10 years: Define efficient. Give some sample data, your current method, how fast it is, and how fast you need it to be.
-
smci over 3 years: Are we supposed to interpret the input to this question as a native Python array, or a numpy ndarray? The first sentence seems to imply the former. The second sentence implies it's asking for a comparison between the former and the latter. 2-dimensional only, presumably. And when we say "efficiently... for giant arrays", are we more concerned with scalability for asymptotically large N, regardless of whether it's slower for small N?
-
Eiyrioü von Kauyf over 11 years: How can you remove the empty lists though?
-
Eiyrioü von Kauyf over 11 years: Same question as I asked Prashant. How can you get rid of the empty numpy arrays?
-
tzelleke over 11 years: +1) That's a good point to consider; you could extend your solution further to handle certain multidimensional cases.
-
Prashant Kumar over 11 years: Can you provide a small example?
-
Eiyrioü von Kauyf over 11 years: My problem with this is that if chunks > len(array) then you get blank nested arrays... how do you get rid of that?
-
Eiyrioü von Kauyf over 11 years: Yes, at the moment I use that. I was wondering about a nicer way to do that using numpy, esp. with multi-dim :(
-
Eiyrioü von Kauyf over 11 years: If # chunks > len(array) you get blank arrays nested inside.
-
Prashant Kumar over 11 years: I simply wouldn't use # chunks > len(array), but I have included a second step which should remove empty arrays. Let me know if this works.
-
Eiyrioü von Kauyf over 11 years: Yes, that was what I was using... but is there any way to do that with numpy? List comprehensions in Python are slow.
-
timgeb over 8 years: Good examples, thank you. In your np.array_split(a, [1], axis=1) example, do you know how to prevent the first array from having every single element nested?
-
Zach about 7 years: Does np.array_split copy the input array?
-
Stefan Falk over 6 years: This is relevant for larger data. I am using numpy.array_split, which appears to make copies of the data. Passing that to your multiprocessing pool will make yet another copy of the data...
-
Eduardo Pignatelli over 3 years: If you are looking for a way to control the split by the size of the chunk, you can use: np.array_split(x, np.arange(chunk_size, len(x), chunk_size)).
-
David Kaftan about 3 years: @EiyrioüvonKauyf, to do it with numpy, just limit the number of elements to the length of the array: np.array_split(x, min(len(x), 3)), where 3 is the default number of groups you want.
-
Sreekiran A R over 2 years: This helps. Thanks :)
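The chunk-size idiom mentioned in the comments can be sketched as follows (chunk_size and the array are illustrative):

```python
import numpy as np

x = np.arange(10)
chunk_size = 4
# Cut before indices 4 and 8, giving chunks of 4, 4 and 2 elements.
chunks = np.array_split(x, np.arange(chunk_size, len(x), chunk_size))
print([c.size for c in chunks])  # [4, 4, 2]
```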