Why do numpy array arr2d[:, :1] and arr2d[:, 0] produce different results?


Solution 1

1) When you say arr2d[:, 0], you're saying "give me the element at index 0 of every row in arr2d" (another way of saying "give me the 0th column"). Because 0 is a single integer, that axis is dropped from the result.

2) When you say arr2d[:, :1], you're saying "give me the :1 slice of every row in arr2d". NumPy interprets :1 the same as 0:1, so you're asking for the elements of each row from index 0 up to, but not including, index 1. That is just the element at index 0, but because you asked for a slice, the second dimension is kept, with length one (0:1 covers exactly one index).

So:

1)

print(arr2d[:, 0].shape)

Output:

(3,)

2)

print(arr2d[:, 0:1].shape)

Output:

(3, 1)

I still don't get it. Why don't they return the same thing?

Consider:

print(arr2d[:, 0:3])
print(arr2d[:, 0:3].shape)

print(arr2d[:, 0:2])
print(arr2d[:, 0:2].shape)

print(arr2d[:, 0:1])
print(arr2d[:, 0:1].shape)

This outputs:

[[1 2 3]
 [4 5 6]
 [7 8 9]]
(3, 3)

[[1 2]
 [4 5]
 [7 8]]
(3, 2)

[[1]
 [4]
 [7]]
(3, 1)

A slice of width 3 gives 3 columns and a slice of width 2 gives 2 columns, so it would be unexpected and inconsistent for a slice of width 1 to give the shape (3,) instead of (3, 1).
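The consistency argument can be checked directly. The sketch below assumes the same 3x3 arr2d as in the question and Python 3 with NumPy installed; the names slice_shapes and int_shape are only illustrative.

```python
import numpy as np

# Same 3x3 array as in the question.
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A slice of width w on the column axis yields shape (3, w),
# including the w == 1 case.
slice_shapes = [arr2d[:, 0:w].shape for w in (3, 2, 1)]
print(slice_shapes)  # [(3, 3), (3, 2), (3, 1)]

# An integer index removes the column axis instead.
int_shape = arr2d[:, 0].shape
print(int_shape)  # (3,)
```

The slice results all keep two axes; only the integer index drops one.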

Solution 2

With a list, you have the same behavior you described:

>>> a = [1, 2, 3, 4]
>>> a[0]
1
>>> a[:1]
[1]

In the first case, you're returning the item at a specific index; in the second case, you're returning a slice of the list.

What NumPy adds is the notion of axes, which makes this a little less intuitive.

With NumPy, in the former case (arr2d[:, 0]) you're selecting all the items in the first column, which returns a one-axis array (one fewer axis than the parent, as expected when indexing with an integer). In the second case (arr2d[:, :1]) you're slicing the original array, so the result keeps the same number of axes as the parent array, with the sliced axis reduced to length one.
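To make the axis counting concrete, here is a small sketch (assuming the arr2d from the question; col_1d, col_2d, as_column and as_flat are illustrative names). It also shows one possible way to convert between the two forms, which may be useful if you need one shape but have the other.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_1d = arr2d[:, 0]    # integer index: the column axis is dropped
col_2d = arr2d[:, :1]   # slice: the column axis is kept with length 1

print(arr2d.ndim, col_1d.ndim, col_2d.ndim)  # 2 1 2

# Converting between the two forms:
as_column = col_1d[:, np.newaxis]   # adds a length-1 axis, shape (3, 1)
as_flat = col_2d[:, 0]              # drops the length-1 axis, shape (3,)

print(np.array_equal(as_column, col_2d))  # True
print(np.array_equal(as_flat, col_1d))    # True
```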

Solution 3

Index ':1' implies:

'The list of items from index 0 to index 0' which is obviously a list of 1 item.

Index '0' implies:

'The item at index 0'.

Extending that to your question should make the results you obtained pretty clear.

arr2d[:, :1] means 'data corresponding to all rows and the list of columns 0 to 0'.

So the result is like a list of lists: a 2-D array with a single column.

arr2d[:, 0] means 'data corresponding to all rows and just the first column'.

So it is like a plain list: a 1-D array.
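The list analogy can be made literal by converting both results with tolist(). This is only an illustration, assuming the arr2d from the question; the results are NumPy arrays, not lists, until converted.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

nested = arr2d[:, :1].tolist()  # slice: one single-item list per row
flat = arr2d[:, 0].tolist()     # integer index: one item per row

print(nested)  # [[1], [4], [7]]
print(flat)    # [1, 4, 7]
```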

Author by

JJJJ

Updated on June 04, 2022

Comments

  • JJJJ
    JJJJ almost 2 years

    Say:

    arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    

    arr2d[:, :1] gives me

    array([[1],
           [4],
           [7]])
    

    arr2d[:,0] gives me

    array([1, 4, 7])
    

    I thought they would produce exactly the same thing.