Create pandas dataframe from numpy array

Solution 1

You need to transpose your numpy array:

df_1 = pd.DataFrame(data.T, columns=columns)

To see why this is necessary, consider the shape of your array:

print(data.shape)

(2, 3)

The second number in the shape tuple is the number of columns in the array, and it must equal the number of column labels you pass to the DataFrame. Here the array has three columns, but only two labels are supplied.
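As a rough sketch of that rule, the snippet below compares the array's column count with the number of labels and then triggers the failure. It reuses `columns` and `data` from the question; `n_array_columns`, `labels_match`, and `raised` are illustrative names only. The wording of the error message differs between pandas versions, so only the exception type is checked.

```python
import numpy as np
import pandas as pd

columns = ['1', '2']
data = np.array([[1, 2, 2], [1, 5, 3]])  # two inner arrays, three values each

# pandas reads axis 1 of a 2-D array as the columns of the frame
n_array_columns = data.shape[1]
labels_match = n_array_columns == len(columns)
print(n_array_columns, len(columns), labels_match)

# With mismatched counts the constructor raises ValueError;
# the exact message text depends on the pandas version
try:
    pd.DataFrame(data, columns=columns)
    raised = False
except ValueError as exc:
    raised = True
    print(type(exc).__name__)
```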

Transposing the array swaps its axes, so the result has shape (3, 2) and can be passed into a dataframe with two columns:

print(data.T.shape)

(3, 2)

print(data.T)

[[1 1]
 [2 5]
 [2 3]]
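Put together, a minimal self-contained version of this solution might look like the following. It assumes the `columns` and `data` values from the question and nothing else.

```python
import numpy as np
import pandas as pd

columns = ['1', '2']
data = np.array([[1, 2, 2], [1, 5, 3]])  # shape (2, 3): one inner array per column

# data.T has shape (3, 2), so its two columns line up with the two labels
df_1 = pd.DataFrame(data.T, columns=columns)
print(df_1)
```

In NumPy, `data.T` is a view of the original array, so the transpose step itself does not copy the data.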

Solution 2

pandas reads each inner array of a 2-D ndarray as a row, so a (2, 3) array always yields two rows and three columns.

Either way, you need to transpose something.

One option is to pass index=columns, so that the labels are attached to the two rows, and then transpose the resulting DataFrame. This gives the same output.

 import numpy as np
 import pandas as pd

 columns = ['1', '2']
 data = np.array([[1, 2, 2], [1, 5, 3]])
 # label the two rows first, then flip rows and columns
 df_1 = pd.DataFrame(data, index=columns).T
 df_1

Passing in data.T as mentioned above is also perfectly acceptable (assuming data is an ndarray).
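To check that the two routes agree, a small comparison can be sketched as below. The names `via_frame_transpose` and `via_array_transpose` are made up for this illustration; `pd.testing.assert_frame_equal` raises an AssertionError if the frames differ and returns nothing if they match.

```python
import numpy as np
import pandas as pd

columns = ['1', '2']
data = np.array([[1, 2, 2], [1, 5, 3]])

# Route 1: label the rows, then transpose the DataFrame
via_frame_transpose = pd.DataFrame(data, index=columns).T

# Route 2: transpose the array, then label the columns
via_array_transpose = pd.DataFrame(data.T, columns=columns)

# Raises AssertionError on any difference in values, labels, or dtypes
pd.testing.assert_frame_equal(via_frame_transpose, via_array_transpose)
print(via_frame_transpose)
```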

Solution 3

For the second layout in the question, where each inner array holds one column, you can build the DataFrame from a dict instead:

df_1 = pd.DataFrame(dict(zip(columns, data)))
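A slightly expanded sketch of the same idea follows, again assuming the question's `columns` and `data`. The intermediate `column_map` name and the length check are additions for illustration: `zip` stops at the shorter input, so a missing label would otherwise drop a column without any error.

```python
import numpy as np
import pandas as pd

columns = ['1', '2']
data = np.array([[1, 2, 2], [1, 5, 3]])

# zip silently stops at the shorter input, so confirm the counts match first
assert len(columns) == len(data)

# Pair each label with one inner array: {'1': [1, 2, 2], '2': [1, 5, 3]}
column_map = dict(zip(columns, data))

# A dict of equal-length arrays becomes one column per key
df_1 = pd.DataFrame(column_map)
print(df_1)
```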
Author by

blue-sky

scala :: java

Updated on June 23, 2022

Comments

  • blue-sky
    blue-sky almost 2 years

    To create a pandas DataFrame from a NumPy array I can use:

    columns = ['1','2']
    data = np.array([[1,2] , [1,5] , [2,3]])
    df_1 = pd.DataFrame(data,columns=columns)
    df_1
    

    If I instead use:

    columns = ['1','2']
    data = np.array([[1,2,2] , [1,5,3]])
    df_1 = pd.DataFrame(data,columns=columns)
    df_1
    

    Here each inner array is a column of data, but this throws an error:

    ValueError: Wrong number of items passed 3, placement implies 2
    

    Does pandas support this data layout, or must I use the format in example 1?