Pandas convert columns type from list to np.array


Use apply to convert each element to its equivalent array:

df['col1'] = df['col1'].apply(lambda x: np.array(x))

type(df['col1'].iloc[0])
numpy.ndarray

Data:

df = pd.DataFrame({'col1': [[1,2,3],[0,0,0]]})
df
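For reference, the snippets above can be combined into one self-contained script. This is only a minimal sketch: the `type_before` and `type_after` names are introduced here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example frame from the answer: each cell of 'col1' holds a Python list.
df = pd.DataFrame({'col1': [[1, 2, 3], [0, 0, 0]]})

# Before the conversion every cell is a plain list.
type_before = type(df['col1'].iloc[0])

# Convert each cell to an array, exactly as in the answer.
df['col1'] = df['col1'].apply(lambda x: np.array(x))

# After the conversion every cell is a numpy.ndarray.
type_after = type(df['col1'].iloc[0])

print(type_before.__name__, '->', type_after.__name__)
print(df['col1'].dtype)  # the column itself still has dtype 'object'
```

Note that only the cells change type. The column keeps the `object` dtype, because pandas stores each array as an opaque Python object.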


Author: LeoCella

I hold a bachelor's degree in Computer Science Engineering, and since October 2014 I have been attending a graduate course in Computer Science Engineering at Politecnico di Milano, focused on Big Data Analysis and Machine Learning. My main interests are swimming (which I practiced for more than 15 years, the last two in a close-knit, competitive group) and Brazilian jiu-jitsu. As a Brazilian, I'm also music-dependent: I'm a self-taught player of the pandeiro (a percussion instrument essential to samba) and the djembe (an African drum).

Updated on July 12, 2022

Comments

  • LeoCella almost 2 years

    I'm trying to apply a function to a pandas DataFrame. The function requires two np.array objects as input and fits them using a well-defined model.

    The problem is that I can't apply this function to the selected columns, because each of their rows contains a list read from a JSON file rather than an np.array.

    Now, I've tried different solutions:

    # Here is where I discovered the problem
    
    train_df['result'] = train_df.apply(my_function(train_df['col1'], train_df['col2']))
    
    # So I tried to cast the Series before passing them to the function, in both of these ways:
    
    X_col1_casted = train_df['col1'].dtype(np.array)
    X_col2_casted = train_df['col2'].dtype(np.array)
    

    doesn't work.

    X_col1_casted = train_df['col1'].astype(np.array)
    X_col2_casted = train_df['col2'].astype(np.array)
    

    doesn't work.

    X_col1_casted = train_df['col1'].dtype(np.array)
    X_col2_casted = train_df['col2'].dtype(np.array)
    

    doesn't work.

    What I'm thinking of doing now is a long procedure like this:

    starting from the uncast column Series, convert them with list(), iterate over them, apply the function to the individual np.array() elements, and append the results to a temporary list. Once done, I will turn this list into a new column. (Clearly, I don't know if it will work.)

    Does anyone know how to help me?

    EDIT: I'm adding an example to be clear:

    The function expects two np.arrays as input. Right now it gets two lists, since they are retrieved from a JSON file. The situation is this:

    col1        col2    result
    [1,2,3]     [4,5,6]  [5,7,9]
    [0,0,0]     [1,2,3]  [1,2,3]
    

    Clearly the function is not a sum but a custom function of my own. For the moment, assume that this sum only works on arrays and not on lists. What should I do?

    Thanks in advance
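A note on the attempts in the question: `np.array` is a function rather than a dtype, so `astype(np.array)` has nothing to cast to, and `Series.dtype` is an attribute describing the column, not a conversion method. The row-by-row procedure described in the question does work, and it might be sketched as follows. The body of `my_function` is a hypothetical stand-in (an element-wise sum that rejects lists), since the real model-fitting function is not shown. The data comes from the example table.

```python
import numpy as np
import pandas as pd


def my_function(a, b):
    """Hypothetical stand-in for the asker's model: element-wise sum of two arrays."""
    if not (isinstance(a, np.ndarray) and isinstance(b, np.ndarray)):
        raise TypeError("my_function expects two np.ndarray inputs")
    return a + b


# Lists as they would arrive from the JSON file (values from the example table).
train_df = pd.DataFrame({
    'col1': [[1, 2, 3], [0, 0, 0]],
    'col2': [[4, 5, 6], [1, 2, 3]],
})

# Walk the two columns in parallel, convert each pair of lists to arrays,
# and collect one result per row in a temporary list.
results = [
    my_function(np.array(a), np.array(b))
    for a, b in zip(train_df['col1'], train_df['col2'])
]

# Store the collected arrays as a new column (one array per row).
train_df['result'] = results

print(train_df)
```

The original columns are left as lists here. If they are needed as arrays later on, they can be converted once up front with `apply`, as in the answer, and the `np.array(...)` calls inside the loop can then be dropped.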