Convert Pandas dtype of dataframe

13,331

save it just as the values, not the objects. per this post How to convert a pandas DataFrame subset of columns AND rows into a numpy array?

user.posts = user.posts.astype('float')
user.views = user.views.astype('float')
user.kudos = user.kudos.astype('float')

Y = user[['posts','views','kudos']].values
Share:
13,331
conr404
Author by

conr404

Updated on June 04, 2022

Comments

  • conr404
    conr404 almost 2 years

    I have a Pandas dataframe which is stored as an 'object', but I need to change the dataframe structure to an 'int' as the 'object' dtype will not process in the kmeans() function of numpy library

    I have managed to convert each column of the dataframe into an float64,based on this example Pandas: change data type of columns but I can't change the whole thing into anything else.

     #create subset of user variables
     user.posts = user.posts.astype('int')
     user.views = user.views.astype('int')
     user.kudos = user.kudos.astype('int')
    
     Y = user[['posts','views','kudos']]
     #convert dataframe into float
     X.convert_objects(convert_numeric=True).dtypes
    
    Out[205]:
     posts    float64
     views    float64
     kudos    float64
     dtype: object
    

    This then causes issues when I try and run

    K = range(1,10)
    
    # scipy.cluster.vq.kmeans
    KM = [kmeans(X,k) for k in K] # apply kmeans 1 to 10
    

    I get the error

      --->KM = [kmeans(X,k) for k in K] # apply kmeans 1 to 10
      ^
    
      AttributeError: 'DataFrame' object has no attribute 'dtype'
    

    What is the issue kmeans is having with either the K or X dataframe, and how can it be resolved? Thanks