Split strings in tuples into columns, in Pandas

40,626

Solution 1

If the values are strings that merely look like tuples, strip the surrounding parentheses, split on the commas and convert the pieces to floats:

In [74]: df['stats'].str[1:-1].str.split(',', expand=True).astype(float)
Out[74]:
          0         1         2         3         4
0 -0.009242  0.410000 -0.742016  0.003683  0.002517
1  0.041154  0.318231  0.758717  0.002640  0.010654
2 -0.014435  0.168438 -0.808703  0.000817  0.003166
3  0.034346  0.288731  0.950845  0.000001  0.003373
4  0.009052  0.151031  0.670257  0.012179  0.003022
5 -0.004797  0.171615 -0.552879  0.050032  0.002180

(Note: in pandas versions older than 0.16.1, pass return_type='frame' instead of the expand keyword.)
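A self-contained version of the string approach, using the first two rows of the question's data, might look like the sketch below. Two things are additions rather than part of the answer: the spaces are removed before splitting as a defensive step (so no piece carries leading whitespace into the float conversion), and the column names stat_0 … stat_4 are hypothetical.

```python
import pandas as pd

# Sample data: the question's "stats" values stored as strings (first two rows)
df = pd.DataFrame(
    {
        "stats": [
            "(-0.00924175824176, 0.41, -0.742016492568, 0.0036830094242, 0.00251748449963)",
            "(0.0411538461538, 0.318230769231, 0.758717081514, 0.00264000622468, 0.0106535783677)",
        ]
    },
    index=pd.Index([14.0, 28.0], name="Track ID"),
)

parts = (
    df["stats"]
    .str[1:-1]                             # drop the "(" and ")"
    .str.replace(" ", "", regex=False)     # defensive: remove spaces after commas
    .str.split(",", expand=True)           # one column per tuple element
    .astype(float)                         # strings -> float64
)

# Hypothetical, more readable column names
parts.columns = [f"stat_{i}" for i in range(parts.shape[1])]
print(parts)
```

The result keeps the original Track ID index, so it can be joined back with df.join(parts) if needed.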

If the values are actual tuples rather than strings, you can build the new DataFrame directly:

pd.DataFrame(df['stats'].tolist(), index=df.index)
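As a runnable sketch (sample values copied from the question, stored as real tuples), the one-liner can be combined with a join to replace the tuple column with five numeric columns. Dropping the original column and using join are illustrative choices, not part of the answer.

```python
import pandas as pd

# Sample data: "stats" holds real tuples, indexed by Track ID as in the question
df = pd.DataFrame(
    {
        "stats": [
            (-0.00924175824176, 0.41, -0.742016492568, 0.0036830094242, 0.00251748449963),
            (0.0411538461538, 0.318230769231, 0.758717081514, 0.00264000622468, 0.0106535783677),
        ]
    },
    index=pd.Index([14.0, 28.0], name="Track ID"),
)

# Each tuple becomes one row; index=df.index keeps the Track ID labels aligned
stats = pd.DataFrame(df["stats"].tolist(), index=df.index)

# Replace the tuple column with the expanded numeric columns
result = df.drop(columns="stats").join(stats)
print(result)
```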

Solution 2

If the column holds tuples rather than strings and you want each tuple element as a separate DataFrame column, this is the simplest approach (it assumes Track ID is an ordinary column rather than the index):

df = pd.concat([df['Track ID'], pd.DataFrame(df['stats'].values.tolist(), index=df.index)], axis=1)
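pd.concat with axis=1 aligns its inputs on the row index, so the new DataFrame needs the same index as df whenever df's index is not the default 0..n-1. The sketch below uses made-up index labels (10 and 11, as might remain after filtering) and shortened tuples to show what happens with and without index=df.index.

```python
import pandas as pd

# Sample data: "Track ID" is an ordinary column and the row index is
# deliberately not 0..n-1 (made-up labels, e.g. left over after filtering)
df = pd.DataFrame(
    {
        "Track ID": [14.0, 28.0],
        "stats": [(-0.0092, 0.41, -0.742), (0.0412, 0.3182, 0.7587)],
    },
    index=[10, 11],
)

# Without index=df.index the new frame gets a fresh 0..n-1 index, so concat
# cannot match its rows to df's rows and pads the mismatch with NaN
misaligned = pd.concat([df["Track ID"], pd.DataFrame(df["stats"].tolist())], axis=1)

# Reusing df's index lines the rows up correctly
aligned = pd.concat(
    [df["Track ID"], pd.DataFrame(df["stats"].tolist(), index=df.index)], axis=1
)
print(misaligned)
print(aligned)
```

The misaligned result has four half-empty rows instead of two complete ones.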

If the values are actually strings, first convert them to lists as shown here, then apply the same operation:

dfpart = pd.DataFrame(df['stats'].apply(lambda x: x.strip('()').split(', ')).values.tolist(), index=df.index).astype(float)
df = pd.concat([df['Track ID'], dfpart], axis=1)
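Splitting on ', ' relies on every string using exactly one space after each comma. A swapped-in alternative (not from the answers above) is ast.literal_eval from the Python standard library, which parses each tuple-like string into a real tuple of floats and is indifferent to spacing; the tuples approach then applies unchanged. The sample rows below are taken from the question, with the space after the first comma of the second row removed on purpose.

```python
import ast

import pandas as pd

# Sample data: tuple-like strings; the second one deliberately lacks a space
# after its first comma, which a split on ", " would not handle
df = pd.DataFrame(
    {
        "Track ID": [56.0, 70.0],
        "stats": [
            "(0.0343461538462, 0.288730769231, 0.950844962874, 6.1608706775e-07, 0.00337262030771)",
            "(0.00905164835165,0.151030769231, 0.670257006716, 0.0121790506745, 0.00302182567957)",
        ],
    }
)

# literal_eval only evaluates Python literals (not arbitrary code),
# so each string safely becomes a tuple of floats
tuples = df["stats"].map(ast.literal_eval)

# Expand the tuples into columns, keeping df's index for alignment
dfpart = pd.DataFrame(tuples.tolist(), index=df.index)
df = pd.concat([df["Track ID"], dfpart], axis=1)
print(df)
```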
Author: Admin

Updated on January 26, 2021

Comments

  • Admin (over 3 years ago)

    I have the following DataFrame, where Track ID is the row index. How can I split the string in the stats column into 5 columns of numbers?

    Track ID    stats
    14.0    (-0.00924175824176, 0.41, -0.742016492568, 0.0036830094242, 0.00251748449963)
    28.0    (0.0411538461538, 0.318230769231, 0.758717081514, 0.00264000622468, 0.0106535783677)
    42.0    (-0.0144351648352, 0.168438461538, -0.80870348637, 0.000816872566404, 0.00316572586742)
    56.0    (0.0343461538462, 0.288730769231, 0.950844962874, 6.1608706775e-07, 0.00337262030771)
    70.0    (0.00905164835165, 0.151030769231, 0.670257006716, 0.0121790506745, 0.00302182567957)
    84.0    (-0.0047967032967, 0.171615384615, -0.552879463981, 0.0500316517755, 0.00217970256969)
    
  • Admin (about 9 years ago)
    Thanks for the input. I'm new to pandas, python, and scripting in general, so am still getting my head around the basics. The data in the stats column were indeed tuples. Creating a new DataFrame with df2 = df['stats'].apply(pd.Series) has moved me forwards greatly. Thanks again.
  • tacaswell (almost 8 years ago)
    return_type has been deprecated in favor of expand=True.
  • cs95 (over 5 years ago)
    Please don't use apply(pd.Series) (or recommend its usage). See "When should I ever use pandas apply() in my code?" A better solution is to listify the column first.