Split strings in tuples into columns, in Pandas

python pandas split

40,626

Solution 1

For the other case, assuming the values are strings that look like tuples:

In [74]: df['stats'].str[1:-1].str.split(',', expand=True).astype(float)
Out[74]:
          0         1         2         3         4
0 -0.009242  0.410000 -0.742016  0.003683  0.002517
1  0.041154  0.318231  0.758717  0.002640  0.010654
2 -0.014435  0.168438 -0.808703  0.000817  0.003166
3  0.034346  0.288731  0.950845  0.000001  0.003373
4  0.009052  0.151031  0.670257  0.012179  0.003022
5 -0.004797  0.171615 -0.552879  0.050032  0.002180

(Note: for older versions of pandas (< 0.16.1), you need to use return_type='frame' instead of the expand keyword.)
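As a self-contained sketch of the string case, the snippet below rebuilds a small frame and applies the same chain. The two rows are copied from the question; the assumption that Track ID is the index and the variable name parts are illustrative, not part of the original answer.

```python
import pandas as pd

# Assumed setup: two rows from the question, stored as strings,
# with Track ID as the row index.
df = pd.DataFrame(
    {
        "stats": [
            "(-0.00924175824176, 0.41, -0.742016492568, 0.0036830094242, 0.00251748449963)",
            "(0.0411538461538, 0.318230769231, 0.758717081514, 0.00264000622468, 0.0106535783677)",
        ]
    },
    index=pd.Index([14.0, 28.0], name="Track ID"),
)

# Drop the surrounding parentheses, split on commas into separate
# columns, then convert every column from string to float.
parts = df["stats"].str[1:-1].str.split(",", expand=True).astype(float)

print(parts)
```

Because the .str accessor works on the Series itself, the result keeps the original Track ID index, so it can be joined back onto df directly.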

By the way, if the values are tuples and not strings, you can simply do the following:

pd.DataFrame(df['stats'].tolist(), index=df.index)
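A minimal sketch of the tuple case follows. The sample values are made up and shortened to three elements, and the column names a, b, c are hypothetical; only the pd.DataFrame(..., index=df.index) call comes from the answer.

```python
import pandas as pd

# Made-up sample data: the stats column holds real tuples,
# and Track ID is the row index.
df = pd.DataFrame(
    {"stats": [(-0.0092, 0.41, -0.742), (0.0412, 0.3182, 0.7587)]},
    index=pd.Index([14.0, 28.0], name="Track ID"),
)

# Build the new columns from a plain list of tuples. Reusing df.index
# keeps the rows aligned with the original frame.
expanded = pd.DataFrame(df["stats"].tolist(), index=df.index)

# Illustrative column names; the default would be 0, 1, 2.
expanded.columns = ["a", "b", "c"]

# Replace the tuple column with the expanded numeric columns.
result = df.drop(columns="stats").join(expanded)

print(result)
```

Passing index=df.index matters whenever the frame does not use the default 0..n-1 index: without it the new frame gets a fresh RangeIndex and the join would not line up.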

Solution 2

If you have a sequence of tuples and not strings, and you want them as DataFrame columns, this is the simplest approach:

df = pd.concat([df['Track ID'], pd.DataFrame(df['stats'].tolist(), index=df.index)], axis=1)

If the values are actually strings, you can first convert them to lists like so, then apply the above operation:

dfpart = pd.DataFrame(df['stats'].apply(lambda x: x.strip('()').split(', ')).tolist(), index=df.index).astype(float)
df = pd.concat([df['Track ID'], dfpart], axis=1)
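For reference, here is a runnable sketch of the string variant, assuming Track ID is an ordinary column rather than the index. The sample values are made up, and the names rows and out are hypothetical.

```python
import pandas as pd

# Made-up sample data: 'Track ID' is an ordinary column and
# 'stats' holds strings that look like tuples.
df = pd.DataFrame(
    {
        "Track ID": [14.0, 28.0],
        "stats": ["(-0.0092, 0.41, -0.742)", "(0.0412, 0.3182, 0.7587)"],
    }
)

# Strip the parentheses and split each string into a list of pieces.
rows = [s.strip("()").split(", ") for s in df["stats"]]

# Build the numeric frame on the same index as df, then convert to float.
dfpart = pd.DataFrame(rows, index=df.index).astype(float)

# Place the numeric columns next to the identifier column.
out = pd.concat([df["Track ID"], dfpart], axis=1)

print(out)
```

pd.concat with axis=1 aligns on the index, so giving dfpart the same index as df keeps each row of numbers next to its own Track ID.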


Author by

Admin

Updated on January 26, 2021

Comments

Admin over 3 years

I have the following DataFrame, where Track ID is the row index. How can I split the string in the stats column into 5 columns of numbers?

Track ID    stats
14.0    (-0.00924175824176, 0.41, -0.742016492568, 0.0036830094242, 0.00251748449963)
28.0    (0.0411538461538, 0.318230769231, 0.758717081514, 0.00264000622468, 0.0106535783677)
42.0    (-0.0144351648352, 0.168438461538, -0.80870348637, 0.000816872566404, 0.00316572586742)
56.0    (0.0343461538462, 0.288730769231, 0.950844962874, 6.1608706775e-07, 0.00337262030771)
70.0    (0.00905164835165, 0.151030769231, 0.670257006716, 0.0121790506745, 0.00302182567957)
84.0    (-0.0047967032967, 0.171615384615, -0.552879463981, 0.0500316517755, 0.00217970256969)

Admin about 9 years

Thanks for the input. I'm new to pandas, python, and scripting in general, so am still getting my head around the basics. The data in the stats column were indeed tuples. Creating a new DataFrame with df2 = df['stats'].apply(pd.Series) has moved me forwards greatly. Thanks again.

tacaswell almost 8 years

return_type has been deprecated in favor of expand=True.

cs95 over 5 years

Please don't use apply(pd.Series) (or recommend its usage). See "When should I ever use pandas apply() in my code?". A better solution is to listify the column first.