Split strings in tuples into columns, in Pandas
Solution 1
For the other case, assuming the values are strings that look like tuples:
In [74]: df['stats'].str[1:-1].str.split(',', expand=True).astype(float)
Out[74]:
0 1 2 3 4
0 -0.009242 0.410000 -0.742016 0.003683 0.002517
1 0.041154 0.318231 0.758717 0.002640 0.010654
2 -0.014435 0.168438 -0.808703 0.000817 0.003166
3 0.034346 0.288731 0.950845 0.000001 0.003373
4 0.009052 0.151031 0.670257 0.012179 0.003022
5 -0.004797 0.171615 -0.552879 0.050032 0.002180
(Note: for older versions of pandas (< 0.16.1), you need to use return_type='frame'
instead of the expand keyword.)
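As a rough, self-contained illustration of the string case, the sketch below builds a small frame and applies the same chain of .str operations. The sample values are shortened, made-up stand-ins for the question's data, and the variable name parts is an assumption for this example only.

```python
import pandas as pd

# Hypothetical sample data: strings that look like tuples, indexed by Track ID.
df = pd.DataFrame(
    {"stats": ["(-0.0092, 0.41, -0.742)", "(0.0412, 0.3182, 0.7587)"]},
    index=pd.Index([14.0, 28.0], name="Track ID"),
)

# Drop the surrounding parentheses, split on commas into separate columns,
# then convert every piece to float (leading spaces are tolerated).
parts = df["stats"].str[1:-1].str.split(",", expand=True).astype(float)

print(parts)
```

Because the .str accessor works on the original Series, the result keeps the Track ID index without any extra work.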
By the way, if they are tuples and not strings, you can simply do the following:
pd.DataFrame(df['stats'].tolist(), index=df.index)
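A minimal sketch of the tuple case, again with made-up sample values. It is only meant to show that passing index=df.index keeps the new columns aligned with the original rows; the name expanded is an assumption for this example.

```python
import pandas as pd

# Hypothetical sample data: real tuples (not strings), indexed by Track ID.
df = pd.DataFrame(
    {"stats": [(-0.0092, 0.41, -0.742), (0.0412, 0.3182, 0.7587)]},
    index=pd.Index([14.0, 28.0], name="Track ID"),
)

# Turn the column into a plain list of tuples and let the DataFrame
# constructor spread each tuple across columns, reusing the original index.
expanded = pd.DataFrame(df["stats"].tolist(), index=df.index)

print(expanded)
```

Without index=df.index the new frame would get a default 0, 1, 2, ... index, which matters as soon as it is joined back to df.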
Solution 2
If you have a sequence of tuples and not strings, and you want them as DataFrame columns, this is the simplest approach:
df = pd.concat([df['Track ID'], pd.DataFrame(df['stats'].values.tolist(), index=df.index)], axis=1)
If they are actually strings, you can first convert them to lists like so, then apply the above operation:
dfpart = pd.DataFrame(df['stats'].apply(lambda x: x.strip('()').split(', ')).values.tolist(), index=df.index).astype(float)
df = pd.concat([df['Track ID'], dfpart], axis=1)
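Put together, the string variant of this solution might look like the sketch below. It assumes, as this answer does, that Track ID is an ordinary column rather than the index. The sample values and the stat_0, stat_1, ... column names are hypothetical, and a list comprehension stands in for the apply call.

```python
import pandas as pd

# Hypothetical sample data with Track ID as a regular column.
df = pd.DataFrame({
    "Track ID": [14.0, 28.0],
    "stats": ["(-0.0092, 0.41, -0.742)", "(0.0412, 0.3182, 0.7587)"],
})

# Strip the parentheses and split each string into a list of number strings.
rows = [s.strip("()").split(", ") for s in df["stats"]]

# Build the numeric frame on the same index so concat lines the rows up.
dfpart = pd.DataFrame(rows, index=df.index).astype(float)
dfpart.columns = [f"stat_{i}" for i in dfpart.columns]  # hypothetical names

result = pd.concat([df["Track ID"], dfpart], axis=1)
print(result)
```

Naming the new columns is optional, but it avoids mixing the string label Track ID with integer labels in the combined frame.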
Author by
Admin
Updated on January 26, 2021
Comments
-
Admin over 3 years
I have the following DataFrame, where Track ID is the row index. How can I split the string in the stats column into 5 columns of numbers?
Track ID    stats
14.0        (-0.00924175824176, 0.41, -0.742016492568, 0.0036830094242, 0.00251748449963)
28.0        (0.0411538461538, 0.318230769231, 0.758717081514, 0.00264000622468, 0.0106535783677)
42.0        (-0.0144351648352, 0.168438461538, -0.80870348637, 0.000816872566404, 0.00316572586742)
56.0        (0.0343461538462, 0.288730769231, 0.950844962874, 6.1608706775e-07, 0.00337262030771)
70.0        (0.00905164835165, 0.151030769231, 0.670257006716, 0.0121790506745, 0.00302182567957)
84.0        (-0.0047967032967, 0.171615384615, -0.552879463981, 0.0500316517755, 0.00217970256969)
-
Admin about 9 years
Thanks for the input. I'm new to pandas, Python, and scripting in general, so I am still getting my head around the basics. The data in the stats column were indeed tuples. Creating a new DataFrame with df2 = df['stats'].apply(pd.Series) has moved me forwards greatly. Thanks again.
-
tacaswell almost 8 years
return_type has been deprecated in favor of expand=True.
-
cs95 over 5 years
Please don't use apply(pd.Series) (or recommend its usage). See "When should I ever use pandas apply() in my code?" A better solution is to listify the column first.