From tuples to multiple columns in pandas
Solution 1
new_col_list = ['city', 'state', 'region', 'country']
for n, col in enumerate(new_col_list):
    # pull the n-th element of each tuple into its own column
    df[col] = df['location'].apply(lambda location: location[n])
df = df.drop('location', axis=1)
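For anyone who wants to try this end to end, here is a minimal sketch that rebuilds a frame like the one in the question and runs the loop on it. The sample data is an assumption reconstructed from the question's printed output, not the asker's actual frame.

```python
import numpy as np
import pandas as pd

# Assumed sample data modeled on the question: 'location' holds 4-tuples.
df = pd.DataFrame({
    "location": [
        ("Richmond", "Virginia", np.nan, "USA"),
        ("New York City", "New York", np.nan, "USA"),
    ],
    "value": [100, 200],
})

new_col_list = ["city", "state", "region", "country"]
for n, col in enumerate(new_col_list):
    # Pull the n-th element of every tuple into its own column.
    df[col] = df["location"].apply(lambda location: location[n])

# The original tuple column is no longer needed.
df = df.drop("location", axis=1)
print(df)
```

Note that the new columns are appended after value; reorder with df[new_col_list + ['value']] if the column order from the question matters.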
Solution 2
If the function you pass to apply returns a Series built from the (split) location, the result is a DataFrame, and you can join it (join merges on index) directly with your value column.
import pandas as pd
addr = ['city', 'state', 'region', 'country']
df[['value']].join(df.location.apply(lambda loc: pd.Series(loc, index=addr)))
   value           city     state  region  country
0    100       Richmond  Virginia     NaN      USA
1    200  New York City  New York     NaN      USA
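Since join merges on index, this approach should not depend on the frame having the default 0..n-1 index. A minimal sketch to check that, assuming sample data modeled on the question and made-up string labels 'a' and 'b' for the index:

```python
import numpy as np
import pandas as pd

addr = ["city", "state", "region", "country"]

# Assumed sample data; the string index labels are hypothetical.
df = pd.DataFrame(
    {
        "location": [
            ("Richmond", "Virginia", np.nan, "USA"),
            ("New York City", "New York", np.nan, "USA"),
        ],
        "value": [100, 200],
    },
    index=["a", "b"],
)

# apply keeps the original index, so the split frame lines up with df.
split = df["location"].apply(lambda loc: pd.Series(loc, index=addr))
result = df[["value"]].join(split)
print(result)
```

Both frames carry the same labels, so each row is matched by label rather than by position.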
Solution 3
I haven't timed this, but I would suggest this option:
df.loc[:, 'city'] = df.location.map(lambda x: x[0])
df.loc[:, 'state'] = df.location.map(lambda x: x[1])
df.loc[:, 'region'] = df.location.map(lambda x: x[2])
df.loc[:, 'country'] = df.location.map(lambda x: x[3])
I'm guessing that avoiding an explicit for loop might let this take advantage of SIMD instructions (NumPy certainly looks for that, but perhaps other libraries do not).
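One way to probe that guess is to count how often the mapped function runs. The sketch below (sample data and the first_element helper are assumptions for illustration) suggests that Series.map with a Python function still invokes it for each row, so the per-element work stays in the interpreter whether or not the for loop is written out.

```python
import numpy as np
import pandas as pd

# Assumed sample data modeled on the question.
df = pd.DataFrame({
    "location": [
        ("Richmond", "Virginia", np.nan, "USA"),
        ("New York City", "New York", np.nan, "USA"),
    ],
    "value": [100, 200],
})

calls = []  # records every invocation of the mapped function


def first_element(loc):
    # Hypothetical helper: same job as `lambda x: x[0]`, plus bookkeeping.
    calls.append(loc)
    return loc[0]


city = df["location"].map(first_element)
print(len(calls), "calls for", len(df), "rows")
```

If the call count tracks the row count, any speed difference between the two spellings is unlikely to come from vectorized hardware instructions.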
Author: ba_ul
Updated on July 07, 2022
Comments
-
ba_ul almost 2 years
How do I convert this dataframe
                              location  value
0       (Richmond, Virginia, nan, USA)    100
1  (New York City, New York, nan, USA)    200
to this:
            city     state  region  country  value
0       Richmond  Virginia     nan      USA    100
1  New York City  New York     nan      USA    200
Note that the location column in the first dataframe contains tuples. I want to create four columns out of the location column.
-
ba_ul over 9 years
Thanks. @exp1orer's solution worked, so I didn't have to try yours.
-
exp1orer over 9 years
That is nicely done! If I'm not mistaken, it only works if df has an index that is range(len(df)), right?
-
meloncholy over 9 years
@exp1orer Not quite. join is shorthand for merging on the index of both frames, so the indices need only be consistent (which they will be here, as the apply and column selection don't affect them). I'll edit the answer.
-
meloncholy over 9 years
Should have thought about this earlier, but it's worth noting that this is much slower than @exp1orer's solution on a largish frame, because of all the Series being created.
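To see the difference on your own machine, a rough timing sketch along these lines may help. The frame size, the repeat count, and the per_column and series_join helper names are all assumptions chosen for illustration, and absolute numbers will vary with hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

addr = ["city", "state", "region", "country"]

# Assumed test frame: the two sample rows from the question, repeated.
rows = [
    ("Richmond", "Virginia", np.nan, "USA"),
    ("New York City", "New York", np.nan, "USA"),
] * 1000
df = pd.DataFrame({"location": rows, "value": list(range(len(rows)))})


def per_column(frame):
    # One apply per target column, each returning a scalar per row.
    out = frame[["value"]].copy()
    for n, col in enumerate(addr):
        out[col] = frame["location"].apply(lambda loc: loc[n])
    return out


def series_join(frame):
    # One apply that builds a Series per row, then a join on index.
    split = frame["location"].apply(lambda loc: pd.Series(loc, index=addr))
    return frame[["value"]].join(split)


t_cols = timeit.timeit(lambda: per_column(df), number=3)
t_join = timeit.timeit(lambda: series_join(df), number=3)
print(f"per-column apply: {t_cols:.3f}s, Series + join: {t_join:.3f}s")
```

Both helpers return the same columns, so whichever is faster on your data can be used without changing the rest of the code.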