From tuples to multiple columns in pandas

Solution 1

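new_col_list = ['city', 'state', 'region', 'country']
for n, col in enumerate(new_col_list):
    # pull the n-th element of every location tuple into its own column
    df[col] = df['location'].apply(lambda location: location[n])

# the tuples are no longer needed once they have been split out
df = df.drop('location', axis=1)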
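Here is a self-contained version of Solution 1. The sample frame is rebuilt from the question's data, and it assumes that the question's nan means a real missing value (np.nan):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; np.nan stands in for the missing region
df = pd.DataFrame({
    "location": [("Richmond", "Virginia", np.nan, "USA"),
                 ("New York City", "New York", np.nan, "USA")],
    "value": [100, 200],
})

new_col_list = ["city", "state", "region", "country"]
for n, col in enumerate(new_col_list):
    # apply() runs right away, so each lambda uses the current value of n
    df[col] = df["location"].apply(lambda location: location[n])

# The tuples are no longer needed once they have been split out
df = df.drop("location", axis=1)
print(df)
```

Because apply is evaluated inside the loop, the usual late-binding pitfall with lambdas in a loop does not come into play here.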

Solution 2

If the function you pass to apply returns a Series built from the (split) location tuple, apply returns a DataFrame, and you can join that directly with your value column (join merges on the index).

from pandas import Series

addr = ['city', 'state', 'region', 'country']
# each tuple becomes a row whose columns are labelled by addr
df[['value']].join(df.location.apply(lambda loc: Series(loc, index=addr)))

   value           city     state  region country
0    100       Richmond  Virginia     NaN     USA
1    200  New York City  New York     NaN     USA
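Below is a self-contained sketch of Solution 2. It adds a hypothetical non-default index ("rva", "nyc") and reverses the row order of the expanded frame before joining. The point is to show that join lines rows up by index label, not by position:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with a hypothetical non-default index
df = pd.DataFrame(
    {
        "location": [("Richmond", "Virginia", np.nan, "USA"),
                     ("New York City", "New York", np.nan, "USA")],
        "value": [100, 200],
    },
    index=["rva", "nyc"],
)

addr = ["city", "state", "region", "country"]
# A function that returns a Series makes apply() build a DataFrame whose
# columns are the Series index (addr) and whose rows keep df's index labels
expanded = df["location"].apply(lambda loc: pd.Series(loc, index=addr))

# join() matches rows by index label, so reversing the row order first
# does not scramble the result
result = df[["value"]].join(expanded.iloc[::-1])
print(result)
```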

Solution 3

I haven't timed this, but I would suggest this option:

df.loc[:, 'city'] = df.location.map(lambda x: x[0])
df.loc[:, 'state'] = df.location.map(lambda x: x[1])
df.loc[:, 'region'] = df.location.map(lambda x: x[2])
df.loc[:, 'country'] = df.location.map(lambda x: x[3])

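I'm guessing that avoiding an explicit for loop might let this take advantage of SIMD instructions (NumPy certainly looks for such opportunities, but other libraries may not). Bear in mind, though, that map with a Python lambda still calls the lambda once per element, so these lines are not actually vectorized.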
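Solution 3 makes one map pass per new column. For comparison, here is a sketch of a different approach that is not one of the posted answers and has not been timed. It transposes the tuples with zip(*...) in a single pass over the location column and then assigns each field. The sample frame is again rebuilt from the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "location": [("Richmond", "Virginia", np.nan, "USA"),
                 ("New York City", "New York", np.nan, "USA")],
    "value": [100, 200],
})

columns = ["city", "state", "region", "country"]
# zip(*...) transposes the row tuples into one tuple per field,
# e.g. ("Richmond", "New York City") for the first field
for name, values in zip(columns, zip(*df["location"])):
    df[name] = list(values)

print(df)
```

This is still plain Python iteration rather than vectorized NumPy code. It only avoids revisiting the tuples once per column.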

Author: ba_ul

Updated on July 07, 2022

Comments

  • ba_ul
    ba_ul almost 2 years

    How do I convert this dataframe

                                              location  value
    0                   (Richmond, Virginia, nan, USA)    100
    1              (New York City, New York, nan, USA)    200

    to this:

        city            state       region    country   value
    0   Richmond        Virginia    nan       USA       100
    1   New York City   New York    nan       USA       200

    Note that the location column in the first dataframe contains tuples. I want to create four columns out of the location column.

  • ba_ul
    ba_ul over 9 years
    Thanks. @exp1orer's solution worked, so I didn't have to try yours.
  • exp1orer
    exp1orer over 9 years
    That is nicely done! If I'm not mistaken, it only works if df has an index equal to range(len(df)), right?
  • meloncholy
    meloncholy over 9 years
    @exp1orer Not quite. join is shorthand for merging both frames on their index, so the indices only need to be consistent (which they will be here, since neither the apply nor the column selection changes the index). I'll edit the answer.
  • meloncholy
    meloncholy over 9 years
    I should have thought of this earlier, but it's worth noting that this is much slower than @exp1orer's solution on a largish frame, because a new Series is created for every row.
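One hedged alternative to creating a Series per row is to build all the new columns with a single DataFrame constructor call from the list of tuples. This approach was not posted in the thread and has not been timed here. Passing index=df.index keeps the rows aligned for the join:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "location": [("Richmond", "Virginia", np.nan, "USA"),
                 ("New York City", "New York", np.nan, "USA")],
    "value": [100, 200],
})

addr = ["city", "state", "region", "country"]
# One constructor call turns the list of tuples into rows; passing
# index=df.index keeps the new rows aligned with df for the join
expanded = pd.DataFrame(df["location"].tolist(), index=df.index, columns=addr)
result = df[["value"]].join(expanded)
print(result)
```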