Merge multiple column values into one column in python pandas

134,017

Solution 1

You can call apply pass axis=1 to apply row-wise, then convert the dtype to str and join:

In [153]:
df['ColumnA'] = df[df.columns[1:]].apply(
    lambda x: ','.join(x.dropna().astype(str)),
    axis=1
)
df

Out[153]:
  Column1  Column2  Column3  Column4  Column5  ColumnA
0       a        1        2        3        4  1,2,3,4
1       a        3        4        5      NaN    3,4,5
2       b        6        7        8      NaN    6,7,8
3       c        7        7      NaN      NaN      7,7

Here I call dropna to get rid of the NaN, however we need to cast again to int so we don't end up with floats as str.

Solution 2

I propose to use .assign

df2 = df.assign(ColumnA = df.Column2.astype(str) + ', ' + \
  df.Column3.astype(str) + ', ' df.Column4.astype(str) + ', ' \
  df.Column4.astype(str) + ', ' df.Column5.astype(str))

it's simple, maybe long but it worked for me

Solution 3

If you have lot of columns say - 1000 columns in dataframe and you want to merge few columns based on particular column name e.g. -Column2 in question and arbitrary no. of columns after that column (e.g. here 3 columns after 'Column2 inclusive of Column2 as OP asked).

We can get position of column using .get_loc() - as answered here

source_col_loc = df.columns.get_loc('Column2') # column position starts from 0

df['ColumnA'] = df.iloc[:,source_col_loc+1:source_col_loc+4].apply(
    lambda x: ",".join(x.astype(str)), axis=1)

df

Column1  Column2  Column3  Column4  Column5  ColumnA
0       a        1        2        3        4  1,2,3,4
1       a        3        4        5      NaN    3,4,5
2       b        6        7        8      NaN    6,7,8
3       c        7        7      NaN      NaN      7,7

To remove NaN, use .dropna() or .fillna()

Hope it helps!

Share:
134,017
sequence_hard
Author by

sequence_hard

Recently started working in Bioinformatics, therefore learning python2 :)

Updated on July 05, 2022

Comments

  • sequence_hard
    sequence_hard almost 2 years

    I have a pandas data frame like this:

       Column1  Column2  Column3  Column4  Column5
     0    a        1        2        3        4
     1    a        3        4        5
     2    b        6        7        8
     3    c        7        7        
    

    What I want to do now is getting a new dataframe containing Column1 and a new columnA. This columnA should contain all values from columns 2 -(to) n (where n is the number of columns from Column2 to the end of the row) like this:

      Column1  ColumnA
    0   a      1,2,3,4
    1   a      3,4,5
    2   b      6,7,8
    3   c      7,7
    

    How could I best approach this issue? Any advice would be helpful. Thanks in advance!

  • Amin Salgado
    Amin Salgado about 6 years
    Also, if you are doing it for tonnes of data, it is much faster than lambda
  • Sade
    Sade over 3 years
    For some reason this doesnt work for me. I get duplicates. Therefore row 0 columnA is 1,2,3,4,1,2,3,4
  • Sade
    Sade over 3 years
    It seems like using iloc works for me. Theres no duplicates. df['ColumnA'] = df.iloc[:,source_col_loc+1:source_col_loc+4].apply( lambda x: ",".join(x.astype(str)), axis=1)
  • Kaustuv
    Kaustuv about 3 years
    A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead