Pandas - combine column values into a list in a new column


Try this:

t['combined'] = t.values.tolist()

t
Out[50]: 
         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1     1.20  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000   123.46  [various, nan, 3000, 123.456]
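
Because `t.values.tolist()` uses every column currently in the frame, re-running it after `combined` exists would nest the old lists inside the new ones. A minimal sketch of one way around that, assuming you select the source columns explicitly (the name `cols` and the choice of A, B and D are only illustrative):

```python
import numpy as np
import pandas as pd

# Same data as in the question.
d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# Hypothetical subset of columns to combine; list all four to get every value.
cols = ['A', 'B', 'D']

# Selecting the columns first keeps 'combined' itself out of the lists,
# so the line can be re-run safely.
t['combined'] = t[cols].values.tolist()
print(t)
```

Each row of `combined` is then a plain Python list holding that row's values from the selected columns.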
Author by

clg4

Updated on July 08, 2022

Comments

  • clg4
    clg4 almost 2 years

    I have a Python Pandas dataframe t:

    import numpy as np
    import pandas as pd

    d=[['hello',1,'GOOD','long.kw'],
       [1.2,'chipotle',np.nan,'bingo'],
       ['various',np.nan,3000,123.456]]
    t=pd.DataFrame(data=d, columns=['A','B','C','D'])
    

    which looks like this:

    print(t)
             A         B     C        D
    0    hello         1  GOOD  long.kw
    1      1.2  chipotle   NaN    bingo
    2  various       NaN  3000  123.456
    

    I am trying to create a new column which is a list of the values in A, B, C, and D. So it would look like this:

    t['combined']                                             
    
    Out[125]: 
    0        [hello, 1, GOOD, long.kw]
    1        [1.2, chipotle, nan, bingo]
    2        [various, nan, 3000, 123.456]
    Name: combined, dtype: object
    

    I am trying this code:

    t['combined'] = t.apply(lambda x: list([x['A'],
                                            x['B'],
                                            x['C'],
                                            x['D']]),axis=1)    
    

    Which returns this error:

    ValueError: Wrong number of items passed 4, placement implies 1 
    

    What puzzles me is that if I remove one of the columns that I want to put in the list (or add another column to the dataframe that I DON'T add to the list), my code works.

    For instance, run this code:

    t['combined'] = t.apply(lambda x: list([x['A'],
                                            x['B'],
                                            x['D']]),axis=1)      
    

    It returns this, which would be perfect if I only wanted those 3 columns:

    print(t)
             A         B     C        D                 combined
    0    hello         1  GOOD  long.kw      [hello, 1, long.kw]
    1      1.2  chipotle   NaN    bingo   [1.2, chipotle, bingo]
    2  various       NaN  3000  123.456  [various, nan, 123.456]
    

    I am at a complete loss as to why building the 'combined' list from all columns in the dataframe raises an error, while building it from all but one column works as expected.
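
One plausible reading of the error is that `apply(..., axis=1)` treats a returned list whose length matches the number of columns as a full row and builds a DataFrame instead of a Series of lists. That is an assumption about the pandas version in use, not something the post confirms. A minimal sketch that sidesteps `apply` entirely by building the lists row by row (the name `cols` is only illustrative):

```python
import numpy as np
import pandas as pd

# Same data as in the question.
d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

cols = ['A', 'B', 'C', 'D']  # illustrative name for the columns to combine

# itertuples yields one plain tuple per row; converting each to a list
# gives a list of lists that is assigned as a single object column.
t['combined'] = [list(row) for row in t[cols].itertuples(index=False, name=None)]
print(t['combined'])
```

Since the result is an ordinary Python list with one entry per row, the assignment does not depend on how `apply` decides to shape its output.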