Creating a new column in pandas by using a lambda function on two existing columns


You can use map to compute the string lengths and np.where to select between them:

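import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['aaa', 'bb', 'ccc'], 'b': ['rrrr', 'k', 'e']})
print(df)
#     a     b
#0  aaa  rrrr
#1   bb     k
#2  ccc     e

# where len(a) > len(b) take len(a), otherwise take len(b)
df['c'] = np.where(df['a'].map(len) > df['b'].map(len), df['a'].map(len), df['b'].map(len))
print(df)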
#     a     b  c
#0  aaa  rrrr  4
#1   bb     k  2
#2  ccc     e  3
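
The np.where approach above can also be written so that each length is computed only once. The following is a minimal sketch, not part of the original answer: it assumes both columns contain only strings, and it swaps in the .str.len() accessor and np.maximum as alternatives to map(len).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["aaa", "bb", "ccc"], "b": ["rrrr", "k", "e"]})

# Compute each column's string lengths once (assumes every value is a string).
len_a = df["a"].str.len()
len_b = df["b"].str.len()

# Same selection as above: len(a) where it is larger, otherwise len(b).
df["c"] = np.where(len_a > len_b, len_a, len_b)

# Alternative (an assumption, not from the answer): element-wise maximum.
df["c_alt"] = np.maximum(len_a, len_b)

print(df)
```

Both columns hold the same values here; np.maximum simply states the intent (take the larger length) more directly.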

Another solution uses apply with the parameter axis=1:

axis = 1 or ‘columns’: apply function to each row

df['c'] = df.apply(lambda x: max(len(x['a']), len(x['b'])), axis=1)
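
For reference, here is a self-contained sketch of the apply call above, run on the sample data from the question. The variable name row is only illustrative; with axis=1 the lambda receives one row at a time as a Series indexed by column name.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["dfg", "f", "fff", "fgrf", "fghj"],
    "b": ["sd", "dfg", "edr", "df", "fghjky"],
})

# axis=1: the lambda is called once per row, so row["a"] and row["b"]
# are the individual strings in that row.
df["c"] = df.apply(lambda row: max(len(row["a"]), len(row["b"])), axis=1)

print(df)
```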

Author: piyush sharma

Updated on July 09, 2022

Comments

  • piyush sharma
    piyush sharma over 1 year

    I am able to add a new column in pandas by defining a user function and then using apply. However, I want to do this using a lambda; is there a way to do that?

    For example, df has two columns, a and b. I want to create a new column c that holds the longer of the two string lengths in a and b.

    Something like:

    df['c'] = df.apply(lambda x, len(df['a']) if len(df['a']) > len(df['b']) or len(df['b']) )
    

    One approach:

    df = pd.DataFrame({'a':['dfg','f','fff','fgrf','fghj'], 'b' : ['sd','dfg','edr','df','fghjky']})
    
    df['c'] = df.apply(lambda x: max([len(x) for x in [df['a'], df['b']]]))
    print df
          a       b   c
    0   dfg      sd NaN
    1     f     dfg NaN
    2   fff     edr NaN
    3  fgrf      df NaN
    4  fghj  fghjky NaN
    
    • Lev Levitsky
      Lev Levitsky almost 8 years
      This will work once you fix the syntax errors. lambda x needs a colon after it, and your expression lacks else (maybe it should go instead of or).
    • piyush sharma
      piyush sharma almost 8 years
      Thanks for the quick response; however, it still does not work. Here is the code and the error message. I would appreciate any help. df = pd.DataFrame({'a':['dfg','f','fff','fgrf','fghj'], 'b' : ['sd','dfg','edr','df','fghjky']}) df['c'] = df.apply(lambda x: len(x['a']) if len(x['a']) > len(x['b']) else len(x['b'])) KeyError: ('a', u'occurred at index a')
    • Lev Levitsky
      Lev Levitsky almost 8 years
      Please don't put code in comments, edit the question instead.
    • piyush sharma
      piyush sharma almost 8 years
      Sorry, this is my first time here. I tried to edit my question, but it is still not nicely formatted.
    • Lev Levitsky
      Lev Levitsky almost 8 years
      In the edit mode, there is a button that opens formatting help. First off, you can select the code and press Ctrl-K, that will indent it by 4 spaces.
  • piyush sharma
    piyush sharma almost 8 years
    Map might work, but mainly I am looking for a way to use a lambda with two columns to create a new column, if possible.
  • jezrael
    jezrael almost 8 years
    Why do you want to use lambda?
  • piyush sharma
    piyush sharma almost 8 years
    The reason for using lambda is that it means less typing, and for me the code is more readable.
  • Fed
    Fed over 3 years
    For future readers: the mistake was forgetting axis=1. Without it, apply passes each column (not each row) to the lambda, so x['a'] looks up 'a' in the row index [0, 1, 2, 3, 4] and raises KeyError 'a'. Also, jezrael's second solution is a bit neater, since apply already loops through the rows.
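
The axis behaviour described in the last comment can be checked directly. This is a small sketch using a shortened version of the question's data; the names per_column, per_row, and raised_key_error are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": ["dfg", "f"], "b": ["sd", "dfg"]})

# Default axis=0: the lambda receives each column as a Series,
# so x.name is the column label.
per_column = df.apply(lambda x: x.name)

# axis=1: the lambda receives each row as a Series,
# so x.name is the row's index label.
per_row = df.apply(lambda x: x.name, axis=1)

# Without axis=1, x is a column indexed by 0 and 1, so x["a"] is not a
# valid key and pandas raises KeyError, as in the question.
try:
    df.apply(lambda x: len(x["a"]))
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(per_column.tolist(), per_row.tolist(), raised_key_error)
```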