How to drop multiple columns, given as a list of names, from a Spark DataFrame?

Solution 1

Use the * operator to unpack your list, so that each column name is passed to drop() as a separate argument:

df.drop(*drop_lst)
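Why the unpacking matters: PySpark's drop() is declared with a variable number of positional arguments (*cols), so df.drop(drop_lst) hands it one argument that is a list, not a column name, and PySpark rejects it with a TypeError. The pure-Python sketch below reproduces only that calling convention without Spark; drop_columns is a hypothetical stand-in, not Spark's implementation, and the column names are made up.

```python
from typing import List


def drop_columns(columns: List[str], *cols: str) -> List[str]:
    """Hypothetical stand-in for DataFrame.drop: returns the remaining column names.

    Like PySpark's drop(*cols), it expects each column name as a separate
    positional argument and rejects anything that is not a string.
    """
    for c in cols:
        if not isinstance(c, str):
            raise TypeError(f"expected a column name (str), got {type(c).__name__}")
    to_drop = set(cols)
    # Keep the original order of the columns that remain.
    return [c for c in columns if c not in to_drop]


columns = ["a0", "a1", "a2", "b"]
n = 3
drop_lst = ["a" + str(i) for i in range(n)]  # ['a0', 'a1', 'a2']

# Unpacking: equivalent to drop_columns(columns, 'a0', 'a1', 'a2')
print(drop_columns(columns, *drop_lst))  # ['b']

# Passing the list itself: cols == (['a0', 'a1', 'a2'],), one list argument
try:
    drop_columns(columns, drop_lst)
except TypeError as exc:
    print("TypeError:", exc)
```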

Solution 2

You can also pass the column names directly as separate, comma-separated arguments, e.g.:

df.drop("col1","col11","col21")

Solution 3

This is how to drop a specified number of consecutive columns in Scala:

val ll = dfwide.schema.names.slice(1,5)
dfwide.drop(ll:_*).show

slice takes two parameters: the start index (inclusive) and the end index (exclusive).
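For PySpark users, a comparable sketch: df.columns returns the column names as a Python list, so ordinary Python slicing plays the role of Scala's slice and follows the same inclusive-start, exclusive-end rule. The list below is a made-up stand-in for dfwide.columns, and the PySpark call appears only as a comment.

```python
# Hypothetical column names standing in for dfwide.columns on a wide DataFrame.
names = ["id", "c1", "c2", "c3", "c4", "c5", "c6"]

# Like Scala's slice(1, 5): index 1 is included, index 5 is excluded.
to_drop = names[1:5]
print(to_drop)  # ['c1', 'c2', 'c3', 'c4']

# In PySpark, the analogous call would be:
#   dfwide.drop(*dfwide.columns[1:5]).show()
remaining = [c for c in names if c not in to_drop]
print(remaining)  # ['id', 'c5', 'c6']
```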

Author: GeorgeOfTheRF (Data Scientist)

Updated on October 21, 2021

Comments

  • GeorgeOfTheRF, over 2 years ago

    I have a dynamic list that is created based on the value of n:

    n = 3
    drop_lst = ['a' + str(i) for i in range(n)]
    df.drop(drop_lst)
    

    But the above does not work.

    Note:

    My use case requires a dynamic list.

    If I pass the names directly, without a list, it works:

    df.drop('a0','a1','a2')
    

    How do I make drop function work with list?

    Spark 2.2 doesn't seem to support passing a list to drop(). Is there a way to make it work without using select()?