PySpark: select a specific column by its position


You can always get the name of the column with df.columns[n] and then select it:

df = spark.createDataFrame([[1,2], [3,4]], ['a', 'b'])

To select column at position n:

n = 1
df.select(df.columns[n]).show()
+---+                                                                           
|  b|
+---+
|  2|
|  4|
+---+
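Since `df.columns` is an ordinary Python list of strings, any list indexing works, so the same idea extends to several positions at once. A minimal sketch of the index arithmetic, using a hypothetical four-column layout (in PySpark you would pass the resulting list to `df.select`, which accepts a list of column names):

```python
# df.columns is a plain Python list of strings; normal list indexing applies.
# Hypothetical column layout for illustration:
columns = ['a', 'b', 'c', 'd']

# Pick several positions at once with a list comprehension:
positions = [0, 2]
names = [columns[i] for i in positions]

# In PySpark: df.select(names)
print(names)  # ['a', 'c']
```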

To select all but column n:

n = 1

You can either use drop:

df.drop(df.columns[n]).show()
+---+
|  a|
+---+
|  1|
|  3|
+---+
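`drop` also accepts several column names, so multiple positions can be removed in one call by unpacking a list. A sketch of building that list, again with hypothetical column names (the actual PySpark call would be `df.drop(*to_drop)`):

```python
# Build the names to drop from their positions, then unpack into drop().
# Hypothetical column layout for illustration:
columns = ['a', 'b', 'c', 'd']

drop_positions = [1, 3]
to_drop = [columns[i] for i in drop_positions]

# In PySpark: df.drop(*to_drop) would remove columns 'b' and 'd'
print(to_drop)  # ['b', 'd']
```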

Or select with manually constructed column names:

df.select(df.columns[:n] + df.columns[n+1:]).show()
+---+
|  a|
+---+
|  1|
|  3|
+---+

Author by

Laurent Cesaro

Updated on September 16, 2022

Comments

  • Laurent Cesaro
    Laurent Cesaro about 1 year

    I would like to know how to select a specific column in a DataFrame by its position rather than by its name.

    Like this in Pandas:

    df = df.iloc[:,2]
    

    Is it possible?

  • Laurent Cesaro
    Laurent Cesaro over 5 years
    OK, thank you! Is it possible to select all but one column with this line of code? For example, select columns 1 to 4 and 6 to 10?
  • Psidom
    Psidom over 5 years
    You can use df.drop(df.columns[5]) to drop a column. See the update.
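The "columns 1 to 4 and 6 to 10" case from the comment can be handled with the same list-slicing approach shown in the answer. A sketch of the slice arithmetic, assuming a hypothetical ten-column layout and treating the comment's positions as 1-based:

```python
# Hypothetical 10-column layout; positions counted from 1 as in the comment.
columns = ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10']

# Columns 1-4 and 6-10 (1-based) correspond to slices [0:4] and [5:10] (0-based):
keep = columns[0:4] + columns[5:10]

# In PySpark: df.select(keep)
print(keep)  # every column except 'c5'
```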