PySpark: select a specific column by its position
You can always get the name of the column with df.columns[n] and then select it:
df = spark.createDataFrame([[1,2], [3,4]], ['a', 'b'])
To select the column at position n:
n = 1
df.select(df.columns[n]).show()
+---+
| b|
+---+
| 2|
| 4|
+---+
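The same indexing generalizes to several positions at once: build a list of names from df.columns, then pass it to select. A minimal sketch of the name-building step, using a plain list that mirrors df.columns for the two-column frame above (the PySpark call itself is shown as a comment):

```python
# Mirror of df.columns for the two-column frame above
columns = ['a', 'b']

# Pick columns by a list of zero-based positions; order is preserved,
# so [1, 0] would put column 'b' before column 'a'
positions = [1, 0]
names = [columns[i] for i in positions]

# In PySpark, this list of names would then be passed to select:
# df.select(names).show()
```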
To select all columns except the one at position n:
n = 1
You can either use drop:
df.drop(df.columns[n]).show()
+---+
| a|
+---+
| 1|
| 3|
+---+
Or select with manually constructed column names:
df.select(df.columns[:n] + df.columns[n+1:]).show()
+---+
| a|
+---+
| 1|
| 3|
+---+
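With only two columns, the slice expression above collapses to a single name; on a wider frame the same expression keeps everything on both sides of position n. A sketch with a hypothetical five-column frame (the column names are illustrative, not from the original post):

```python
# Hypothetical df.columns for a five-column frame
columns = ['a', 'b', 'c', 'd', 'e']

n = 2  # drop the column at position 2, i.e. 'c'
kept = columns[:n] + columns[n+1:]

# kept would then be passed to select:
# df.select(kept).show()
```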
Author: Laurent Cesaro
Updated on September 16, 2022

Comments
-
Laurent Cesaro about 1 year
I would like to know how to select a specific column by its position rather than its name in a DataFrame, like this in Pandas:
df = df.iloc[:, 2]
Is this possible?
-
Laurent Cesaro over 5 years
OK, thank you! Is it possible to select all but one column with this line of code? For example, columns 1 to 4 and 6 to 10?
-
Psidom over 5 years
You can use df.drop(df.columns[5]) to drop a column; see the update.
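The follow-up question about selecting columns 1 to 4 and 6 to 10 can be handled by concatenating two slices of df.columns. A sketch assuming zero-based positions and a hypothetical eleven-column frame (the c0..c10 names are made up for illustration):

```python
# Hypothetical df.columns for an eleven-column frame
columns = ['c%d' % i for i in range(11)]

# Zero-based positions 1-4 and 6-10; position 5 is skipped
# (Python slices exclude their upper bound, hence 1:5 and 6:11)
names = columns[1:5] + columns[6:11]

# df.select(names).show()
```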