PySpark DataFrame - Join on multiple columns dynamically
Solution 1
Why not use a simple comprehension:
from pyspark.sql.functions import col

firstdf.join(
    seconddf,
    [col(f) == col(s) for (f, s) in zip(columnsFirstDf, columnsSecondDf)],
    "inner"
)
Because the conditions are combined with a logical AND, it is enough to provide a list of conditions; the & operator is not needed.
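If you would rather build one explicit condition than rely on the list form, here is a minimal sketch that folds the pairwise equalities together with &. The helper name build_join_condition and its parameters are made up for illustration, and it assumes both DataFrames support df[name] lookup, as PySpark DataFrames do.

```python
from functools import reduce
from operator import and_


def build_join_condition(left_df, right_df, left_cols, right_cols):
    """Combine pairwise column equalities into a single AND-ed condition.

    Hypothetical helper: looks each name up on its own DataFrame with
    df[name], so the two sides stay distinguishable.
    """
    if len(left_cols) != len(right_cols):
        raise ValueError("both column lists must have the same length")
    if not left_cols:
        raise ValueError("need at least one pair of join columns")
    # One equality per (left, right) pair of column names.
    pairs = [left_df[l] == right_df[r] for l, r in zip(left_cols, right_cols)]
    # reduce(and_, ...) is the same as writing pairs[0] & pairs[1] & ...
    return reduce(and_, pairs)


# Usage with the DataFrames from the question (not executed here):
#   cond = build_join_condition(firstdf, seconddf, columnsFirstDf, columnsSecondDf)
#   firstdf.join(seconddf, cond, "inner")
```

This should behave the same as passing the list of conditions directly. It only makes the AND explicit and checks that the two lists line up.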
Solution 2
@Mohan Sorry, I don't have enough reputation to add a comment. If the columns have the same names in both DataFrames, create a list of those column names and pass it to the join:
col_list = ["id", "column1", "column2"]
firstdf.join(seconddf, col_list, "inner")
Author by
Pedro Bernardo
Updated on July 29, 2022
Comments
-
Pedro Bernardo over 1 year
Let's say I have two DataFrames in Spark:
firstdf = sqlContext.createDataFrame([
    {'firstdf-id': 1, 'firstdf-column1': 2, 'firstdf-column2': 3, 'firstdf-column3': 4},
    {'firstdf-id': 2, 'firstdf-column1': 3, 'firstdf-column2': 4, 'firstdf-column3': 5}
])
seconddf = sqlContext.createDataFrame([
    {'seconddf-id': 1, 'seconddf-column1': 2, 'seconddf-column2': 4, 'seconddf-column3': 5},
    {'seconddf-id': 2, 'seconddf-column1': 6, 'seconddf-column2': 7, 'seconddf-column3': 8}
])
Now I want to join them on multiple columns (any number greater than one).
I have an array of column names from the first DataFrame and an array of column names from the second DataFrame. The two arrays have the same size, and I want to join on the columns they specify. For example:
columnsFirstDf = ['firstdf-id', 'firstdf-column1']
columnsSecondDf = ['seconddf-id', 'seconddf-column1']
Since these arrays have variable sizes I can't use this kind of approach:
from pyspark.sql.functions import *

firstdf.join(
    seconddf,
    (col(columnsFirstDf[0]) == col(columnsSecondDf[0])) &
    (col(columnsFirstDf[1]) == col(columnsSecondDf[1])),
    'inner'
)
Is there any way that I can join on multiple columns dynamically?
-
Mohan over 5 years
What should I do if the column names are the same in both DataFrames?
-
Omkar Neogi over 3 years
If the column names are the same in both DataFrames, you can either alias both DataFrames or alias the individual columns using "as". stackoverflow.com/q/33778664/5986661
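As a rough sketch of the "alias both DataFrames" idea for same-named columns: the helper below builds a SQL condition string that could be handed to pyspark.sql.functions.expr. The function name aliased_join_condition and the aliases "a" and "b" are assumptions chosen for illustration, not something taken from the linked question.

```python
def aliased_join_condition(columns, left_alias="a", right_alias="b"):
    """Build an equality condition over same-named columns of two aliased DataFrames.

    Hypothetical helper: returns a plain SQL string; wrap it in expr()
    before passing it to DataFrame.join.
    """
    if not columns:
        raise ValueError("need at least one join column")
    # Backticks keep names containing hyphens or spaces valid in Spark SQL.
    return " AND ".join(
        f"{left_alias}.`{name}` = {right_alias}.`{name}`" for name in columns
    )


# Usage with PySpark (not executed here):
#   from pyspark.sql.functions import expr
#   cond = aliased_join_condition(["id", "column1"])
#   firstdf.alias("a").join(seconddf.alias("b"), expr(cond), "inner")
```

Unlike joining on a plain list of names, this keeps both copies of each column in the result, so you would still select them through their aliases afterwards.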