PySpark: How to check if list of string values exists in dataframe and print values to a list
In general, looping through data in PySpark is not very efficient.
When possible, use native PySpark functions instead.
For your specific question, you can use the filter function with isin
to keep only the rows of your DataFrame whose name appears in the students list:
from pyspark.sql.functions import col
df_names.filter(col("name").isin(students)).select("name")
In your example the only return value will be John.
Comments
-
Techno04335 over 1 year
I have a DataFrame NAMES. When I output it via
display(NAMES)
I get a single NAMES column with the rows John, Sarah, Michael, Sean.
I also have a list students,
print(students)
:{John, Alan, Andy}
Question:
Based on this list (students), how can I loop through the "NAMES" column of the DataFrame and output to another list the names of the students who appear both in the list and in the DataFrame?
Expected output of list: "John"
I have tried
list2 = []
for i in NAMES:
    for g in students:
        if i == g:
            list2.append(i)
but I end up with an error. How can I implement this in PySpark?
Thanks.
-
Matt Messersmith over 5 years
What does this have to do with pyspark?
-
Matt Messersmith over 5 years
What error did you get?
-