PySpark: Convert DataFrame to RDD[String]
A PySpark Row is just a tuple and can be used as such. All you need here is a simple map (or flatMap if you want to flatten the rows as well) with list:
data.map(list)
or, if the columns hold different types and you want every value as a string:
data.map(lambda row: [str(c) for c in row])
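The two one-liners above can be wrapped into small helpers. This is only a sketch: the function names row_to_strings, to_string_rdd and to_flat_string_rdd are made up for illustration, and df is assumed to be an existing DataFrame. The helpers go through df.rdd, as the question does, because map and flatMap are RDD methods.

```python
def row_to_strings(row):
    """Convert one Row (or any tuple-like record) to a list of strings.

    Note: str() is applied to every value, so a null column (None)
    becomes the string 'None'.
    """
    return [str(c) for c in row]


def to_string_rdd(df):
    """Hypothetical helper: one list of strings per row of df."""
    return df.rdd.map(row_to_strings)


def to_flat_string_rdd(df):
    """Hypothetical helper: all values of all rows as one flat RDD of strings."""
    return df.rdd.flatMap(row_to_strings)
```

Because a Row behaves like a tuple, row_to_strings can be tried on a plain tuple without a Spark session, for example row_to_strings(("aaa", 1, 2.5)).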
Author: Toren
Updated on November 14, 2020
Comments
-
Toren over 3 years
I'd like to convert pyspark.sql.dataframe.DataFrame to pyspark.rdd.RDD[String].
I converted a DataFrame df to an RDD data:
data = df.rdd
type(data)
## pyspark.rdd.RDD
The new RDD data contains Row objects:
first = data.first()
type(first)
## pyspark.sql.types.Row
data.first()
Row(_c0=u'aaa', _c1=u'bbb', _c2=u'ccc', _c3=u'ddd')
I'd like to convert each Row to a list of String, like the example below:
u'aaa', u'bbb', u'ccc', u'ddd'
Thanks
-
Toren about 8 years
Thanks @zero323, with your answers my learning curve is going better.