Convert Row into List(String) in PySpark
Solution 1
With a single Row (why would you even...), wrap it in a list before parallelizing:
a = Row(Sentence=u'When, for the first time I realized the meaning of death.')
b = sc.parallelize([a])
and flattened with
b.map(lambda x: x.Sentence)
or
b.flatMap(lambda x: x)
although sc.parallelize(a) is already in the format you need: because you pass an Iterable, Spark will iterate over all fields in the Row to create the RDD.
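To see why wrapping the Row in a list matters, the iteration can be imitated in plain Python without a cluster. This is only a sketch: `namedtuple` is used here as a stand-in for `pyspark.sql.Row` (an assumption for illustration, relying on Row also being a tuple), and the list comprehensions stand in for what `parallelize` and `map` would do.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (assumption for illustration): like Row,
# a namedtuple is a tuple whose fields can also be read by name.
Row = namedtuple("Row", ["Sentence"])
a = Row(Sentence="When, for the first time I realized the meaning of death.")

# Passing the Row itself as the iterable: each field becomes one element,
# so the collection already holds the plain string.
elements = list(a)

# Mapping tuple(str(x) for x in line) over those elements iterates over each
# string, which splits it into single characters.
split_chars = [tuple(str(x) for x in line) for line in elements]

# Passing [a] keeps the Row whole; reading the field returns the string.
sentences = [row.Sentence for row in [a]]
```

Here `elements` and `sentences` hold the same single string, while `split_chars` holds one tuple of individual characters.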
Solution 2
Collect the column to the driver, then read the field from each Row:
col = 'your_column_name'
val = df.select(col).collect()
val2 = [ ele.__getattr__(col) for ele in val]
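The built-in `getattr` performs the same lookup by name and is the more usual spelling than calling `__getattr__` directly. A minimal sketch, assuming `namedtuple` as a stand-in for the Row objects that `collect()` returns; `column_values` is a hypothetical helper name, not part of PySpark.

```python
from collections import namedtuple

# Stand-in for the Row objects returned by df.select(col).collect()
# (assumption for illustration; no Spark session is needed to run this).
Row = namedtuple("Row", ["Sentence"])
val = [Row(Sentence="first sentence"), Row(Sentence="second sentence")]

col = "Sentence"


def column_values(rows, name):
    """Return the value of field `name` from each row as a plain list."""
    return [getattr(row, name) for row in rows]


val2 = column_values(val, col)
```

Because `collect()` brings every row back to the driver, this approach suits columns small enough to fit in driver memory.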
Author: Abhijeet
Updated on July 16, 2022
Comments
-
Abhijeet almost 2 years
I have data in Row tuple format:
Row(Sentence=u'When, for the first time I realized the meaning of death.')
I want to convert it into a String like this:
(u'When, for the first time I realized the meaning of death.')
I tried this (suppose 'a' holds the Row tuple):
b = sc.parallelize(a)
b = b.map(lambda line: tuple([str(x) for x in line]))
print(b.take(4))
But I am getting a result like this:
[('W', 'h', 'e', 'n', ',', ' ', 'f', 'o', 'r', ' ', 't', 'h', 'e', ' ', 'f', 'i', 'r', 's', 't', ' ', 't', 'i', 'm', 'e', ' ', 'I', ' ', 'r', 'e', 'a', 'l', 'i', 'z', 'e', 'd', ' ', 't', 'h', 'e', ' ', 'm', 'e', 'a', 'n', 'i', 'n', 'g', ' ', 'o', 'f', ' ', 'd', 'e', 'a', 't', 'h', '.')]
Does anybody know what I am doing wrong here?
-
stevenl over 4 years
This worked for me with the following adjustment (cleaner):
val2 = [ ele[col] for ele in val]