PySpark RDD: collect first 163 rows
It is not very efficient, but you can zipWithIndex and filter:

rdd.zipWithIndex().filter(lambda vi: vi[1] < 163).keys()
In practice it makes more sense to simply take and parallelize:

sc.parallelize(rdd.take(163))
Author: wheels
Updated on June 05, 2022

Comments
- wheels, almost 2 years ago:
  Is there a way to get the first 163 rows of an RDD without converting to a DataFrame? I've tried something like newrdd = rdd.take(163), but that returns a list, and rdd.collect() returns the whole RDD. Is there a way to do this? Or, if not, is there a way to convert a list into an RDD?