How to use map() to convert (key, value) pairs to values only in PySpark


Solution 1

Finally I got the answer; it's like this:

(wordCounts
 .map(lambda x: x[1])          # keep only the value from each (key, value) pair
 .reduce(lambda x, y: x + y))  # sum those values
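
For context, a minimal end-to-end sketch of the whole pipeline (assuming a SparkContext already bound to sc, as in the question):

    wordsList = ['cat', 'elephant', 'rat', 'rat', 'cat']
    wordsRDD = sc.parallelize(wordsList, 4)

    # Pair each word with 1, then sum the 1s per key.
    wordPairs = wordsRDD.map(lambda w: (w, 1))
    wordCounts = wordPairs.reduceByKey(lambda x, y: x + y)

    # map() keeps only the value from each (key, value) pair;
    # reduce() then sums those values across the RDD.
    totalCount = (wordCounts
                  .map(lambda x: x[1])
                  .reduce(lambda x, y: x + y))
    print(totalCount)  # 5

The average the question asks for would then be totalCount / float(wordCounts.count()), since wordCounts.count() is the number of unique words.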

Solution 2

Yes, your lambda function in .map takes a tuple x as its argument and returns the second element via x[1] (index 1 of the tuple). You could also unpack the tuple directly in the argument list and return the second element, as follows:

.map(lambda (x, y): y)
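
Note that tuple unpacking in lambda parameters works only in Python 2; Python 3 removed it (PEP 3113). Two equivalent Python 3 forms, sketched for reference:

    from operator import itemgetter

    wordCounts.map(lambda kv: kv[1])  # index into the (key, value) tuple
    wordCounts.map(itemgetter(1))     # same result without a lambda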
Author by user2090166

Updated on July 09, 2022

Comments

  • user2090166 almost 2 years

    I have this code in PySpark to count words.

    wordsList = ['cat', 'elephant', 'rat', 'rat', 'cat']
    wordsRDD = sc.parallelize(wordsList, 4)

    wordPairs = wordsRDD.map(lambda w: (w, 1))  # pair each word with 1
    wordCounts = wordPairs.reduceByKey(lambda x, y: x + y)
    print wordCounts.collect()
    
    #PRINTS-->  [('rat', 2), ('elephant', 1), ('cat', 2)]
    
    from operator import add
    totalCount = (wordCounts
                  .map(<< FILL IN >>)
                  .reduce(<< FILL IN >>))
    
    #SHOULD PRINT 5
    
    # (wordCounts.values().sum()) does the trick, but I want to do this with map() and reduce()
    
    
    I need to use a reduce() action to sum the counts in wordCounts and then divide by the number of unique words.
    

    * But first I need to map() the pair RDD wordCounts, which consists of (key, value) pairs, to an RDD of values.

    This is where I am stuck. I tried the approaches below, but neither works:

    .map(lambda x:x.values())
    .reduce(lambda x:sum(x)))
    
    AND,
    
    .map(lambda d:d[k] for k in d)
    .reduce(lambda x:sum(x)))
    

    Any help in this would be highly appreciated!
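
Both attempts fail for the same reason: each element of wordCounts is a plain tuple, which has no .values() method, and reduce() expects a two-argument function rather than a one-argument one. With the add already imported from operator, a working fill-in (matching Solution 1 above) would be:

    totalCount = (wordCounts
                  .map(lambda x: x[1])  # tuple indexing, not .values()
                  .reduce(add))         # add(x, y) returns x + y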