Writing MapReduce code for counting the number of records


Solution 1

Your mapper must emit a fixed key (just use a Text with the value "count") and a fixed value of 1, as in the WordCount example, but as a LongWritable rather than an IntWritable, because LongSumReducer sums LongWritable values.

Then simply use Hadoop's built-in LongSumReducer (org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer) as your reducer.

The output of your job will be a single record with the key "count" whose value is the number of records you are looking for.

You can (dramatically!) improve performance by also using LongSumReducer as a combiner: each map task then sends its subtotal across the network instead of one pair per record.
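To see what the combiner buys you, here is a plain-Java simulation of Solution 1's dataflow. It is not Hadoop code: `CountSim`, `run` and `Result` are hypothetical names, and the input is assumed to be already divided into splits (one `List<String>` per map task). It counts how many intermediate pairs would cross the shuffle with and without the combiner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory simulation of Solution 1 (not the Hadoop API).
public class CountSim {

    // Final count plus the number of intermediate pairs that were "shuffled".
    public record Result(long total, int shuffledPairs) {}

    public static Result run(List<List<String>> splits, boolean useCombiner) {
        List<Map.Entry<String, Long>> shuffled = new ArrayList<>();
        for (List<String> split : splits) {
            // Map phase: one ("count", 1) pair per record in this split.
            List<Map.Entry<String, Long>> mapOut = new ArrayList<>();
            for (String line : split) {
                mapOut.add(Map.entry("count", 1L));
            }
            if (mapOut.isEmpty()) {
                continue; // an empty split produces no map output
            }
            if (useCombiner) {
                // Combiner: same summing logic as the reducer, per map task.
                long subtotal = 0;
                for (Map.Entry<String, Long> e : mapOut) {
                    subtotal += e.getValue();
                }
                shuffled.add(Map.entry("count", subtotal));
            } else {
                shuffled.addAll(mapOut);
            }
        }
        // Reduce phase: sum every value that arrived for the key "count".
        long total = 0;
        for (Map.Entry<String, Long> e : shuffled) {
            total += e.getValue();
        }
        return new Result(total, shuffled.size());
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
            List.of("a,1", "b,2", "c,3"),
            List.of("d,4", "e,5"));
        System.out.println(run(splits, false)); // Result[total=5, shuffledPairs=5]
        System.out.println(run(splits, true));  // Result[total=5, shuffledPairs=2]
    }
}
```

With 5 records in 2 splits, both runs report a total of 5, but the combiner cuts the shuffled pairs from 5 to 2. Real Hadoop may run a combiner zero, one or several times per map task, so it must give the same answer either way; summing satisfies that. This simulation simply runs it once per non-empty task.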

Solution 2

  • Your mapper should emit 1 for each record read.
  • Your combiner should emit the sum of all the "1"s it received (a subtotal per map task).
  • Your reducer should emit the grand total number of records.
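As a quick accounting of Solution 2, let $n_j$ be the number of records in split $j$ of $M$ splits and $N = \sum_j n_j$ (symbols introduced here only for illustration), and assume the combiner runs once per map task:

```latex
\begin{aligned}
\text{map output} &= \sum_{j=1}^{M} n_j = N \text{ pairs } (\texttt{count}, 1) \\
\text{combiner output for split } j &= (\texttt{count},\, n_j) \;\Rightarrow\; M \text{ pairs shuffled} \\
\text{reducer output} &= \Bigl(\texttt{count},\, \sum_{j=1}^{M} n_j\Bigr) = (\texttt{count},\, N)
\end{aligned}
```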

Solution 3

I hope I have a better solution than the accepted answer.

Instead of emitting 1 for each record, why not just increment a counter in map() and emit the accumulated count once per map task in cleanup()?

This cuts down the intermediate writes and reads, and the reducer only has to aggregate a short list of values (one per map task).

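import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Counts records locally and emits a single partial sum per map task.
public class LineCntMapper extends
  Mapper<LongWritable, Text, Text, IntWritable> {

 private final Text keyEmit = new Text("Total Lines");
 private final IntWritable valEmit = new IntWritable();
 private int partialSum = 0;

 @Override
 public void map(LongWritable key, Text value, Context context) {
  // Nothing is written per record; just count it.
  partialSum++;
 }

 @Override
 public void cleanup(Context context)
   throws IOException, InterruptedException {
  // Runs once after the task's last map() call: emit this task's total.
  valEmit.set(partialSum);
  context.write(keyEmit, valEmit);
 }
}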
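The saving can be checked with another plain-Java simulation, again not Hadoop code (`InMapperCountSim` and `Task` are hypothetical names). Each `Task` mirrors `LineCntMapper`: it increments a field in `map()` and hands back the count once from `cleanup()`, and the input is assumed to be already divided into splits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory simulation of Solution 3 (not the Hadoop API):
// each simulated map task counts locally and emits once, as cleanup() would.
public class InMapperCountSim {

    // Stand-in for one map task; mirrors the partialSum field of LineCntMapper.
    static final class Task {
        private int partialSum = 0;

        void map(String line) {
            partialSum++; // no per-record output
        }

        int cleanup() {
            return partialSum; // the single value this task emits
        }
    }

    // Returns the per-task partial sums that would reach the reducer.
    public static List<Integer> mapPhase(List<List<String>> splits) {
        List<Integer> emitted = new ArrayList<>();
        for (List<String> split : splits) {
            Task task = new Task(); // one mapper instance per task
            for (String line : split) {
                task.map(line);
            }
            emitted.add(task.cleanup());
        }
        return emitted;
    }

    // Reducer: add up the partial sums (one per map task).
    public static long reduce(List<Integer> partialSums) {
        long total = 0;
        for (int s : partialSums) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
            List.of("a,1", "b,2", "c,3"),
            List.of("d,4", "e,5"));
        List<Integer> partials = mapPhase(splits);
        System.out.println(partials);         // [3, 2]
        System.out.println(reduce(partials)); // 5
    }
}
```

For 5 records in 2 splits, the reducer receives only [3, 2] and returns 5. In a real job, LineCntMapper emits IntWritable values, so its reducer must sum IntWritable (for example Hadoop's IntSumReducer); the LongSumReducer from Solution 1 expects LongWritable and would not match. Unlike a combiner, which Hadoop may skip, the default Mapper.run() calls cleanup() once after the last map() call, so the per-task reduction always happens.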

You can find full working code here

Author: chhaya vishwakarma

Working on big data

Updated on June 04, 2022

Comments

  • chhaya vishwakarma, almost 2 years ago

    I want to write MapReduce code that counts the number of records in a given CSV file. I don't understand what to do in the map phase and what to do in the reduce phase. How should I go about solving this? Can anyone suggest something?