Hadoop Word count: receive the total number of words that start with the letter "c"

java hadoop mapreduce


Solution 1

Chris Gerken's answer is right.

If you output each word as the key, the reducer gives you a separate count for every distinct word that starts with "c", not the overall total of words starting with "c".

To get the total, emit the same constant key from the mapper for every matching word:

        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            if (token.startsWith("c")) {
                word.set("C_Count");
                output.collect(word, one);
            }
        }
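To see why the choice of key matters, here is a minimal plain-Java sketch that imitates map, shuffle and reduce in memory. It does not use any Hadoop classes; the class name `KeyChoiceDemo`, the method `run` and the sample tokens are made up for illustration, with counts chosen to match the example in the question (could 2, can 3, cat 5).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of map -> shuffle -> reduce; not Hadoop code.
public class KeyChoiceDemo {

    // Emits (key, 1) for every token starting with "c" and sums the 1s per key,
    // which is what the shuffle plus a summing reducer would do.
    static Map<String, Integer> run(List<String> tokens, boolean constantKey) {
        Map<String, Integer> reduced = new TreeMap<>();
        for (String token : tokens) {
            if (token.startsWith("c")) {
                String key = constantKey ? "C_Count" : token;
                reduced.merge(key, 1, Integer::sum);
            }
        }
        return reduced;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "could", "can", "cat", "dog", "can", "cat",
                "could", "can", "cat", "cat", "cat");

        // Word as key: one total per distinct word.
        System.out.println(run(tokens, false)); // {can=3, cat=5, could=2}

        // Constant key: every 1 lands in the same group, giving one overall total.
        System.out.println(run(tokens, true));  // {C_Count=10}
    }
}
```

With a constant key there is only one group, so the reducer runs once and its sum is the total number of matching words.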

Here is a complete example using the new API (the org.apache.hadoop.mapreduce package):

Driver class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = new Job(conf, "wordcount");
        FileSystem fs = FileSystem.get(conf);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        if (fs.exists(new Path(args[1])))
            fs.delete(new Path(args[1]), true);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setJarByClass(WordCount.class);
        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}

Mapper class

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            if(token.startsWith("c")){
                word.set("C_Count");
                context.write(word, one);
            }

        }
    }
}

Reducer class

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Solution 2

Instead of

output.collect(word, one);

in your mapper, try:

output.collect(new Text("c-total"), one);

Solution 3

Simpler code for mapper:

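public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> op, Reporter r)
        throws IOException {
    // Split on runs of non-word characters so punctuation is dropped.
    for (String w : value.toString().split("\\W+")) {
        // The question asks for lowercase "c"; call w.toLowerCase() first
        // if words starting with "C" should be counted as well.
        if (w.startsWith("c")) {
            op.collect(new Text("C-Count"), new IntWritable(1));
        }
    }
}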
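The two tokenizing approaches in these answers can give different totals on the same line, so it is worth checking which one you want. The sketch below is plain Java with no Hadoop dependency; the class `TokenizeDemo`, its method names and the sample sentence are hypothetical and only there to illustrate the difference. `StringTokenizer` with its default delimiters splits on whitespace only, so punctuation stays attached to the token, while `split("\\W+")` drops it.

```java
import java.util.Locale;
import java.util.StringTokenizer;

// Hypothetical helper comparing the two tokenizing styles used in the answers.
public class TokenizeDemo {

    // Whitespace-only tokens: "\"cats\"" starts with a quote, so it is not counted.
    static int countWithTokenizer(String line) {
        int n = 0;
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            if (itr.nextToken().startsWith("c")) {
                n++;
            }
        }
        return n;
    }

    // Tokens split on non-word characters, optionally ignoring case.
    static int countWithSplit(String line, boolean ignoreCase) {
        int n = 0;
        for (String w : line.split("\\W+")) {
            String candidate = ignoreCase ? w.toLowerCase(Locale.ROOT) : w;
            if (candidate.startsWith("c")) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String line = "Cats can climb; \"cats\" (sometimes) cannot.";
        System.out.println(countWithTokenizer(line));    // 3
        System.out.println(countWithSplit(line, false)); // 4
        System.out.println(countWithSplit(line, true));  // 5
    }
}
```

Whichever variant you pick, the mapper still emits the same constant key for each match, so the reducer code does not change.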


Author by

King11

Updated on June 05, 2022

Comments

King11 almost 2 years

Here's the Hadoop word count Java map and reduce source code:

In the map function, I can already output every word that starts with the letter "c" along with the number of times that word appears. What I actually want is a single total: the number of words starting with the letter "c". I'm a little stuck on getting that total. Any help would be greatly appreciated, thank you.

Example

The output I'm getting:

could 2

can 3

cat 5

What I'm trying to get:

c-total 10

public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            if (word.toString().startsWith("c")) {
                output.collect(word, one);
            }
        }
    }
}


public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get(); // add up the counts for this key
        }
        output.collect(key, new IntWritable(sum)); // output the word and its count
    }
}

King11 over 9 years

No, that didn't work. c-total is just a name I put in the example, by the way.

Chris Gerken over 9 years

What did it output? It should have worked. If every word that starts with "c" causes a record to be output, and every record has the same key and a value of 1, then reduce is called once and adds up all the 1's to give the total count. What's the value of the variable one, by the way?

King11 over 9 years

Thank you, that worked. I can't believe I missed that. Thank you again.

King11 over 9 years

@SreeVeni: C_Count 58953, this is my output, so it worked correctly. Thank you guys. I was stuck on that for a bit.