How to implement sort in Hadoop?

You can probably do this (I'm assuming you are using Java here)

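Emit from your mapper like this:

context.write(new LongWritable(24), new LongWritable(1));
context.write(new LongWritable(4), new LongWritable(3));
context.write(new LongWritable(12), new LongWritable(4));
context.write(new LongWritable(23), new LongWritable(5));

So every value that needs to be sorted should be emitted as the key of your MapReduce job. By default, Hadoop sorts keys in ascending order.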

Hence, to sort in descending order, either do this:

job.setSortComparatorClass(LongWritable.DecreasingComparator.class);

Or set a custom descending sort comparator on your job, which goes something like this:

public static class DescendingKeyComparator extends WritableComparator {
    protected DescendingKeyComparator() {
        // Register the key type being compared; it must match the casts in compare().
        super(LongWritable.class, true);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        LongWritable key1 = (LongWritable) w1;
        LongWritable key2 = (LongWritable) w2;
        // Negate the natural (ascending) order to sort descending.
        return -1 * key1.compareTo(key2);
    }
}

The shuffle and sort phase in Hadoop will then take care of sorting your keys in descending order: 24, 23, 12, 4.
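To make the swap-and-sort idea concrete, here is a minimal local simulation in plain Java. It does not use any Hadoop classes: a TreeMap with a reversed comparator merely stands in for the shuffle and sort phase, and the class and method names (ValueSortSketch, sortByValueDescending) are made up for illustration. It assumes records shaped like the sample in the comments, where the original key has to stay attached to its value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local simulation only; no Hadoop classes are involved.
class ValueSortSketch {

    // Each record is {originalKey, value}. Returns records ordered by value, descending.
    static List<int[]> sortByValueDescending(int[][] records) {
        // "Map" step: emit the value as the sort key and the original key as the payload.
        // The reversed TreeMap plays the role of the shuffle with a descending comparator.
        TreeMap<Integer, List<Integer>> shuffled = new TreeMap<>(Comparator.reverseOrder());
        for (int[] r : records) {
            shuffled.computeIfAbsent(r[1], v -> new ArrayList<>()).add(r[0]);
        }

        // "Reduce" step: swap back so the output is (originalKey, value) again.
        List<int[]> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : shuffled.entrySet()) {
            for (int originalKey : e.getValue()) {
                out.add(new int[] {originalKey, e.getKey()});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] input = {{1, 24}, {3, 4}, {4, 12}, {5, 23}};
        for (int[] r : sortByValueDescending(input)) {
            System.out.println(r[0] + "\t" + r[1]);
        }
    }
}
```

Running main prints the pairs in the order 1 24, 5 23, 4 12, 3 4. Keep in mind that this behaves like a job with a single reducer; with several reducers, each reducer's output file is sorted on its own.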

After comment:

If you need a descending comparator for IntWritable keys, you can create one and use it like this:

job.setSortComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);

If you are using JobConf, set it like this instead:

jobConfObject.setOutputKeyComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);

Define the comparator as a static nested class of your driver, alongside main() rather than inside it:

public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new YourDriver(), args);
    System.exit(exitCode);
}

// This class is defined outside main(), not inside it.
public static class DescendingIntWritableComparable extends IntWritable {
    /** A decreasing Comparator optimized for IntWritable. */
    public static class DecreasingComparator extends Comparator {
        @SuppressWarnings("rawtypes")
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return -super.compare(a, b);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return -super.compare(b1, s1, l1, b2, s2, l2);
        }
    }
}
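
The second compare() overload above works directly on serialized bytes, so keys do not have to be deserialized during the sort. As a rough illustration of what that comparison amounts to, here is a plain-Java sketch with no Hadoop dependency. It assumes the key is stored as a 4-byte big-endian int (the layout DataOutput.writeInt produces); the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Plain-Java illustration; RawDescendingSketch and its methods are hypothetical names.
class RawDescendingSketch {

    // Serialize an int as 4 big-endian bytes (ByteBuffer is big-endian by default).
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Ascending comparison done on the serialized form, starting at offsets s1 and s2.
    static int compareAscending(byte[] b1, int s1, byte[] b2, int s2) {
        int v1 = ByteBuffer.wrap(b1, s1, 4).getInt();
        int v2 = ByteBuffer.wrap(b2, s2, 4).getInt();
        return Integer.compare(v1, v2);
    }

    // Descending order is the ascending result with its sign flipped.
    static int compareDescending(byte[] b1, int s1, byte[] b2, int s2) {
        return -compareAscending(b1, s1, b2, s2);
    }

    public static void main(String[] args) {
        byte[] a = serialize(24);
        byte[] b = serialize(4);
        // A negative result means the first key sorts before the second.
        System.out.println(compareDescending(a, 0, b, 0));
    }
}
```

With the descending comparison, 24 compares as "less than" 4, which is why it comes first in the sorted output.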

Author by

csperson

Updated on February 27, 2020

Comments

  • csperson
    csperson about 4 years

    My problem is sorting values in a file. Keys and values are integers, and I need to keep each key with its sorted value.

    key   value
    1     24
    3     4
    4     12
    5     23
    

    output:

    1     24
    5     23
    4     12
    3     4
    

    I am working with massive data and must run the code on a cluster of Hadoop machines. How can I do it with MapReduce?

  • csperson
    csperson over 10 years
    If I have 5 computers running the code, does this code work, and is the final result absolutely correct? How many reducers do I need?
  • SSaikia_JtheRocker
    SSaikia_JtheRocker over 10 years
    Yes, you can have any number of reducers. I'm also assuming you know how to write a MapReduce job. Please give it a shot and tell me if it solves your issue. I think it will with respect to the use case you have mentioned. Thank you.
  • csperson
    csperson over 10 years
    I work with JobConf; it doesn't have a setSortComparatorClass method.
  • csperson
    csperson over 10 years
    My keys are IntWritable. How do I use the DescendingKeyComparator class in my code?
  • SSaikia_JtheRocker
    SSaikia_JtheRocker over 10 years
    Try creating one. I have modified my answer, please check and tell me if it helps.
  • csperson
    csperson over 10 years
    The class is declared static, and it shows an error. I changed it to final. Does this change cause a problem?
  • SSaikia_JtheRocker
    SSaikia_JtheRocker over 10 years
    The static class is defined outside main(), not inside it; check the modified answer.
  • pk10
    pk10 over 9 years
    But what about a Double value? Is there no class to accomplish it?
  • SSaikia_JtheRocker
    SSaikia_JtheRocker over 9 years
  • Abhi
    Abhi about 8 years
    The above process sorts data by keys (24, 23, 12, 4) after the map emits keys as values and vice versa. Can I take the sorted data as input to my reduce() and transform it back into the original <key, value> pairs, e.g. 1 24, etc.?
  • Kadima
    Kadima over 7 years
    Trying to use this DescendingIntWritableComparable to implement a descending sort instead of ascending sort, but job.setSortComparatorClass() does not see DescendingIntComparable.class as a class that extends RawComparator, so it doesn't run. Any ideas how to modify this so it'll work?