Hadoop Spill failure

Solution 1

OK, all problems are solved.

The MapReduce serialization machinery internally requires a default (no-argument) constructor for org.apache.hadoop.io.ArrayWritable, but Hadoop's implementation doesn't provide one.
That's why java.lang.NoSuchMethodException: org.apache.hadoop.io.ArrayWritable.<init>() was thrown and caused the weird spill exception.

A simple wrapper made ArrayWritable really writable and fixed it! Strange that Hadoop did not provide this.
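
The answer doesn't include the wrapper itself, so here is a minimal sketch of what it could look like, assuming Text elements (the class name TextArrayWritable is illustrative, not from the original):

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;

// Subclassing supplies the no-argument constructor that Hadoop's
// WritableSerialization needs when it instantiates values via reflection.
public class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class); // the element type must be fixed up front
    }

    public TextArrayWritable(Text[] values) {
        super(Text.class, values);
    }
}

Using this subclass as the job's map output value class, instead of raw ArrayWritable, gives the reflection-based deserializer a constructor it can call during the spill's combine phase.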

Solution 2

This problem came up for me when the output of one of my map jobs produced a tab character ("\t") or a newline character ("\r" or "\n"). Hadoop Streaming doesn't handle these well, because the tab is its default key/value separator and a newline terminates a record, so the job fails. I was able to solve this using this piece of Python code:

if "\t" in output:
  output = output.replace("\t", "")
if "\r" in output:
  output = output.replace("\r", "")
if "\n" in output:
  output = output.replace("\n", "")

You may have to do something else for your app. For a regular (non-streaming) Java job, a rough analogue is sketched below.
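
A minimal sketch of the same idea in Java, assuming a mapper that reads text lines and emits them cleaned (the class name and key/value choices are illustrative, not from the original):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper that strips tab and newline characters from each
// input line before emitting it, mirroring the Python snippet above.
public class SanitizingMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final Text EMPTY = new Text("");

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String cleaned = line.toString().replaceAll("[\\t\\r\\n]", "");
        context.write(new Text(cleaned), EMPTY);
    }
}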


Comments

  • Nico about 2 years

    I'm currently working on a project using Hadoop 0.21.0 (revision 985326) and a cluster of six worker nodes plus a head node. Submitting a regular MapReduce job fails, but I have no idea why. Has anybody seen this exception before?

    org.apache.hadoop.mapred.Child: Exception running child : java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1379)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$200(MapTask.java:711)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1193)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.write(Text.java:290)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:967)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:583)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:92)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:111)
        at be.ac.ua.comp.ronny.riki.invertedindex.FilteredInvertedIndexBuilder$Map.map(FilteredInvertedIndexBuilder.java:113)
        at be.ac.ua.comp.ronny.riki.invertedindex.FilteredInvertedIndexBuilder$Map.map(FilteredInvertedIndexBuilder.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:652)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
        at org.apache.hadoop.mapred.Child.main(Child.java:211)
    Caused by: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.io.ArrayWritable.<init>()
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:68)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
        at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:145)
        at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
        at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:291)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
        at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1432)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1457)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$600(MapTask.java:711)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1349)
    Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.io.ArrayWritable.<init>()
        at java.lang.Class.getConstructor0(Class.java:2706)
        at java.lang.Class.getDeclaredConstructor(Class.java:1985)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        ... 10 more
    

    Currently I'm experimenting with some configuration parameters in the hope that this error disappears, but so far without success. The parameters I'm tweaking are listed below (a sketch of setting them programmatically follows the list):

    • mapred.map.tasks = 60
    • mapred.reduce.tasks = 12
    • Job.MAP_OUTPUT_COMPRESS (or mapreduce.map.output.compress) = true
    • Job.IO_SORT_FACTOR (or mapreduce.task.io.sort.factor) = 10
    • Job.IO_SORT_MB (or mapreduce.task.io.sort.mb) = 256
    • Job.MAP_JAVA_OPTS (or mapreduce.map.java.opts) = "-Xmx256" or "-Xmx512"
    • Job.REDUCE_JAVA_OPTS (or mapreduce.reduce.java.opts) = "-Xmx256" or "-Xmx512"
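
    For reference, a minimal sketch of applying these values with the new API (the class and job name are illustrative, not from the original):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Illustrative driver that sets the parameters listed above.
    public class SpillTuningSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setInt("mapred.map.tasks", 60);
            conf.setInt("mapred.reduce.tasks", 12);
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.setInt("mapreduce.task.io.sort.factor", 10);
            conf.setInt("mapreduce.task.io.sort.mb", 256);
            // NB: heap sizes need a unit suffix; "-Xmx512" alone means 512 bytes.
            conf.set("mapreduce.map.java.opts", "-Xmx512m");
            conf.set("mapreduce.reduce.java.opts", "-Xmx512m");
            Job job = new Job(conf, "filtered-inverted-index"); // hypothetical name
            // ... set mapper, reducer, input/output paths, then submit
        }
    }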

    Can anybody explain why the exception above occurs, and how to avoid it? Or at least give a short explanation of what the Hadoop spill operation involves?

  • MrGomez over 13 years
    See my answer here on why this is the case: stackoverflow.com/questions/4386781/…