Strange Jackson Illegal character ((CTRL-CHAR, code 0)) Exception in Map Reduce Combiner

  • You can use StringEscapeUtils from the Apache Commons library to escape the string.
  • Or you can selectively replace the control characters in the string before JSON marshaling.
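
A minimal sketch of the second option, assuming the goal is simply to drop unwanted control characters from the crawled strings before they are set on MyObject. `ControlChars` and `stripControlChars` are hypothetical names, not part of any library. The pattern removes ASCII control characters except tab, line feed and carriage return, which are the characters the error message lists as acceptable whitespace.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: removes control characters from crawled text before JSON marshaling. */
public class ControlChars {

    // \p{Cntrl} is the POSIX class for ASCII control characters (U+0000 to U+001F and U+007F).
    // The intersection with [^\t\n\r] keeps tab, LF and CR.
    private static final Pattern UNWANTED = Pattern.compile("[\\p{Cntrl}&&[^\\t\\n\\r]]");

    /** Returns s with unwanted control characters removed, or null if s is null. */
    public static String stripControlChars(String s) {
        if (s == null) {
            return null;
        }
        return UNWANTED.matcher(s).replaceAll("");
    }
}
```

In the mapper this would be used as, for example, `val.setA(ControlChars.stripControlChars(stringA));`. Dropping the characters silently loses data, so it hides rather than explains where the NUL bytes come from.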

You can also refer to this post: Illegal character - CTRL-CHAR

Author: mle

Updated on November 17, 2021

Comments

  • mle
    mle over 2 years

    I have a MapReduce job whose mapper takes a record and converts it into an instance of MyObject, which is marshalled to JSON using Jackson and used as the output key. The value is just another Text field in the record.

    The relevant piece of the mapper is something like the following:

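    // Inside map(); uses com.fasterxml.jackson.databind.ObjectMapper, java.io.Writer, java.io.StringWriter.
    // stringA/stringB come from the input record; key is the Text output key.
    ObjectMapper mapper = new ObjectMapper();
    MyObject val = new MyObject();
    val.setA(stringA);
    val.setB(stringB);
    Writer strWriter = new StringWriter();
    mapper.writeValue(strWriter, val);   // marshal MyObject to JSON
    key.set(strWriter.toString());       // the JSON string becomes the map output key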
    

    The outputs of the mapper are sent to a Combiner, which unmarshals the JSON object and aggregates the key-value pairs. It is conceptually very simple, something like:

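    public void reduce(Text key, Iterable<IntWritable> values, Context cxt)
        throws IOException, InterruptedException {
        int count = 0;
        // _mapper is an ObjectMapper field; unmarshalling the JSON key is where the exception is thrown
        MyObject x = _mapper.readValue(key.toString(), MyObject.class);
        for (IntWritable v : values) ++count;   // count the values for this key
        ...
        cxt.write(key, value);                  // emit the aggregated (key, value) pair
    }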
    

    The MyObject class consists of two fields (both strings), getters and setters, and a default constructor. One of the fields stores snippets of text from a web crawl, but it is always a string.

    public class MyObject {
      private String A;
      private String B;
    
      public MyObject() {}
    
      public String getA() {
        return A;
      }
      public void setA(String A) {
        this.A = A;
      }
      public String getB() {
        return B;
      } 
      public void setB(String B) {
        this.B = B;
      }
    }
    

    My MapReduce job appears to run fine until it reaches certain records, which I cannot easily access (because the mapper generates the records from a crawl); at that point the following exception is thrown:

    Error: com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens
        at [Source: java.io.StringReader@5ae2bee7; line: 1, column: 3]

    Does anyone have any suggestions about what is causing this?

    • SkyWalker
      SkyWalker about 8 years
      Use okhttp 1.5.1. Hope it will solve your issue.
    • sagneta
      sagneta about 8 years
      I realize you said you don't have easy access, but I suggest front-ending the crawl: remove spurious control characters like 0 (NULL) from the stream and then pass it to Jackson. I have seen financial feeds for various securities contain spurious data like this that always needs to be culled. It is most likely a defect on the sending side.
    • StaxMan
      StaxMan about 8 years
      At a low level, something is injecting null bytes (byte 0) into the stream, and the parser does not accept them (they are invalid in JSON). You need to figure out how and why this happens; it could be many things, including concurrency issues or timing (trying to parse content before it has been loaded into the buffer).
    • Haris Osmanagić
      Haris Osmanagić about 7 years
      If possible, add a few log lines so that you can see exactly which records are failing. Also, since you are crawling data, you might be having the same problem as here (GZip encoding): stackoverflow.com/questions/8091524/…
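
A minimal sketch of the two diagnostic checks suggested in the comments: reporting where the first offending control character sits, so it can be logged alongside the failing record, and ruling out GZip-compressed crawl content being treated as text. `CrawlTextDiagnostics` and its methods are hypothetical names; the sketch assumes crawled bodies are available as raw bytes and are UTF-8 once decompressed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

/**
 * Hypothetical diagnostics for crawled text that later fails JSON parsing.
 * Not part of Hadoop or Jackson; names and behavior are illustrative assumptions.
 */
public class CrawlTextDiagnostics {

    /** True if the bytes start with the gzip magic number 0x1f 0x8b (RFC 1952). */
    public static boolean looksGzipped(byte[] body) {
        return body.length >= 2 && (body[0] & 0xff) == 0x1f && (body[1] & 0xff) == 0x8b;
    }

    /** Gunzips the body if it looks compressed, then decodes it as UTF-8 (assumed charset). */
    public static String decode(byte[] body) throws IOException {
        byte[] raw = body;
        if (looksGzipped(body)) {
            try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
                raw = in.readAllBytes(); // InputStream.readAllBytes requires Java 9+
            }
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    /**
     * Describes the first character below U+0020 other than tab, LF or CR, or returns null
     * if there is none. Lines and columns are 1-based and may not match Jackson's own count.
     */
    public static String locateControlChar(String text) {
        int line = 1;
        int column = 1;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch < 0x20 && ch != '\t' && ch != '\n' && ch != '\r') {
                return "line " + line + ", column " + column + " (code " + (int) ch + ")";
            }
            if (ch == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return null;
    }
}
```

One way to use it is to catch `JsonParseException` around `_mapper.readValue(...)` in the combiner and log `CrawlTextDiagnostics.locateControlChar(key.toString())` together with the key, rather than letting the task fail; `decode` would sit in the mapper, where the crawled bytes are first turned into strings.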