DataInputStream and UTF-8

10,582

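The problem is not in the input streams, since they don't handle characters, only bytes. The problem occurs at the point where you convert those bytes to characters: `new String(dataBytes)` without a charset argument decodes with the platform default charset, which is not necessarily UTF-8. In this particular case, you need to specify the proper encoding in the String constructor.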

String notes = new String(dataBytes, "UTF-8");
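To make the difference visible, here is a minimal, self-contained sketch. The class and method names (`DecodeDemo`, `decodeUtf8`, `decodeLatin1`) are made up for illustration, and the sample bytes stand in for the request body. It decodes the same UTF-8 bytes once with the correct charset and once with ISO-8859-1, which gives the same kind of garbling as in the question. It also uses `StandardCharsets.UTF_8` (Java 7+) instead of the `"UTF-8"` string, which avoids the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode the raw bytes as UTF-8, independent of the platform default charset.
    static String decodeUtf8(byte[] dataBytes) {
        return new String(dataBytes, StandardCharsets.UTF_8);
    }

    // Decode the same bytes with the wrong charset to reproduce the garbling.
    static String decodeLatin1(byte[] dataBytes) {
        return new String(dataBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u00e0\u00e9" is "àé"; escapes keep the source file ASCII-only.
        byte[] dataBytes = "\u00e0\u00e9".getBytes(StandardCharsets.UTF_8);

        // Two characters, correctly decoded.
        System.out.println(decodeUtf8(dataBytes));

        // Four characters: each 2-byte UTF-8 sequence becomes two Latin-1 characters.
        System.out.println(decodeLatin1(dataBytes));
    }
}
```

Whether the garbling in your case comes from ISO-8859-1, windows-1252 or some other default depends on the platform, so treat the second method purely as a demonstration.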

By the way, the DataInputStream adds no value in this particular code snippet. You can just keep it as a plain InputStream.
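As a rough sketch of what that could look like, the helper below reads a plain `InputStream` to the end and only then decodes the collected bytes as UTF-8. The names `BodyReader` and `readBodyAsUtf8` are hypothetical, and a `ByteArrayInputStream` stands in for `request.getInputStream()` so the example runs outside a servlet container.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BodyReader {
    // Read the stream until end-of-stream, then decode everything as UTF-8.
    static String readBodyAsUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        // read() returns -1 at end-of-stream; otherwise the number of bytes read.
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // Decode once, after all bytes are collected, so a multi-byte
        // character is never split across two separate decode calls.
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the request body; in a servlet, pass request.getInputStream().
        byte[] body = "notes=\u00e0\u00e2\u00e4".getBytes(StandardCharsets.UTF_8);
        System.out.println(readBodyAsUtf8(new ByteArrayInputStream(body)));
    }
}
```

Reading until `-1` also means the code does not depend on `getContentLength()`, which can return -1 when the length is unknown.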

Author by

jorgemoya

Updated on June 04, 2022

Comments

  • jorgemoya
    jorgemoya almost 2 years

    I'm kind of a new programmer, and I'm having a couple of problems with the code I'm handling.

    Basically what the code does is receive a form from another JSP, read the bytes, parse the data, and submit the results to SalesForce, using DataInputStream.

       //Getting the parameters from request
     String contentType = request.getContentType();
     DataInputStream in = new DataInputStream(request.getInputStream());
     int formDataLength = request.getContentLength();
    
     //System.out.println(formDataLength);
     byte dataBytes[] = new byte[formDataLength];
     int byteRead = 0;
     int totalBytesRead = 0;
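     while (totalBytesRead < formDataLength) 
     {
      // Ask only for the bytes still missing, so offset + length stays within the array
      byteRead = in.read(dataBytes, totalBytesRead, formDataLength - totalBytesRead);
      // read() returns -1 if the stream ends before formDataLength bytes arrive
      if (byteRead == -1) break;
      totalBytesRead += byteRead;
     }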
    

    It works fine, but only if the input contains plain ASCII characters. Whenever it has to handle special characters (such as French characters: àâäæçéèêëîïôùûü), I get the following gibberish as a result:

    à âäæçéèêëîïôùûü

    I understand it could be an issue with DataInputStream and the fact that it doesn't return UTF-8 encoded text. Do you have any suggestions on how to tackle this issue?

    All the .jsp files include <%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%> and Tomcat's settings are fine (URI = UTF-8, etc). I tried adding:

    request.setCharacterEncoding("UTF-8");

    and

    response.setCharacterEncoding("UTF-8");

    to no avail.

    Here's an example of how it parses the data:

        //Getting the notes for the Case 
     String notes = new String(dataBytes);
     System.out.println(notes);
     String savenotes = casetype.substring(notes.indexOf("notes"));
     //savenotes = savenotes.substring(savenotes.indexOf("\n"), savenotes.indexOf("---"));
     savenotes = savenotes.substring(savenotes.indexOf("\n")+1);
     savenotes = savenotes.substring(savenotes.indexOf("\n")+1);
     savenotes = savenotes.substring(0,savenotes.indexOf("name=\"datafile"));
     savenotes = savenotes.substring(0,savenotes.lastIndexOf("\n------"));
     savenotes = savenotes.trim();
    

    Thanks in advance.