Read UTF-16 chars from a file and store them as UTF-8

Solution 1

InputStreamReader decodes the file's bytes from the specified external encoding (UTF-16 in your case) into Java's internal representation (char, String), which is always UTF-16. So in your case there is effectively no character conversion at this step, only decoding of bytes into chars.
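To make the decoding step concrete, here is a minimal sketch. The class name `Utf16DecodeDemo` and the helper `firstLine` are made up for illustration, and a byte array stands in for persons.txt. It pushes UTF-16 bytes through an `InputStreamReader` the same way the question's code does and checks that the resulting String holds the expected characters.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf16DecodeDemo {
    // Hypothetical helper: decodes UTF-16 bytes and returns the first line.
    public static String firstLine(byte[] utf16Bytes) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(utf16Bytes),
                                      StandardCharsets.UTF_16))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Maria" in Greek letters, written with Unicode escapes so the
        // source file stays pure ASCII.
        String name = "\u039c\u03b1\u03c1\u03af\u03b1";

        // Stand-in for the file contents. Java's UTF-16 encoder writes a
        // byte order mark first; the UTF-16 decoder consumes it again.
        byte[] fileBytes = (name + "\nJohn\n").getBytes(StandardCharsets.UTF_16);

        String decoded = firstLine(fileBytes);
        System.out.println(decoded.equals(name)); // true
        System.out.println(decoded.length());     // 5
    }
}
```

Once the reader has produced a String, the file's encoding no longer matters; everything downstream sees ordinary Java chars.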

Converting the internal String representation to the database encoding is the job of your JDBC driver, so you don't need to handle it yourself (though with MySQL you do need to specify the proper encoding in the connection string).
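As a rough illustration of the connection-string part, the sketch below builds a MySQL JDBC URL that sets Connector/J's `characterEncoding` property. The class `JdbcUrlDemo`, the helper `buildUrl`, and the host, port, and database name are placeholders rather than values from the question; check the property against the documentation for your driver version.

```java
public class JdbcUrlDemo {
    // Hypothetical helper: builds a MySQL JDBC URL that asks the driver to
    // use UTF-8 when talking to the server.
    public static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Placeholder connection details, for illustration only.
        String url = buildUrl("localhost", 3306, "persons_db");
        System.out.println(url);

        // With the MySQL driver on the classpath, the URL would be used as:
        // Connection conn = DriverManager.getConnection(url, user, password);
    }
}
```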

If the input encoding and (in the case of MySQL) the database encoding are specified correctly, there is no risk of data loss during the conversions, since UTF-8 and UTF-16 both encode the same character set (Unicode).

Solution 2

UTF-8 and UTF-16 cover the same range of characters (full Unicode), so if the input data is valid, the output data will be valid too (unless there is a bug in dao.save()).
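The "valid in, valid out" claim can be checked with a small round trip. This is only a sketch: `RoundTripDemo` and `utf16ToUtf8` are illustrative names, and the sample strings are chosen for the demonstration. It re-encodes UTF-16 bytes as UTF-8 via a String, decodes them again, and compares the result with the original. It also shows what happens when the input is not valid, using an unpaired surrogate.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Hypothetical helper: re-encodes UTF-16 bytes as UTF-8 via a String.
    public static byte[] utf16ToUtf8(byte[] utf16Bytes) {
        String text = new String(utf16Bytes, StandardCharsets.UTF_16);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Helene" with accents, plus U+1F600, a character outside the
        // Basic Multilingual Plane (a surrogate pair in UTF-16).
        String original = "H\u00e9l\u00e8ne \uD83D\uDE00";

        byte[] utf8 = utf16ToUtf8(original.getBytes(StandardCharsets.UTF_16));
        String restored = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(restored.equals(original)); // true

        // Invalid input: a lone high surrogate is not a valid character,
        // so the UTF-8 encoder substitutes '?' for it.
        byte[] broken = "\uD83D".getBytes(StandardCharsets.UTF_8);
        System.out.println(broken.length == 1 && broken[0] == '?'); // true
    }
}
```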

Author: Argyro Kazaki

Updated on July 05, 2022

Comments

  • Argyro Kazaki (almost 2 years ago)

    I have a Person POJO with a name attribute, which I store in the persons table of my database. My DB server is MySQL with UTF-8 set as the default server encoding, the persons table is an InnoDB table that was also created with UTF-8 as the default encoding, and my DB connection string specifies UTF-8 as the connection encoding.

    I am required to create and store new Person POJOs by reading their names from a text file (persons.txt) that contains one name per line, but the file is encoded in UTF-16.

    persons.txt

    John

    Μαρία

    Hélène

    etc..

    Here is some sample code:

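    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    PersonDao dao = new PersonDao();
    File file = new File("persons.txt");
    // Decode the file's bytes as UTF-16; try-with-resources closes the reader.
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_16))) {
        String line;
        while ((line = reader.readLine()) != null) {
            Person p = new Person();
            p.setName(line.trim());
            dao.save(p); // the JDBC driver encodes the String for the database
        }
    }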
    

    To sum up, I am reading the characters as UTF-16, storing them in local variables, and persisting them as UTF-8.

    I would like to ask: Does any character conversion take place during this procedure? If so, at what point does it happen? Is it possible that I will end up storing broken characters because of the UTF-16 -> UTF-8 workflow?