Java Apache FileUtils readFileToString and writeStringToFile problems

Solution 1

Ed Staub's answer explains why my original solution was not working, and he suggested using bytes instead of Strings. In my case I need a String, so the working solution I ended up with is the following:

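import static org.junit.Assert.assertTrue;

import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.junit.Test;

// f1 is the input file under test, declared elsewhere in the test class.
@Test
public void testFileRWAsArray() throws IOException {
    // Read the raw bytes and map each byte to one char with a plain cast.
    byte[] bytes = FileUtils.readFileToByteArray(f1);
    StringBuilder sb = new StringBuilder(bytes.length);
    for (byte b : bytes) {
        sb.append((char) b);
    }
    String f1String = sb.toString();

    // Map each char back to one byte, reversing the cast above.
    File temp = File.createTempFile("deleteme", "deleteme");
    byte[] newBytes = new byte[f1String.length()];
    for (int i = 0; i < f1String.length(); ++i) {
        newBytes[i] = (byte) f1String.charAt(i);
    }
    FileUtils.writeByteArrayToFile(temp, newBytes);
    assertTrue(FileUtils.contentEquals(f1, temp));
}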

Because each byte is cast to exactly one char and each char is cast back to one byte, the conversion is symmetric. Thank you all!
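
The same one-byte-per-char idea can also be expressed with the JDK's ISO-8859-1 charset, which maps every byte value from 0x00 to 0xFF to a single char and back. The following is only a minimal sketch under that assumption: the class and method names (`Latin1RoundTrip`, `bytesToString`, `stringToBytes`) are hypothetical, and it works on in-memory byte arrays instead of going through FileUtils.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Hypothetical helper: one char per byte; ISO-8859-1 can decode every byte value.
    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: reverses bytesToString as long as every char is <= U+00FF.
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "%PDF" followed by a few bytes that are not valid text in most encodings.
        byte[] original = {0x25, 0x50, 0x44, 0x46, (byte) 0xFF, (byte) 0x80, 0x00};
        byte[] roundTripped = stringToBytes(bytesToString(original));
        System.out.println(Arrays.equals(original, roundTripped)); // prints "true"
    }
}
```

One caveat with this sketch: if a patch inserts a char above U+00FF into the string, `getBytes` replaces it with `?`, so the round trip only holds while the string stays within that range.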

Solution 2

A PDF is not a text file. Decoding binary data that is not encoded text into Java characters and then re-encoding it is asymmetrical. For example, if the input byte stream is invalid for the current encoding, you can be assured that it won't re-encode correctly. In short: don't do that. Use readFileToByteArray and writeByteArrayToFile instead.
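
To illustrate the asymmetry, here is a small sketch that assumes UTF-8 as the encoding (the platform default may differ). The class and method names (`DecodeAsymmetry`, `roundTripUtf8`) are made up for this example. The byte 0xFF can never occur in well-formed UTF-8, so decoding replaces it with U+FFFD, and re-encoding produces three different bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeAsymmetry {
    // Decodes the bytes as UTF-8 text, then encodes the resulting String back to bytes.
    static byte[] roundTripUtf8(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%PDF" followed by 0xFF, which is malformed input for a UTF-8 decoder.
        byte[] binary = {0x25, 0x50, 0x44, 0x46, (byte) 0xFF};
        byte[] result = roundTripUtf8(binary);
        System.out.println(Arrays.equals(binary, result)); // prints "false"
        System.out.println(result.length);                 // prints "7": 0xFF became EF BF BD
    }
}
```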

Author: Mateu

Updated on June 04, 2022

Comments

  • Mateu
    Mateu almost 2 years

    I need to read a file in Java (actually a .pdf) into a String and then write it back to a file. Between those steps I'll apply some patches to the string, but that is not important here. I've developed the following JUnit test case:

        String f1String=FileUtils.readFileToString(f1);
        File temp=File.createTempFile("deleteme", "deleteme");
        FileUtils.writeStringToFile(temp, f1String);
        assertTrue(FileUtils.contentEquals(f1, temp));
    

    This test converts a file to a string and writes it back. However, the test is failing. I think it may be because of the encodings, but FileUtils does not document this in much detail. Can anyone help? Thanks!

    Added for further understanding: why do I need this? I have very large PDFs on one machine that are replicated on another. The first machine is in charge of creating those PDFs. Because of the second machine's poor connectivity and the large size of the PDFs, I don't want to sync the whole files, only the changes. To create and apply patches, I'm using Google's DiffMatchPatch library. This library creates patches between two strings, so I need to load a PDF into a String, apply a generated patch, and write it back to a file.