How to read a file into a string with CR/LF preserved?


Solution 1

Are you sure that those methods are the culprits that are stripping out your characters?

I tried to write up a quick test; StreamReader.ReadToEnd preserves all newline characters.

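using System;
using System.IO;
using System.Linq;
using System.Text;

string str = "foo\n\r\nbar";
// Round-trip the string through an in-memory stream and read it back.
using (Stream ms = new MemoryStream(Encoding.ASCII.GetBytes(str)))
using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
{
    string str2 = sr.ReadToEnd();
    // Print each character's code so the line endings are visible.
    Console.WriteLine(string.Join(",", str2.Select(c => ((int)c))));
}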

// Output: 102,111,111,10,13,10,98,97,114
//           f   o   o \n \r \n  b  a   r

An identical result is achieved when writing to and reading from a temporary file:

string str = "foo\n\r\nbar";
string temp = Path.GetTempFileName();
File.WriteAllText(temp, str);
string str2 = File.ReadAllText(temp);
Console.WriteLine(string.Join(",", str2.Select(c => ((int)c))));

It appears that your newlines are getting lost elsewhere.
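
To narrow down where the characters go missing, it can help to print the string with CR and LF made visible at each step after the read. The following is only a sketch; `ShowLineEndings` is a hypothetical helper name, not part of .NET.

```csharp
using System;
using System.Text;

// Hypothetical diagnostic helper: renders CR and LF as the visible
// two-character escapes \r and \n, leaving every other character as-is.
static string ShowLineEndings(string text)
{
    var sb = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        switch (c)
        {
            case '\r': sb.Append("\\r"); break;
            case '\n': sb.Append("\\n"); break;
            default: sb.Append(c); break;
        }
    }
    return sb.ToString();
}

Console.WriteLine(ShowLineEndings("foo\n\r\nbar"));   // prints foo\n\r\nbar
```

Calling it right after the read, and again after each later processing step, should show which step actually alters the line endings.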

Solution 2

This piece of code will preserve LF and CR:

string r = File.ReadAllText(@".\TestData\TR120119.TRX", Encoding.ASCII);

Solution 3

The outcome should be always single string, containing entire file.

It takes two hops. The first one is File.ReadAllBytes(), which gets all the bytes in the file without trying to translate anything. You get the raw data in the file, so the weirdo line endings are preserved as-is.

But those are bytes, and you asked for a string. So the second hop is to apply Encoding.GetString() to convert the bytes to a string. The one thing you have to do is pick the right Encoding class, the one that matches the encoding used by the program that wrote the file. Given that the file is pretty messed up if it contains \n\r\n sequences, and you didn't document anything else about the file, your best bet is to use Encoding.Default. Tweak as necessary.
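
The two hops could be sketched roughly as follows. `ReadAllTextRaw` is a hypothetical helper name, and the encoding is left to the caller because the answer does not know which one wrote the file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not from the original answer): reads the whole file
// as bytes, then decodes them with the encoding supplied by the caller.
static string ReadAllTextRaw(string path, Encoding encoding)
{
    byte[] bytes = File.ReadAllBytes(path);   // hop 1: raw bytes, nothing translated
    return encoding.GetString(bytes);         // hop 2: bytes to string
}
```

A call might look like `string text = ReadAllTextRaw(path, Encoding.Default);`. Two caveats apply. Encoding.Default is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and .NET 5 or later. And unlike File.ReadAllText, Encoding.GetString does not skip a byte order mark, so a file that starts with a BOM yields a string that starts with U+FEFF.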

Author: greenoldman


Updated on July 09, 2022

Comments

  • greenoldman
    greenoldman almost 2 years

    If I asked the question "how to read a file into a string", the answer would be obvious. However, here is the catch: with CR/LF preserved.

    The problem is that File.ReadAllText strips those characters. StreamReader.ReadToEnd just converted LF into CR for me, which led to a long investigation into where I had a bug in pretty obvious code ;-)

    So, in short, if I have a file containing foo\n\r\nbar, I would like to get foo\n\r\nbar (i.e. exactly the same content), not foo bar, foobar, or foo\n\n\nbar. Is there some ready-to-use way in the .NET space?

    The outcome should always be a single string containing the entire file.

  • Douglas
    Douglas over 11 years
    I don't believe that the choice of encoding should cause newline sequences to get altered.
  • user1703401
    user1703401 over 11 years
    Tall assumption if you haven't yet met EBCDIC. Not the point, what's between those ASCII control characters matters.
  • Douglas
    Douglas over 11 years
    It is the whole point. If the OP is using any ASCII-compatible encoding (including UTF-8), then it won't matter what's between the control characters; multi-byte sequences cannot contain values 10 or 13. Yes, using a non-ASCII-compatible encoding such as EBCDIC (or even UTF-16) would introduce a whole slew of new considerations, but I assume the OP would have mentioned it if they were.
  • greenoldman
    greenoldman over 11 years
    While for this issue it was my mistake buried inside the code, I like your explanation and the description of the steps -- thank you very much!
  • greenoldman
    greenoldman over 11 years
    Gosh, you are right, and thanks for the sample code to test it more thoroughly. I messed up the code that sits right after reading the text, and the displayed outcome confused me. Sorry about that, but there was a lot to learn anyway.
  • Douglas
    Douglas over 11 years
    Glad you found the cause :-)
  • djk
    djk over 6 years
    Unfortunately, that's just not true, as all the other examples, my own tests, and the .Net reference code show.