How to read a file into a string with CR/LF preserved?
Solution 1
Are you sure that those methods are the culprits that are stripping out your characters?
I tried to write up a quick test; StreamReader.ReadToEnd preserves all newline characters.
string str = "foo\n\r\nbar";
using (Stream ms = new MemoryStream(Encoding.ASCII.GetBytes(str)))
using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
{
    string str2 = sr.ReadToEnd();
    Console.WriteLine(string.Join(",", str2.Select(c => ((int)c))));
}
// Output: 102,111,111,10,13,10,98,97,114
// f o o \n \r \n b a r
An identical result is achieved when writing to and reading from a temporary file:
string str = "foo\n\r\nbar";
string temp = Path.GetTempFileName();
File.WriteAllText(temp, str);
string str2 = File.ReadAllText(temp);
Console.WriteLine(string.Join(",", str2.Select(c => ((int)c))));
It appears that your newlines are getting lost elsewhere.
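When hunting for the place where the newlines actually get lost, it can help to print strings with the control characters made visible, since a console or debugger view may render them misleadingly. The following is only a minimal sketch; the class name NewlineDebug and the helper ShowControlChars are hypothetical names chosen for illustration, not part of .NET.

```csharp
using System;
using System.Text;

public static class NewlineDebug
{
    // Hypothetical helper: returns a copy of s in which CR and LF
    // are replaced by the visible escape sequences \r and \n.
    public static string ShowControlChars(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            switch (c)
            {
                case '\r': sb.Append("\\r"); break;
                case '\n': sb.Append("\\n"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Prints: foo\n\r\nbar
        Console.WriteLine(ShowControlChars("foo\n\r\nbar"));
    }
}
```

Calling such a helper right after the read, and again after each later processing step, should narrow down which step alters the line endings.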
Solution 2
This piece of code will preserve LF and CR:
string r = File.ReadAllText(@".\TestData\TR120119.TRX", Encoding.ASCII);
Solution 3
"The outcome should be always single string, containing entire file."
It takes two hops. The first one is File.ReadAllBytes() to get all the bytes in the file. That doesn't try to translate anything: you get the raw data in the file, so the weirdo line endings are preserved as-is.
But that's bytes, and you asked for a string. So the second hop is to apply Encoding.GetString() to convert the bytes to a string. The one thing you have to do is pick the right Encoding class, the one that matches the encoding used by the program that wrote the file. Given that the file is pretty messed up if it contains \n\r\n sequences, and you didn't document anything else about the file, your best bet is to use Encoding.Default. Tweak as necessary.
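The two hops described above can be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the class name RawFileReader and the helper ReadFilePreservingNewlines are hypothetical, and Encoding.Default is used only because the answer suggests it as a starting point. Note also that Encoding.GetString does not skip a byte order mark, so a file that starts with one will yield a leading U+FEFF character in the string.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class RawFileReader
{
    // Hypothetical helper: reads the file as raw bytes (hop 1), then
    // decodes them with the caller-supplied encoding (hop 2).
    public static string ReadFilePreservingNewlines(string path, Encoding encoding)
    {
        byte[] raw = File.ReadAllBytes(path);  // no newline translation happens here
        return encoding.GetString(raw);        // decoding only, no line-ending handling
    }

    public static void Main()
    {
        string temp = Path.GetTempFileName();
        try
        {
            // Write the mixed line endings from the question as raw bytes.
            File.WriteAllBytes(temp, Encoding.ASCII.GetBytes("foo\n\r\nbar"));

            // Assumption: Encoding.Default matches the file; swap in the real encoding if known.
            string text = ReadFilePreservingNewlines(temp, Encoding.Default);

            // For this ASCII-only input, prints: 102,111,111,10,13,10,98,97,114
            Console.WriteLine(string.Join(",", text.Select(c => (int)c)));
        }
        finally
        {
            File.Delete(temp);
        }
    }
}
```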
greenoldman
Updated on July 09, 2022
Comments
-
greenoldman almost 2 years
If I asked the question "how to read a file into a string" the answer would be obvious. However, here is the catch: with CR/LF preserved.
The problem is, File.ReadAllText strips those characters. StreamReader.ReadToEnd just converted LF into CR for me, which led to a long investigation into where I had a bug in pretty obvious code ;-)
So, in short, if I have a file containing foo\n\r\nbar I would like to get foo\n\r\nbar (i.e. exactly the same content), not foo bar, foobar, or foo\n\n\nbar. Is there some ready-to-use way in the .NET space?
The outcome should always be a single string containing the entire file.
-
Douglas over 11 years
I don't believe that the choice of encoding should cause newline sequences to get altered.
-
user1703401 over 11 years
Tall assumption if you haven't yet met EBCDIC. Not the point, what's between those ASCII control characters matters.
-
Douglas over 11 years
It is the whole point. If the OP is using any ASCII-compatible encoding (including UTF-8), then it won't matter what's between the control characters; multi-byte sequences cannot contain values 10 or 13. Yes, using a non-ASCII-compatible encoding such as EBCDIC (or even UTF-16) would introduce a whole slew of new considerations, but I assume the OP would have mentioned it if they were.
-
greenoldman over 11 years
While for this issue it was my mistake buried inside the code, I like your explanation and the description of the steps -- thank you very much!
-
greenoldman over 11 years
Gosh, you are right, and thanks for the sample code to test it more thoroughly. I messed up the code which sits right after reading the text, and displaying the outcome confused me. Sorry about that, but there is a lot to learn anyway.
-
Douglas over 11 years
Glad you found the cause :-)
-
djk over 6 years
Unfortunately, that's just not true, as all the other examples, my own tests, and the .NET reference code show.