C# UTF8 Reading/Outputting
Solution 1
Your program is fine (assuming the input file is actually UTF-8). If you debug your program and use the Watch window to look at the strings (the line variable), you will find that they are correct. That is how you can be certain that you will send correct HTTP requests (or whatever else you do with the strings).
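If you would rather check the strings without the debugger, one option (a minimal sketch, not part of the original answer) is to print the UTF-16 code unit of each character as hex. The output is pure ASCII, so the console font cannot mangle it. The names CodePointDump and Describe are hypothetical, and the sample name is written with \u escapes so the check does not depend on how the source file itself is saved.

```csharp
using System;
using System.Linq;

static class CodePointDump
{
    // Formats every UTF-16 code unit of s as "U+XXXX", separated by spaces,
    // so the string can be inspected without relying on the console font.
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main()
    {
        // Hypothetical sample: the name from the question ("Öwnägé"), escaped.
        string line = "\u00D6wn\u00E4g\u00E9";
        Console.WriteLine(Describe(line));
    }
}
```

For the name in the question this prints U+00D6 U+0077 U+006E U+00E4 U+0067 U+00E9. If you instead see U+FFFD, or pairs starting with U+00C3, the file was decoded with the wrong encoding.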
What you’re seeing is a bug in the Windows console.
Fortunately, it only affects raster fonts. If you change your console window to use a TrueType font, e.g. Consolas or Lucida Console, the problem goes away.
You can set this for all future windows by using the “Defaults” item in the console window's system menu (right-click the title bar).
Solution 2
See Reading unicode from console
If you're using .NET 4, you will need to use
Console.InputEncoding = Encoding.Unicode;
Console.OutputEncoding = Encoding.Unicode;
and ensure you're using Lucida Console as the console font.
If you're using .NET 3.5, you're probably out of luck.
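The comments report that the InputEncoding assignment can throw an IOException ("The parameter is incorrect") on some setups. A minimal sketch, assuming you only need output and not input, is to set just Console.OutputEncoding and fall back gracefully if the console rejects it. It uses Encoding.UTF8, which Phil's last comment says also works, instead of Encoding.Unicode; ConsoleSetup and TrySetUtf8Output are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

static class ConsoleSetup
{
    // Tries to switch console output to UTF-8. Returns false instead of
    // crashing when the console host rejects the code page with an IOException.
    public static bool TrySetUtf8Output()
    {
        try
        {
            Console.OutputEncoding = Encoding.UTF8;
            return true;
        }
        catch (IOException)
        {
            return false;
        }
    }

    static void Main()
    {
        bool ok = TrySetUtf8Output();
        // The name still needs a TrueType font (e.g. Consolas) to render correctly.
        Console.WriteLine(ok ? "\u00D6wn\u00E4g\u00E9" : "Could not switch the console to UTF-8");
    }
}
```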
To read all the lines of a file in one call, I would probably use:
foreach(var line in File.ReadAllLines(path, Encoding.UTF8))
{
// do stuff
}
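For a huge list of names, File.ReadAllLines loads the whole file into an array before the loop starts. On .NET 4 and later, File.ReadLines returns the lines lazily instead. A minimal sketch, assuming one name per line; NameReader, CountNames and names.txt are hypothetical:

```csharp
using System;
using System.IO;
using System.Text;

static class NameReader
{
    // Streams the file one line at a time (File.ReadLines is lazy, unlike
    // File.ReadAllLines) and counts the non-empty lines.
    public static int CountNames(string path)
    {
        int count = 0;
        foreach (string line in File.ReadLines(path, Encoding.UTF8))
        {
            if (line.Length > 0)
            {
                count++; // a real program would process the name here
            }
        }
        return count;
    }

    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "names.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }
        Console.WriteLine(CountNames(path));
    }
}
```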
Solution 3
If the file was saved in the system's ANSI code page (Notepad's default) rather than UTF-8, you must read it with the Default encoding to get all the characters you mention, like this:
var reader = new StreamReader(@"E:\database.txt", System.Text.Encoding.Default);
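Encoding.Default is the system's ANSI code page on .NET Framework (on .NET Core and .NET 5+ it is always UTF-8), so it only gives the right characters if the file was saved that way; for a UTF-8 file, Encoding.UTF8 is correct. A minimal sketch (not from the original answer) for telling the two apart is to decode the raw bytes with a strict UTF8Encoding, which throws on invalid sequences instead of substituting U+FFFD. EncodingCheck and IsValidUtf8 are hypothetical names. This is a heuristic: pure-ASCII files pass either way (and decode identically with both encodings), and rare ANSI byte sequences can happen to be valid UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingCheck
{
    // Returns true if the bytes are valid UTF-8. throwOnInvalidBytes: true makes
    // the decoder raise DecoderFallbackException on malformed sequences.
    public static bool IsValidUtf8(byte[] bytes)
    {
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "test1.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }
        Console.WriteLine(IsValidUtf8(File.ReadAllBytes(path))
            ? "Valid UTF-8: read it with Encoding.UTF8"
            : "Not valid UTF-8: probably ANSI, try Encoding.Default on .NET Framework");
    }
}
```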
Comments
-
user17753 over 1 year
I'm trying to do something that I think should be fairly simple, but I've already spent way too much time on it, and none of the several approaches I researched have worked.
Basically, I have a huge list of names that contain "special" (non-ASCII) characters, stored as UTF-8.
My end goal is to read in each name, and then make an HTTP request using that name in the URL as a GET variable.
My first goal was to read one name from a file and write it to standard output, to confirm that I could read and write UTF-8 properly before building the strings and making all the HTTP requests.
The test1.txt file I made contained just this: Öwnägé
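To rule out the editor's encoding choice (see Yuck's comment about Notepad defaulting to ANSI), a minimal sketch is to create the test file from code with an explicit UTF-8 encoding and then print the bytes that ended up on disk. TestFileWriter and WriteUtf8 are hypothetical names; the byte order mark is optional, but it helps editors such as Notepad recognize the file as UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

static class TestFileWriter
{
    // Writes text as UTF-8 with a byte order mark and returns the raw file bytes.
    public static byte[] WriteUtf8(string path, string text)
    {
        File.WriteAllText(path, text, new UTF8Encoding(true));
        return File.ReadAllBytes(path);
    }

    static void Main()
    {
        byte[] bytes = WriteUtf8("test1.txt", "\u00D6wn\u00E4g\u00E9");
        // Hex output is ASCII, so it is unaffected by the console font.
        Console.WriteLine(BitConverter.ToString(bytes));
    }
}
```

This prints EF-BB-BF-C3-96-77-6E-C3-A4-67-C3-A9: the three BOM bytes, then two bytes each for Ö, ä and é.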
I then used this C# code to read in the file. I set both the StreamReader encoding and Console.OutputEncoding to UTF8.
static void Main(string[] args)
{
    Console.OutputEncoding = System.Text.Encoding.UTF8;
    using (StreamReader reader = new StreamReader("test1.txt", System.Text.Encoding.UTF8))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Console.WriteLine(line);
        }
    }
    Console.ReadLine();
}
Much to my surprise, the console output does not match the file. I expected exactly the original file contents.
How can I be certain that the strings I build for the HTTP requests are going to be correct if I cannot even do a task as simple as reading and writing UTF-8 strings?
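For the end goal of putting each name into a GET request, the name also has to be percent-encoded. A minimal sketch, assuming a hypothetical endpoint http://example.com/lookup and a hypothetical parameter called name: Uri.EscapeDataString encodes non-ASCII characters as their UTF-8 bytes, so a correctly decoded string yields a correct URL regardless of what the console displays.

```csharp
using System;

static class QueryBuilder
{
    // Appends the name as a percent-encoded query parameter.
    // Uri.EscapeDataString encodes non-ASCII characters as UTF-8 bytes.
    public static string BuildUrl(string baseUrl, string name)
    {
        return baseUrl + "?name=" + Uri.EscapeDataString(name);
    }

    static void Main()
    {
        // Hypothetical endpoint and parameter name; substitute your own.
        string url = BuildUrl("http://example.com/lookup", "\u00D6wn\u00E4g\u00E9");
        Console.WriteLine(url);
    }
}
```

With the sample name this prints http://example.com/lookup?name=%C3%96wn%C3%A4g%C3%A9.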
-
Yuck about 12 years: +1, this is correct. Also be sure that you're saving your sample file as UTF-8 and not ANSI, which is the default in Notepad.
-
Phil about 12 years: What is the message in the exception?
-
Yuck about 12 years: "The parameter is incorrect." And it's on the first line (Console.InputEncoding = Encoding.Unicode;). Using .NET 4 as well.
-
user17753 about 12 years: My target framework is .NET Framework 3.5. I tried this anyway and received the same IOException that Yuck saw.
-
Phil about 12 years: Yes, you will in .NET 3.5. It works fine for me in VS2010 with the .NET 4 Client Profile.
-
Phil about 12 years: Ensure the target framework in your project properties is .NET 4 Client Profile or above.
-
Yuck about 12 years: @Phil Both .NET 4 and .NET 4 Client Profile result in the same exception with the same message. I can't reproduce this as a solution.
-
user17753 about 12 years: This, in conjunction with Yuck's suggestion to make sure I selected UTF-8 instead of ANSI when saving the file, worked out. Thanks guys, you saved me a lot of headaches, I'm sure!
-
Yuck about 12 years: @Phil Same: Win 7 SP1, VS 2010, .NET 4.
-
Phil about 12 years: OK, well, that's odd; it works for me with Encoding.Unicode or Encoding.UTF8.