C# UTF8 Reading/Outputting

Solution 1

Your program is fine (assuming the input file is actually UTF-8). If you debug your program and use the Watch window to look at the strings (the line variable), you will find that it is correct. That is how you can be certain that you will send correct HTTP requests (or whatever else you do with the strings).
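
One way to check the strings without relying on the console is to print their code points and their percent-encoded form. Both are plain ASCII, so they display correctly in any console font. The following is a minimal sketch, assuming the test1.txt file from the question; the class name `EncodingCheck` and the helper `ToCodePoints` are illustrative names, not part of the original program:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class EncodingCheck
{
    // Hypothetical helper: renders each UTF-16 code unit of s as U+XXXX.
    public static string ToCodePoints(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")).ToArray());
    }

    public static void Main()
    {
        // "test1.txt" is the file name used in the question.
        foreach (string line in File.ReadAllLines("test1.txt", Encoding.UTF8))
        {
            // ASCII-only view of what was actually decoded from the file.
            Console.WriteLine(ToCodePoints(line));
            // The form the name would take inside a URL query string.
            Console.WriteLine(Uri.EscapeDataString(line));
        }
    }
}
```

For the string "Öwnägé" in precomposed form, the two lines are `U+00D6 U+0077 U+006E U+00E4 U+0067 U+00E9` and `%C3%96wn%C3%A4g%C3%A9`. If the program prints those, the string in memory is correct regardless of how the console renders it.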

What you’re seeing is a bug in the Windows console.

Fortunately, it only affects raster fonts. If you change your console window to use a TrueType font, e.g. Consolas or Lucida Console, the problem goes away.

screenshot

You can set this for all future windows by using the “Defaults” menu item:

screenshot

Solution 2

See Reading unicode from console

If you're using .NET 4 you will need to use

    Console.InputEncoding = Encoding.Unicode;
    Console.OutputEncoding = Encoding.Unicode;

and ensure you're using Lucida Console as the console font.

If you're using .NET 3.5 you're probably out of luck.

To efficiently read lines from a file I would probably use:

    foreach (var line in File.ReadAllLines(path, Encoding.UTF8))
    {
        // do stuff
    }
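
Note that File.ReadAllLines loads the whole file into an array before the loop starts. For a very large list, File.ReadLines (available from .NET 4) enumerates the lines lazily instead. The following is a minimal sketch; the class name `LineReader`, the helper `CountNames`, and the fallback file name are assumptions made for illustration:

```csharp
using System;
using System.IO;
using System.Text;

public static class LineReader
{
    // Hypothetical helper: counts non-empty lines while streaming the file,
    // so only one line is held in memory at a time.
    public static int CountNames(string path)
    {
        int count = 0;
        foreach (string line in File.ReadLines(path, Encoding.UTF8))
        {
            if (line.Length > 0)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main(string[] args)
    {
        // The default path is an assumption; pass your own file as the first argument.
        string path = args.Length > 0 ? args[0] : "test1.txt";
        Console.WriteLine(CountNames(path));
    }
}
```

On .NET 3.5, where File.ReadLines is not available, a StreamReader.ReadLine loop like the one in the question streams the file in the same way.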

Solution 3

To read all the characters you mention, you may need the Default encoding (on .NET Framework, the system's ANSI code page). This applies when the file was saved as ANSI rather than UTF-8:

    new StreamReader(@"E:\database.txt", System.Text.Encoding.Default)
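
To see why the encoding passed to the reader has to match the encoding the file was saved in, the sketch below encodes the sample name in a single-byte encoding and then decodes those bytes as UTF-8. It uses ISO-8859-1 as a stand-in for the ANSI code page, which is an assumption: Encoding.Default depends on the machine's settings. The class and method names are illustrative only:

```csharp
using System;
using System.Text;

public static class MismatchDemo
{
    // Hypothetical helper: encodes text as ISO-8859-1 (one byte per character),
    // then decodes those same bytes as if they were UTF-8.
    public static string DecodeLatin1BytesAsUtf8(string text)
    {
        byte[] singleByte = Encoding.GetEncoding("iso-8859-1").GetBytes(text);
        return Encoding.UTF8.GetString(singleByte);
    }

    public static void Main()
    {
        // "\u00D6wn\u00E4g\u00E9" is the sample name from the question.
        string garbled = DecodeLatin1BytesAsUtf8("\u00D6wn\u00E4g\u00E9");

        // Count the U+FFFD replacement characters the UTF-8 decoder produced.
        int replaced = 0;
        foreach (char c in garbled)
        {
            if (c == '\uFFFD')
            {
                replaced++;
            }
        }
        Console.WriteLine(replaced);
    }
}
```

Each accented letter is a byte that is not valid UTF-8 on its own, so the decoder substitutes U+FFFD for it while the ASCII letters pass through unchanged. Reading an ANSI-saved file with Encoding.UTF8 has the same effect.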
Author by

user17753

Hobbyist programmer

Updated on August 05, 2022

Comments

  • user17753, over 1 year ago

    I'm trying to do something that I think should be fairly simple but I've spent way too much time on it already and I've tried several different approaches that I researched but to no avail.

    Basically, I have a huge list of names that contain "special" (non-ASCII) characters, stored as UTF-8.

    My end goal is to read in each name, and then make an HTTP request using that name in the URL as a GET variable.

    My first goal was to read in one name from a file and write it to standard output to confirm I could read and write UTF-8 properly, before creating the strings and making all the HTTP requests.

    The test1.txt file I made contained just this content:

    Öwnägé

    I then used this C# code to read in the file. I set the StreamReader encoding and the Console.OutputEncoding to UTF8.

    static void Main(string[] args)
    {
        Console.OutputEncoding = System.Text.Encoding.UTF8;
    
        using (StreamReader reader = new StreamReader("test1.txt",System.Text.Encoding.UTF8))
        {
            string line;
    
            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine(line);
            }
    
        }
    
        Console.ReadLine();
    }
    

    Much to my surprise I get this kind of output:

    [image: console output]

    Expected output is the exact same as the original file contents.

    How can I be certain that the strings I am going to build for the HTTP requests will be correct if I cannot even do such a simple task as reading and writing UTF-8 strings?

  • Yuck, about 12 years ago
    +1 This is correct. Also be sure that you're saving your sample file using UTF-8 and not ANSI, which is the default in Notepad.
  • Phil, about 12 years ago
    What is the message in the exception?
  • Yuck, about 12 years ago
    "The parameter is incorrect." And it's on the first line, Console.InputEncoding = Encoding.Unicode;. Using .NET 4 as well.
  • user17753, about 12 years ago
    My Target Framework is .NET Framework 3.5. I tried this anyway and received the same IOException that Yuck saw.
  • Phil, about 12 years ago
    Yes you will in .NET 3.5. It works fine for me in VS2010, .NET 4 client profile.
  • Phil, about 12 years ago
    Ensure your project properties target framework is .NET 4 client profile or above.
  • Yuck, about 12 years ago
    @Phil Both .NET 4 and .NET 4 Client Profile result in the same exception with the same message. I can't reproduce this as a solution.
  • user17753, about 12 years ago
    This, in conjunction with Yuck's suggestion to make sure I selected UTF-8 instead of ANSI when saving the file, worked out. Thanks guys, you saved me a lot of headaches, I'm sure!
  • Yuck, about 12 years ago
    @Phil Same - Win 7 SP1, VS 2010, .NET 4
  • Phil, about 12 years ago
    Ok, well that's odd, it works for me with Encoding.Unicode or Encoding.UTF8.