Automatic encoding detection in C#

A StreamReader will try to detect a file's encoding automatically from its byte order mark (BOM), if the file has one, the first time it reads from the file:

using System;
using System.IO;

public class Program
{
    static void Main(string[] args)
    {
        using (var reader = new StreamReader("foo.txt"))
        {
            // Make sure you read from the file first, or the reader
            // won't have had a chance to detect the encoding
            var file = reader.ReadToEnd();
            Console.WriteLine(reader.CurrentEncoding);
        }
    }
}
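
To try this without an existing foo.txt, here is a minimal sketch that writes small temporary files with different BOMs and reports what StreamReader detects after reading. The names BomDetectionDemo and DetectAfterRead are made up for illustration, and the sketch assumes File.WriteAllText emits the BOM for the encoding it is given, which holds for the three encodings used here.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDetectionDemo
{
    // Hypothetical helper: writes `text` to a temp file with `encoding`
    // (including its BOM), reads it back with a default StreamReader, and
    // returns the encoding the reader reports afterwards.
    public static Encoding DetectAfterRead(Encoding encoding, string text)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, encoding);

            using (var reader = new StreamReader(path))
            {
                // CurrentEncoding is only updated once something has been read.
                reader.ReadToEnd();
                return reader.CurrentEncoding;
            }
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        Encoding[] encodings =
        {
            new UTF8Encoding(encoderShouldEmitUTF8Identifier: true),
            new UnicodeEncoding(bigEndian: false, byteOrderMark: true),
            new UnicodeEncoding(bigEndian: true, byteOrderMark: true),
        };

        foreach (var written in encodings)
        {
            Encoding detected = DetectAfterRead(written, "hello");
            Console.WriteLine("written as code page " + written.CodePage +
                              ", detected as code page " + detected.CodePage);
        }
    }
}
```

Each line should report the same code page on both sides; a file without a BOM would simply report the reader's default.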
Author: AndreyAkinshin

Updated on June 14, 2022

Comments

  • AndreyAkinshin
    AndreyAkinshin almost 2 years

    Possible Duplicate:
    Determine a string's encoding in C#

    Many text editors (like Notepad++) can detect the encoding of an arbitrary file. Can I detect the encoding of a file in C#?

  • Jon Hanna
    Jon Hanna over 13 years
    +1, though it's worth adding that this is not foolproof; many encodings "look" the same to the simple detection method used. Even the best detectors (used by the likes of Google, which can afford to do a lot of crunching and has lots of data to compare streams with), which consider the different possible meanings of "high" octets, aren't 100% accurate. If at all possible, it's best to convey the encoding explicitly.
  • Tyler Liu
    Tyler Liu over 12 years
    It works for common encodings, but not for all encodings.
  • Dan W
    Dan W over 11 years
    This won't work for detecting UTF-16 without the BOM. Nor will it fall back to the user's local default code page if it fails to detect any Unicode encoding. You can fix the latter, but then it won't detect UTF-8 without the BOM.
  • Mark
    Mark over 11 years
    StreamReader does NOT attempt to detect the encoding; it simply uses the default. See the very documentation you linked, where it says: "The default character encoding and default buffer size are used."
  • Giles
    Giles about 9 years
    The MSDN documentation does say that the default character encoding will be used, but I've tried passing different BOMs to a StreamReader, and it correctly identified them (i.e. reader.CurrentEncoding returned the expected encoding). I tested with UTF-8, UTF-16 BE and UTF-16 LE. Note @Darin's comment though: it won't work if you don't read some data.
  • tomexou
    tomexou over 8 years
    Calling reader.Peek() is enough to trigger the detection.
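
The points raised in the comments (Peek() is enough to trigger BOM detection, and files without a BOM need a fallback) can be sketched roughly as follows. This is one possible approach, not something StreamReader does for you: EncodingGuesser, DetectBomOnly and Guess are hypothetical names, and the choice of ISO-8859-1 as the fallback in Main is an assumption standing in for whatever legacy code page fits your data. Guess passes a strict UTF-8 decoder as the reader's default, so a BOM-less file that is not valid UTF-8 throws and the fallback is returned instead.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingGuesser
{
    // Hypothetical helper: BOM detection only. Peek() forces the first
    // buffer read, which is when StreamReader inspects the BOM. Without a
    // BOM the reader keeps the encoding it was given (`fallback`).
    public static Encoding DetectBomOnly(byte[] bytes, Encoding fallback)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), fallback,
                                             detectEncodingFromByteOrderMarks: true))
        {
            reader.Peek();
            return reader.CurrentEncoding;
        }
    }

    // Hypothetical helper: BOM first, then "is it valid UTF-8?", then `fallback`.
    public static Encoding Guess(byte[] bytes, Encoding fallback)
    {
        // A UTF-8 decoder that throws on invalid byte sequences instead of
        // silently substituting replacement characters.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                          throwOnInvalidBytes: true);
        try
        {
            using (var reader = new StreamReader(new MemoryStream(bytes), strictUtf8,
                                                 detectEncodingFromByteOrderMarks: true))
            {
                // Read everything so that every byte gets validated.
                reader.ReadToEnd();
                return reader.CurrentEncoding;
            }
        }
        catch (DecoderFallbackException)
        {
            // No BOM and not valid UTF-8: assume the caller's fallback.
            return fallback;
        }
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1"); // assumed fallback

        byte[] latin1Bytes = { 0x63, 0x61, 0x66, 0xE9, 0x21 }; // "café!" in ISO-8859-1
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("caf\u00e9!"); // same text, UTF-8, no BOM

        Console.WriteLine(Guess(latin1Bytes, latin1).WebName);
        Console.WriteLine(Guess(utf8Bytes, latin1).WebName);
    }
}
```

This sketch still does not recognise UTF-16 without a BOM: such data can pass as valid UTF-8 (for example, ASCII text encoded as UTF-16 is just ASCII bytes interleaved with NUL bytes), so it would be reported as UTF-8. Pure ASCII input is also reported as UTF-8, which is harmless since ASCII is a subset of it.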