How do I remove the character "ï»¿" from the beginning of a text file in C++?

Solution 1

That's the UTF-8 byte order mark (BOM).

You need to read the file as UTF-8. If you don't need Unicode and only use the 128 ASCII code points (0 to 127), save the file as ASCII or as UTF-8 without a BOM.

Solution 2

This is the byte order mark (BOM). The characters "ï»¿" are how the three bytes of the UTF-8 BOM (EF BB BF) appear when they are displayed as ISO-8859-1. You have to tell your editor not to write a BOM, or use a different editor to strip it out.

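In C++, you can use the following function to convert a UTF-8 BOM file to ANSI. It relies on a helper, change_encoding_from_UTF8_to_ANSI, which is not shown here.

#include <cstring>
#include <fstream>
#include <string>
using namespace std;

// Not shown in this answer; the signature is assumed from the call below.
char* change_encoding_from_UTF8_to_ANSI(char* utf8Text);

void change_encoding_from_UTF8BOM_to_ANSI(const char* filename)
{
    ifstream infile(filename);
    if (!infile)
    {
        return; // nothing to convert; do not create or truncate the file
    }

    string strLine;
    string strResult;

    // The first 3 bytes (EF BB BF) are the UTF-8 BOM; the rest is assumed
    // to be single-byte ASCII. Drop the BOM from the output if present.
    if (getline(infile, strLine))
    {
        if (strLine.compare(0, 3, "\xEF\xBB\xBF") == 0)
        {
            strLine.erase(0, 3);
        }
        strResult += strLine + "\n";
    }

    // Loop on getline itself; testing eof() first appends a spurious line.
    while (getline(infile, strLine))
    {
        strResult += strLine + "\n";
    }
    infile.close();

    // +1 leaves room for the terminating NUL that strcpy writes.
    char* changeTemp = new char[strResult.length() + 1];
    strcpy(changeTemp, strResult.c_str());
    char* changeResult = change_encoding_from_UTF8_to_ANSI(changeTemp);
    strResult = changeResult; // copy before releasing the input buffer
    delete[] changeTemp;      // ownership of changeResult depends on the helper

    ofstream outfile(filename);
    outfile.write(strResult.c_str(), strResult.length());
    outfile.flush();
    outfile.close();
}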
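
If the file should stay UTF-8 and only the BOM needs to go, the encoding conversion can be skipped entirely. The sketch below is one possible approach under stated assumptions: `strip_utf8_bom_in_place` is a hypothetical name, the whole file is read into memory, and the file is rewritten only when a BOM is actually found.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: rewrites `filename` without its leading UTF-8 BOM.
// Returns true if a BOM was found and removed; false if there was no BOM
// or the file could not be opened or written.
bool strip_utf8_bom_in_place(const std::string& filename)
{
    // Binary mode so no newline translation changes the content.
    std::ifstream in(filename.c_str(), std::ios::binary);
    if (!in)
        return false;
    std::string content((std::istreambuf_iterator<char>(in)),
                        std::istreambuf_iterator<char>());
    in.close();

    if (content.compare(0, 3, "\xEF\xBB\xBF") != 0)
        return false;  // no BOM: leave the file untouched

    std::ofstream out(filename.c_str(), std::ios::binary | std::ios::trunc);
    if (!out)
        return false;
    out.write(content.data() + 3,
              static_cast<std::streamsize>(content.size() - 3));
    return static_cast<bool>(out);
}
```

Calling it a second time on the same file returns false and changes nothing, so it is safe to run on files that may or may not carry a BOM.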
Author by

Hoang Minh

Updated on June 26, 2022

Comments

  • Hoang Minh about 2 years

    I'm trying to read a text file and put each word into a node of a binary search tree. However, the first word is always read as "ï»¿" + the first word. For example, if my first word is "This", then the word inserted into the first node is "ï»¿This". I've searched the forum for a fix; one post asked about the same problem in Java, but no one has addressed it in C++. Could anyone help me fix it? Thank you.

    I came to a simple solution: I opened the file in Notepad and saved it as ANSI. After that, the file is read and passed into the binary search tree correctly.

  • Jonathan Leffler over 10 years
    Strictly, it is the UTF-8 encoding of U+FEFF, the BOM (also the zero-width no-break space, ZWNBSP), presented using the code set ISO 8859-1. UTF-8 does not need a BOM, of course.
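
    The relationship described in this comment can be checked with a few lines of C++. This is an illustrative sketch only: `encode_utf8` and `latin1_view` are hypothetical helpers, and the encoder does not reject surrogates or values above U+10FFFF.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
// Illustrative only; it does not validate its input.
std::string encode_utf8(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: treats every byte as an ISO 8859-1 character and
// re-encodes it as UTF-8, which is what a Latin-1 viewer would display.
std::string latin1_view(const std::string& bytes)
{
    std::string out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i)
        out += encode_utf8(static_cast<unsigned char>(bytes[i]));
    return out;
}
```

    Here `encode_utf8(0xFEFF)` yields the bytes EF BB BF, and `latin1_view` of those bytes yields the UTF-8 text for "ï»¿", the three characters from the question.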