UTF-8 characters missing or displayed as boxes in Notepad, but shown correctly in the web browser and other text editors


If the file looks fine in other editors, then the text itself is fine. If it looks OK in the browser, then the HTTP response is probably fine too (but check the page info in the browser to confirm which encoding it used). The problem is most likely Notepad itself: sometimes it needs a BOM (byte order mark; for UTF-8 these are the bytes EF BB BF at the very start of the file) to detect Unicode properly. However, a BOM can break other applications that don't expect it. Notepad's behaviour also differs between Windows versions, so try it on a different version if you can. I have just opened a UTF-8 file in Notepad on Windows 7, and it looks fine to me.
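
If you decide Notepad needs the BOM, you can add one to an exported file after the fact. The following is a minimal Python sketch, not part of the original answer; the helper names and the file name are hypothetical. It assumes the file is already valid UTF-8 and refuses to touch it otherwise, so a legacy ANSI file is never mislabelled.

```python
import codecs
import tempfile
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """True if data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM to the file at `path` unless it already has one.

    Returns True if the file was rewritten. Raises UnicodeDecodeError if
    the existing content is not valid UTF-8 (assumption: we only want to
    mark files that really are UTF-8).
    """
    data = path.read_bytes()
    if has_utf8_bom(data):
        return False
    data.decode("utf-8")  # validate before modifying the file
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "export.txt"  # hypothetical file name
        target.write_text("naïve café", encoding="utf-8")  # written without a BOM
        print(add_utf8_bom(target))  # True: BOM added
        print(add_utf8_bom(target))  # False: BOM already present
        print(target.read_bytes()[:3].hex(" "))  # ef bb bf
```

When writing a new file from Python, the standard "utf-8-sig" codec does the same job in one step: path.write_text(text, encoding="utf-8-sig") prepends the BOM, and reading with "utf-8-sig" strips it again. Keep the trade-off from the answer in mind: other tools may then see three unexpected bytes at the start of the file.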

Author: JAVAGeek

Updated on June 13, 2022

Comments

  • JAVAGeek, almost 2 years ago

    I have UTF-8 text stored in a database and served as text/plain; charset=utf-8 by a web application. Everything works fine: I can see the UTF-8 text in the browser window without any problem.

    But when I save that text to a file and open it in Windows Notepad, some characters are missing and are displayed as small rectangular boxes. However, the file looks fine in other editors such as EditPlus and Notepad++.

    What causes this, and how can I fix it?

  • JAVAGeek, over 11 years ago
    I can see the encoding in Notepad++: it's ANSI, whereas I want it to be UTF-8.
  • Sergei Tachenov, over 11 years ago
    @JAVAGeek, if it's really ANSI, then Notepad shouldn't have any problem reading it. That means Notepad++ is wrong and the file is not ANSI. By "UTF-8", Notepad++ means "UTF-8 with BOM", which isn't strictly correct, as UTF-8 without a BOM is UTF-8 too. To be sure, look at the file in a hex viewer: if characters outside 7-bit ASCII are encoded as 2 or more bytes, then it really is UTF-8.
  • CallumDA, over 7 years ago
    Please consider adding more explanation to your answer, for example explaining where the OP went wrong or why your solution works.
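
Sergei Tachenov's hex-viewer check can also be scripted. This is a minimal Python sketch, not something from the thread; the function names are hypothetical, and the result is only a guess, because a legacy ANSI code page cannot be identified from the bytes alone. It relies on the fact that strict UTF-8 decoding rejects a lone byte above 0x7F, so any non-ASCII file that decodes cleanly must be using multi-byte sequences.

```python
from __future__ import annotations

import codecs


def classify_encoding(data: bytes) -> str:
    """Rough guess at how a text file's bytes are encoded."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if data.isascii():
        # Pure 7-bit ASCII is byte-identical in ASCII, UTF-8 and most ANSI code pages.
        return "ASCII"
    try:
        data.decode("utf-8")  # strict: fails on stray single high bytes
    except UnicodeDecodeError:
        return "not UTF-8 (probably an ANSI code page)"
    return "UTF-8 without BOM"


def non_ascii_bytes(data: bytes) -> list[tuple[str, str]]:
    """List each non-ASCII character with its UTF-8 bytes in hex,
    as a hex viewer would show them. Assumes data is valid UTF-8."""
    text = data.decode("utf-8-sig")  # drops a leading BOM if present
    return [(ch, ch.encode("utf-8").hex(" ")) for ch in text if not ch.isascii()]


if __name__ == "__main__":
    utf8 = "café".encode("utf-8")
    ansi = "café".encode("cp1252")
    print(classify_encoding(utf8))  # UTF-8 without BOM
    print(classify_encoding(ansi))  # not UTF-8 (probably an ANSI code page)
    print(non_ascii_bytes(utf8))    # [('é', 'c3 a9')]
```

Here "é" takes two bytes (C3 A9) in UTF-8 but a single byte (E9) in Windows-1252, which is exactly the difference the comment suggests looking for.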