TXT files: how to switch from weird characters back to normal?


What you're seeing is called mojibake. In short, the application you are opening the file with is decoding it with the wrong encoding. The standard fix is to use a transcoding tool, either online or offline (though I know of no free ones for Windows that work offline), or to open the document in an application that lets you choose the encoding it is read with and then save it in the desired encoding.
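As a rough illustration of what such a transcoding step does, here is a minimal Python sketch. It assumes the source encoding is KOI8-R (Python's `koi8_r` codec) and that UTF-8 is the target; the function names and file names are made up for this example.

```python
from pathlib import Path


def transcode_bytes(raw: bytes,
                    source_encoding: str = "koi8_r",
                    target_encoding: str = "utf-8") -> bytes:
    """Decode raw bytes with the encoding they were really written in,
    then re-encode them in the target encoding."""
    text = raw.decode(source_encoding)
    return text.encode(target_encoding)


def transcode_file(src: str, dst: str,
                   source_encoding: str = "koi8_r",
                   target_encoding: str = "utf-8") -> None:
    """Read src as raw bytes and write the transcoded result to dst.

    The original file is never modified; dst must be a different path.
    """
    if Path(src).resolve() == Path(dst).resolve():
        raise ValueError("refusing to overwrite the original file")
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(transcode_bytes(raw, source_encoding, target_encoding))
```

Usage would look like `transcode_file("notes.txt", "notes-utf8.txt")`, where both file names are hypothetical. The important detail is that the file is read as bytes and decoded explicitly, rather than letting the program guess the encoding.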

As a somewhat hacky alternative, if you can save the file without modifying its encoding, you can change the extension to .eml, format the contents like an email message, make sure the Content-Type header specifies the correct encoding, open the resulting file in a good email client (pretty much anything except Outlook or Windows Mail), copy the text out into a text editor, and save it.
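For anyone who wants to try the .eml trick, the following sketch shows roughly what "format it like an email message" means. The exact header set is an assumption on my part (a minimal MIME header block with an 8bit transfer encoding), and `wrap_as_eml` and `read_back` are hypothetical helper names. `read_back` uses Python's standard `email` module only to check that a MIME-aware reader recovers the text.

```python
import email


def wrap_as_eml(raw: bytes, charset: str = "koi8-r") -> bytes:
    """Prepend minimal MIME headers to the untouched file bytes.

    The body is left exactly as it was on disk; only the Content-Type
    header tells the mail client which encoding to use.
    """
    headers = (
        "MIME-Version: 1.0\r\n"
        f"Content-Type: text/plain; charset={charset}\r\n"
        "Content-Transfer-Encoding: 8bit\r\n"
        "\r\n"  # blank line separates headers from body
    )
    return headers.encode("ascii") + raw


def read_back(eml: bytes) -> str:
    """Parse the message and decode the body using the declared charset."""
    msg = email.message_from_bytes(eml)
    charset = msg.get_content_charset()
    return msg.get_payload(decode=True).decode(charset)
```

Writing the result of `wrap_as_eml(...)` to a file with an .eml extension should give something a mail client can open, though how well a given client honours the header is not something this sketch can guarantee.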

For future reference, the generally accepted way to avoid this is to save files as either UTF-8 or UTF-16. UTF-8 is usually preferred, as most platforms other than Windows support it better than UTF-16.

In particular, your file does indeed appear to be encoded using KOI-8 (judging from the statement that the text is Cyrillic and the apparent distribution of the characters actually displayed), with the application apparently interpreting it as ISO-8859-1 or Windows code page 1252 (judging simply from what is being displayed, plus the fact that these are standard fallback encodings for many devices).
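To make that diagnosis concrete, here is a small sketch that reproduces the effect: Cyrillic text encoded as KOI8-R and then misread as code page 1252. It assumes KOI8-R specifically (the KOI-8 variant for Russian) and uses Python's `koi8_r` and `cp1252` codecs; `garble` and `ungarble` are names invented for this example.

```python
def garble(text: str) -> str:
    """Simulate the bug: KOI8-R bytes misread as Windows-1252."""
    return text.encode("koi8_r").decode("cp1252")


def ungarble(garbled: str) -> str:
    """Undo the misreading: recover the original bytes, decode as KOI8-R.

    This only works while the garbled characters are still intact, i.e.
    the text has not been re-saved in a way that replaced or dropped
    characters.
    """
    return garbled.encode("cp1252").decode("koi8_r")


if __name__ == "__main__":
    messy = garble("Привет")
    print(messy)            # accented Latin letters instead of Cyrillic
    print(ungarble(messy))  # the original word again
```

Accented Latin capitals and symbols where Cyrillic should be is the typical signature of this particular mismatch.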

Author: Alex

Updated on September 18, 2022

Comments

  • Alex
    Alex over 1 year

    So, I have on a flash drive a txt file generated in Cyrillic (my own work, own pen drive), a few years old. Now I needed to open it, only to see this kind of mess.

    I wonder why this is happening and how I can restore it back to normal. I tried saving it under Unicode and UTF-8 encoding, even some MS-DOS format (an option from Wordpad), but it makes no difference at all.

    • Jeff Zeitlin
      Jeff Zeitlin over 5 years
      Your document appears to have been saved in KOI-8 encoding; you will need to find a way to translate it from that encoding to Unicode.
    • Jeff Zeitlin
      Jeff Zeitlin over 5 years
      You may find 2cyr.com/decode to be useful.
    • Johan Myréen
      Johan Myréen over 5 years
      I hope you have a copy of the original file, because saving the "messy" file using a different encoding just makes the problem worse.
    • Akina
      Akina over 5 years
      Open it in MS Word Rus with auto-detect encoding or manual encoding selection (KOI-8r, CP866, etc. - try until correct pre-view) and then re-save in encoding you need (do not save over!). Or use any online recoder and determine text's original encoding.
    • Alex
      Alex over 5 years
      @Johan Myréen Of course, I figured this out and saved them as new files
    • Alex
      Alex over 5 years
      @Akina "Open it in MS Word Rus with manual encoding selection" Thanks, this worked.
    • Alex
      Alex over 5 years
      @Akina "re-save in encoding you need (do not save over!)" Though I need to mention, for the benefit of others with the same (quite common) issue who may stumble upon this thread, that this does not work if you re-save as another txt file, even with the proper encoding: the mess of characters was there again. Only saving as a Word or rtf file helps.
    • Alex
      Alex over 5 years
      @Jeff Zeitlin Thank you too, good that such thing exists, even though I did not need to use it this time, it may be helpful in future :)
  • user1686
    user1686 over 5 years
    Many text editors (e.g. Notepad2) act as transcoding tools: you can force them to open a file using whatever encoding you specify, and then save as UTF-x.
  • Austin Hemmelgarn
    Austin Hemmelgarn over 5 years
    @grawity However, Notepad and Wordpad don't support this, so unless you've got a third-party text editor on Windows, you're out of luck.
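The "try encodings until the preview looks correct" approach suggested in the comments can also be done without Word. The sketch below decodes the same raw bytes with several Cyrillic encodings so the results can be compared by eye. The candidate list is my own assumption (KOI8-R and CP866 come from the comment; Windows-1251 and ISO-8859-5 are added as other common Cyrillic encodings), and `preview_candidates` is a hypothetical helper name.

```python
def preview_candidates(raw: bytes,
                       candidates=("koi8_r", "cp866", "cp1251", "iso8859_5"),
                       length: int = 80) -> dict:
    """Return {encoding: first `length` decoded characters} for each
    candidate encoding that can decode the bytes without error."""
    previews = {}
    for encoding in candidates:
        try:
            previews[encoding] = raw.decode(encoding)[:length]
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes at all; skip it.
            continue
    return previews


if __name__ == "__main__":
    sample = "Привет".encode("koi8_r")  # stand-in for the file's raw bytes
    for encoding, preview in preview_candidates(sample).items():
        print(f"{encoding:10} {preview}")
```

Only a person can judge which preview is readable, so this prints every candidate rather than trying to pick a winner. As with the other suggestions here, it should be run against the original bytes, not a copy that has already been re-saved.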