How to correct unicode filenames?


Solution 1

I wrote a C/C++ hybrid program that does the translation part: it does not rename anything, it only converts bad byte sequences to good ones. The source is linked at the end of this post.

The input is decoded as a UTF-8 stream into a sequence of Unicode code points, which is then NOT converted to any other code page. Because the damage came from reading UTF-8 bytes as Latin-1, every code point is below 256 and equals one byte of the original UTF-8 string. So I just write these code points to the output as single bytes, and the result is a correct UTF-8 string. This is not yet a complete application for my problem, but it is the core of the solution.

The program is written and tested under Linux, but should work on any OS. Usage example:

nil@hippy:~/playground/c++$ g++ utf8decode.cpp -o utf8decode
nil@hippy:~/playground/c++$ cat > file
KispÃ¡l Ã©s a Borz - 02 - TÃ¶kÃ©letes Helyettes
nil@hippy:~/playground/c++$ cat file | ./utf8decode
Kispál és a Borz - 02 - Tökéletes Helyettes
Characters found: 48
nil@hippy:~/playground/c++$

I had written a UTF-8 character counter before and modified it, so I did not write the whole program in an hour. Source: http://pastebin.com/Hy7tVt5A http://pastebin.com/NFJUP0R5
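
The decoding step described in this answer can be sketched roughly as follows. This is not the code behind the pastebin links, only a minimal illustration under stated assumptions: the names decode_one and fix_double_utf8 are hypothetical, the decoder is simplified (it does not reject overlong encodings or surrogates), and the damage is assumed to be exactly one round of "UTF-8 bytes read as Latin-1, then re-encoded as UTF-8".

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode one UTF-8 sequence starting at s[i].
// On success stores the code point in cp, advances i past the sequence
// and returns true. Simplified: overlong forms are not rejected.
bool decode_one(const std::string& s, std::size_t& i, std::uint32_t& cp) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t extra;  // number of continuation bytes expected
    if (b0 < 0x80)                { cp = b0;        extra = 0; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
    else return false;                       // stray continuation or invalid lead byte
    if (i + extra >= s.size()) return false; // sequence is truncated
    for (std::size_t k = 1; k <= extra; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return false;
        cp = (cp << 6) | (b & 0x3F);
    }
    i += extra + 1;
    return true;
}

// Hypothetical helper: undo one layer of double encoding.
// Returns false (leaving `fixed` unspecified) when the input is not valid
// UTF-8, contains a code point above 255, or does not yield valid UTF-8,
// which suggests the name was not damaged in the assumed way.
bool fix_double_utf8(const std::string& damaged, std::string& fixed) {
    fixed.clear();
    std::uint32_t cp = 0;
    for (std::size_t i = 0; i < damaged.size();) {
        if (!decode_one(damaged, i, cp) || cp > 0xFF) return false;
        // Each code point below 256 is one byte of the original UTF-8 string.
        fixed.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
    }
    // Sanity check: the recovered bytes must themselves be valid UTF-8.
    for (std::size_t i = 0; i < fixed.size();) {
        if (!decode_one(fixed, i, cp)) return false;
    }
    return true;
}
```

The final validation pass is a design choice of this sketch: it makes the function refuse names that are already correct (such as one containing フ, or a plain "á"), so running it twice should not corrupt anything. Pure ASCII names pass through unchanged.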

Solution 2

My problem was that Windows 10 Explorer was not showing Unicode filenames correctly. The names were stored correctly in Unicode, but Explorer displayed garbage. The problem went away after I rebooted.

Solution 3

Let me elaborate on the answer given by dinar qurbanov. To fix file name encoding in Total Commander v7 or higher, use the Multi-Rename Tool (Ctrl+M).

In it you'll find a folder-like button; click it and select 'Edit names' to get a text file containing the file names. After fixing them with any tool or editor you like, paste them back and close the editor.

[Screenshot: the button to edit file names]


Author: Notinlist

Updated on September 18, 2022

Comments

  • Notinlist
    Notinlist over 1 year

    I have Windows 7 with NTFS filesystem. I have filenames and directory names like:

    KispÃ¡l Ã©s a Borz - 02 - TÃ¶kÃ©letes Helyettes
    

    I want to transform them to:

    Kispál és a Borz - 02 - Tökéletes Helyettes
    

    The filesystem is capable of storing filenames like フリー百科事典, so it surely has unicode support.

    As I imagine the story, a long time ago the names were perfect. Then they were transferred from a UTF-8 filesystem to a Latin-1 filesystem, and then back to this UTF-8-capable filesystem. In theory all the information is still there. I could write a program in C to fix these characters, but I assume someone somewhere has already done it.

    Do you know any utility that can do the transformation?

  • Ramhound
    Ramhound almost 7 years
    "Let me elaborate on the answer given by dinar qurbanov" - Your elaboration should have been submitted as a comment, so the author of the answer, could consider improving the answer.
  • pati
    pati almost 7 years
    Unfortunately, I do not have enough reputation to comment.
  • Ramhound
    Ramhound almost 7 years
    Commentary doesn't belong in an answer. Commentary shouldn't be submitted as an answer. Take those statements how you want to. Your inability to submit a comment has nothing to do with submitting a quality answer that doesn't include commentary. Here is the thing, if you submit commentary as an answer, you will never earn enough reputation, to actually submit commentary