Proper encoding for file names in zip archives created on Windows and unpacked on Linux

If the Windows 7 installation used for zipping the files is set to Brazilian Portuguese, then the encoding is probably IBM-850 or Windows-1252. Try these.

I have this issue too. It also happens when transferring archives between Windows installations in different languages. For example, the English version uses IBM-437 and the pt-BR version uses IBM-850.
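The mismatch between the two code pages can be reproduced in a few lines. This is a minimal sketch, assuming the name was written as IBM-850 bytes and then read back as IBM-437; `cp850` and `cp437` are Python's names for these code pages, and the file name is the one from the question.

```python
# Encode the name as a pt-BR Windows (OEM code page 850) would store it,
# then decode the same bytes with the code page an English tool assumes (437).
name = "aplicações.txt"
raw = name.encode("cp850")
misread = raw.decode("cp437")
print(misread)  # prints: aplicaçΣes.txt

# Decoding with the code page that was actually used restores the name.
restored = raw.decode("cp850")
print(restored)  # prints: aplicações.txt
```

The byte for ç is the same in both code pages, which would explain why ç survives while õ turns into Σ.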

If you use WinZip for zipping, this issue does not happen. I do not recommend using the built-in Windows zip support for zipping or extracting, as it also causes this encoding issue in file names.
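If the archive cannot be recreated, the names can also be repaired while extracting. The following is a rough sketch, not a definitive tool: it relies on Python's `zipfile` decoding names without the UTF-8 flag as cp437, and it assumes the archive was really written with IBM-850. The helper names `fix_name` and `extract_with_encoding` and the default `source_encoding` are made up for this example.

```python
import zipfile
from pathlib import Path

UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8


def fix_name(info, source_encoding="cp850"):
    """Re-decode an entry name with source_encoding unless it is flagged UTF-8."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename
    # zipfile decoded the raw bytes as cp437; undo that, then decode again.
    return info.filename.encode("cp437").decode(source_encoding)


def extract_with_encoding(zip_path, dest, source_encoding="cp850"):
    """Extract zip_path into dest, writing entries under their repaired names."""
    dest = Path(dest)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = fix_name(info, source_encoding)
            # Refuse names that would escape the destination directory.
            if Path(name).is_absolute() or ".." in Path(name).parts:
                raise ValueError(f"unsafe entry name: {name!r}")
            target = dest / name
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
```

For example, `extract_with_encoding("uploaded.zip", "zipout")` would try IBM-850; pass `source_encoding="cp1252"` to try Windows-1252 instead.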

Author: Ole

Updated on September 18, 2022

Comments

  • Ole
    Ole over 1 year

    I have problems with different charsets on Windows and Linux (CentOS).

    I have files with special characters in their filenames from many different languages. The zip archive is generated under Win7 and uploaded to a Linux server. Under Windows all characters are displayed normally, as expected. But after uploading and extracting with either PHP's ZipArchive() or Linux unzip, some special characters are displayed as strange, wrong characters.

    I know that this is a known problem in the interplay between Windows and Linux, but I'm not able to solve it. I've tried to unzip my zip file with different charsets, but nothing worked for me. In Portuguese the character õ causes a lot of problems, but ç is okay.

    After unzipping, aplicações.txt becomes aplicaçΣes.txt

    As far as I understand, Windows uses the charset IBM860, but sometimes I read windows-1257, and I do not know which charset is used when the zip archive is made with WinRAR under Win7. Is there a way to check this, or to tell WinRAR to use UTF-8?

    When the zip archive is uploaded to a Linux OS and unzipped by ZipArchive() (PHP) or in the Linux bash with unzip, the filenames are wrong. I think it is because Linux uses UTF-8.

    On the Linux command line I tried:

    unzip -O windows-1257 uploaded.zip -d zipout/
    unzip -O IBM860 uploaded.zip -d zipout/
    unzip -O IBM437 uploaded.zip -d zipout/
    unzip -O UTF-8 uploaded.zip -d zipout/
    unzip -O UTF-16 uploaded.zip -d zipout/
    
    • Sandeep
      Sandeep about 6 years
      Have you tried it with any cross-platform software available for both Windows and Linux, e.g. 7-Zip? Just curious whether it makes any difference.
    • Andrew Morton
      Andrew Morton about 6 years
      Apparently WinRAR version 3.92 and earlier don't work properly with UTF-8. Which version of WinRAR are you using?
    • Ole
      Ole about 6 years
      Indeed, I tried 7z, also with its own file format .7z, but with the same result. I was confused, because I thought that 7-Zip uses UTF-8 for encoding by default, even under Windows. Is there a way to explicitly tell 7-Zip to use UTF-8?
    • SpiderPig
      SpiderPig about 6 years
      WinRAR probably used cp437. However, õ doesn't exist in that character set. You could change all filenames before compressing the files. If you add e.g. a Chinese character to the end of each filename, you will force WinRAR to use UTF-8.
    • Ole
      Ole about 6 years
      Okay, this would make sense. Thanks for the hint about the Chinese character; I will try this too, to check whether it works properly with UTF-8 encoding.
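
On the question of how to check what an archive uses: each zip entry carries a general purpose flag, and bit 11 (0x800) marks the name as UTF-8. A minimal sketch of reading that flag with Python's `zipfile` follows; the function name `report_name_encoding` is made up for this example. If the flag is not set, the name is in some legacy code page, and the archive itself does not say which one.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8


def report_name_encoding(zip_path):
    """Return a list of (filename, is_utf8_flagged) pairs, one per entry."""
    with zipfile.ZipFile(zip_path) as zf:
        return [
            (info.filename, bool(info.flag_bits & UTF8_FLAG))
            for info in zf.infolist()
        ]


if __name__ == "__main__":
    import sys

    # Usage: python check_names.py uploaded.zip
    for filename, is_utf8 in report_name_encoding(sys.argv[1]):
        print("UTF-8 " if is_utf8 else "legacy", filename)
```

Entries reported as legacy are shown as `zipfile` decodes them (cp437), so they may look garbled in the output even though the bytes are intact.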