Proper encoding for file names in zip archives created in Windows and unpacked in Linux
If the Windows 7 installation used for zipping the files is the Brazilian Portuguese version, then the encoding is probably IBM-850 or Windows-1252. Try these.
I have this issue too, and it also happens when transferring archives between Windows versions in different languages. For example, the English version uses IBM-437 and the pt-BR version uses IBM-850.
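The õ/ç behaviour described in the question fits this explanation. The following is a minimal Python sketch, assuming the archive stored the name in IBM-850 and the extracting side read it as IBM-437; `cp850` and `cp437` are Python's codec names for those code pages, and the file name is the one from the question.

```python
# File name from the question.
name = "aplicações.txt"

# Assumption: the pt-BR Windows stored the name using IBM-850.
stored = name.encode("cp850")

# Assumption: the extracting tool interpreted those bytes as IBM-437.
shown = stored.decode("cp437")

# Reversing the wrong decoding recovers the original name.
repaired = shown.encode("cp437").decode("cp850")

print(stored)    # b'aplica\x87\xe4es.txt'
print(shown)     # aplicaçΣes.txt
print(repaired)  # aplicações.txt
```

The byte for ç (0x87) means the same character in both code pages, so it survives, while the byte for õ (0xE4) is Σ in IBM-437.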
If you use WinZip for zipping, this issue does not happen. I do not recommend using the built-in Windows zip tool for zipping or extracting, as it also causes this encoding issue with file names.
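If re-creating the archive with another tool is not an option, the names can also be repaired while extracting on the Linux side. The sketch below uses Python's standard `zipfile` module, which decodes entry names as cp437 when the UTF-8 flag (general-purpose bit 11) is not set. The helper names `recover_name` and `extract_with_encoding` are hypothetical, and the default of `cp850` is only an assumption based on the pt-BR case above.

```python
import zipfile
from pathlib import Path

UTF8_FLAG = 0x800  # general-purpose bit 11: entry name is stored as UTF-8


def recover_name(info, encoding="cp850"):
    """Return the entry name re-decoded with the assumed legacy encoding."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # already UTF-8, nothing to repair
    # zipfile decoded the raw bytes as cp437; undo that and decode again.
    return info.filename.encode("cp437").decode(encoding)


def extract_with_encoding(archive, dest, encoding="cp850"):
    """Extract all entries to dest, writing files under the repaired names."""
    dest = Path(dest).resolve()
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            target = (dest / recover_name(info, encoding)).resolve()
            # Refuse entries that would end up outside the destination.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {target}")
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
            extracted.append(target)
    return extracted
```

With the file names from the question, a call would look like `extract_with_encoding("uploaded.zip", "zipout/", encoding="cp850")`. If the names still look wrong, another candidate such as `cp1252` can be passed instead.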
Ole
Updated on September 18, 2022
Comments
-
Ole over 1 year
I have problems with different charsets in Windows and Linux (CentOS).
I have files with special characters from many different languages in their filenames. The zip archive is generated under Win7 and uploaded to a Linux server. Under Windows all characters are displayed normally, as expected. But after uploading and extracting with either PHP's ZipArchive() or the Linux unzip command, some special characters are displayed as wrong characters.
I know that this is a known problem in the interplay between Windows and Linux, but I'm not able to solve it. I've tried to unzip my zip file with different charsets, but nothing worked for me. In Portuguese the character õ causes a lot of problems, but ç is okay.
aplicações.txt becomes aplicaçΣes.txt after unzipping.
As far as I understand, Windows uses the charset IBM860, but sometimes I read windows-1257, and I do not know which charset is used when the zip archive is made with WinRAR under Win7. Is there a way to check this, or to tell WinRAR to use UTF-8?
When the zip archive is uploaded to a Linux system and unzipped by ZipArchive() (PHP) or with unzip in the Linux bash, the filenames are wrong. I think this is because Linux uses UTF-8.
On the Linux command line I tried:
unzip -O windows-1257 uploaded.zip -d zipout/
unzip -O IBM860 uploaded.zip -d zipout/
unzip -O IBM437 uploaded.zip -d zipout/
unzip -O UTF-8 uploaded.zip -d zipout/
unzip -O UTF-16 uploaded.zip -d zipout/
-
Sandeep about 6 years
Have you tried it with any cross-platform software available for both Windows and Linux, e.g. 7zip? Just curious if it makes any difference.
-
Andrew Morton about 6 years
Apparently WinRAR version 3.92 and earlier don't work properly with UTF-8. Which version of WinRAR are you using?
-
Ole about 6 years
Indeed, I tried 7z, also with its own file format .7z, but with the same result. I was confused, because I thought that 7zip uses UTF-8 for encoding by default, even under Windows. Is there a way to explicitly tell 7zip to use UTF-8?
-
SpiderPig about 6 years
WinRAR probably used cp437. However, õ doesn't exist in that character set. You could change all filenames before compressing the files. If you add e.g. a Chinese character to the end of each filename, you will force WinRAR to use UTF-8.
-
Ole about 6 years
Okay, this would make sense. Thanks for the hint about the Chinese character. I will also try this to check whether it works properly with UTF-8 encoding.
-