Extract files with umlauts from a 7zip file created under Windows to Linux
Solution 1
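For "äpfel" to turn into mojibake such as "Ã¤pfel", the UTF-8 bytes of the name would have to be misread as ISO-8859-1(5) and then converted to UTF-8 a second time. The result is a double-encoded name: still valid UTF-8, but with two characters where the "ä" used to be.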
So how can this happen? (There appears to be no ISO-8859-1[5] (Latin1) in your workflow).
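The double conversion described above can be reproduced in a few lines of Python. This is only an illustrative sketch: the helper names garble and ungarble are made up for this example, and ISO-8859-1 is assumed as the wrongly applied charset (with ISO-8859-15 the second garbled character would be "€" instead of "¤").

```python
def garble(name, wrong="iso-8859-1"):
    """Misread the UTF-8 bytes of `name` as `wrong`, as a bad conversion step would."""
    return name.encode("utf-8").decode(wrong)


def ungarble(name, wrong="iso-8859-1"):
    """Reverse garble(): recover the original bytes and decode them as UTF-8."""
    return name.encode(wrong).decode("utf-8")


original = "\u00e4pfel"            # "äpfel"
broken = garble(original)          # "Ã¤pfel": two characters replace the "ä"
on_disk = broken.encode("utf-8")   # bytes a UTF-8 file system would then store
restored = ungarble(broken)        # back to "äpfel"
```

Because the garbled name is still valid UTF-8, nothing on the Linux side complains about it; the damage only shows up when a human reads the name.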
I believe I could reproduce this on a VFAT or NTFS partition using the mount option iocharset=value. If I set it to ISO-8859-15 and had a UTF-8 locale, then maybe the system could be tricked into converting filenames "in the wrong direction".
But here, your Wheezy installation is most likely ext3, and I'm not aware of an NLS option for ext3.
Another possibility is that the files are actually correctly created, and you're just seeing them wrong:
- is Putty set to use UTF8?
- are your FTP server (and client) set to UTF8?
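One way to tell "stored wrong" from "displayed wrong" is to look at the raw bytes of the file names, bypassing Putty and the FTP client entirely. The following Python sketch does that; the function names and the verdict labels are invented for this example, and the double-encoding check is a heuristic that assumes ISO-8859-1 as the intermediate charset.

```python
import os


def classify(raw):
    """Give a rough verdict on the encoding of one file name given as bytes."""
    if raw.isascii():
        return "ascii"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"          # e.g. a legacy 8-bit encoding
    try:
        # If the characters map back to bytes that are UTF-8 again,
        # the name was probably encoded twice (heuristic only).
        text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf-8"
    return "double-encoded?"


def describe_names(directory="."):
    """Return (raw_name, verdict) pairs for every entry in `directory`."""
    # Passing bytes makes os.listdir return names exactly as stored on disk.
    return [(raw, classify(raw)) for raw in os.listdir(os.fsencode(directory))]
```

If a correctly extracted Äpfel.txt shows up as b'\xc3\x84pfel.txt' with the verdict "utf-8", the name on disk is fine and the terminal or client is what needs reconfiguring.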
I notice another strange thing: your two apple files, the one at 16:10 and the one at 16:34, appear to be displayed by ls using two different date formats. In one case, the year is specified.
It might be that 7z is creating a slightly unusual inode entry?
However, the convmv utility might be of help here.
Solution 2
This issue with zips has been fixed in the most recent version of the far2l file and archive manager. For far2l's legacy zip charset detection to work properly, your system language setting should match the one on the system where the archive was created (Windows' built-in "zip folders" tool uses the same logic). So if your installation is German and the zip file was created on a PC that also uses the German language setting, everything should work out of the box.
Solution 3
The -scs option is only for @listfiles, which seems to be a file containing a list of file names. It won't affect the charset of your file names.
One possible solution would be to run iconv with appropriate options against all your files after you have extracted them.
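Note that iconv converts text streams rather than renaming files, so repairing names after extraction needs a renaming step around it (this is what convmv is for). A minimal Python sketch of that renaming idea follows. It is not how convmv or iconv work internally; fix_name and repair_tree are hypothetical helpers, ISO-8859-1 is assumed as the wrongly applied charset, and the default is a dry run that changes nothing.

```python
import os


def fix_name(name, wrong="iso-8859-1"):
    """Undo one round of UTF-8 -> `wrong` -> UTF-8 double encoding, if present."""
    try:
        return name.encode(wrong).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # not double-encoded in this way; leave it untouched


def repair_tree(root, wrong="iso-8859-1", dry_run=True):
    """Walk `root` bottom-up and return the (old, new) paths that need renaming.

    With dry_run=False the renames are actually carried out.
    """
    changes = []
    # Bottom-up, so children are handled before their parent directory moves.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = fix_name(name, wrong)
            if fixed == name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, fixed)
            if os.path.exists(new):
                continue  # never overwrite an existing entry
            changes.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return changes
```

Running repair_tree on a small sample directory first and reading the returned list before passing dry_run=False is a sensible precaution with a backup of this size.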
Comments
-
The Wavelength over 1 year
I want to extract a large backup of my hard drive compressed with 7zip under Windows to my Debian Wheezy installation. I'm using the following command line:
7z x -pmypasswordhere file.7z
If the archive contains a file or a folder called Äpfel (German for apples), the name comes out garbled on the Linux hard drive: the umlaut is replaced by mojibake.
How can I solve this issue? I tried using the following, but this says that the command line is invalid:
7z x -scsWIN -pmypasswordhere file.7z
...where the -scs switch is explained as: "-scs{UTF-8 | WIN | DOS}: set charset for list files".
I compressed the file on Windows 8 on an NTFS partition with 7z 9.30 64-bit, with the compression level set to Ultra. I encrypted the file names and their contents with AES-256. My Debian Wheezy installation is German, so echo $LANG prints "de_DE.UTF-8".
-
mpy almost 11 years
It seems that there is something special about your setup. I just tried to zip a file Äpfel.txt with the current 7-zip (9.20) under Windows; get it from http://download.mpy.de/apples.7z. I have no problem with either an ancient Linux version of 7z (4.57, dated 2007) or version 9.04 (from Debian squeeze?). In both cases Äpfel.txt gets extracted correctly. Is the locale string (locale=de_DE.UTF-8) printed by 7z when started without arguments correct in your case? What does echo $LANG say?
-
The Wavelength almost 11 years
Please look at the most recent edit I've made to the original post. When I use your example, I get the same result. There's something interesting: img.xn--mg-eka.de/fe997.png. On the left side is Putty, on the right side is my FTP client. The first "Äpfel.txt" in Putty is the file from your 7z file. The "?pfel.txt" is the file created with the FTP client. The interesting part: if I do the same in another directory, it works as expected... I think it's a more general problem I have to look into, nothing that is related to 7z. Thanks anyway!
-
mpy almost 11 years
I rechecked with AES-256 encryption; this does not break anything. (However, I forgot to mention that I can only test with Win XP right now.) I cannot follow your story with FTP completely, but in my experience (graphical) FTP or SSH clients are always a pain in the neck when it comes to uncommon characters. Can you use scp instead, or mount a Windows share?
-
The Wavelength almost 11 years
I've tried it. Everything that looks okay in my FTP client looks okay in WinSCP and vice versa. Everything that looks okay in Putty looks wrong in both WinSCP and my FTP client.
-
mpy almost 11 years
I'm sure the graphical client is the problem. E.g. for WinSCP the FAQ (winscp.net/eng/docs/ui_login_environment#utf) states: "UTF-8 is not supported with SCP protocol."
-
The Wavelength almost 11 years
You're right. I think it's okay then :)
-
-
The Wavelength almost 11 years
The backup is more than 100 GB and contains more than 100,000 files...
-
The Wavelength almost 11 years
What would be the appropriate options?
-
scai almost 11 years
The encoding of your Windows file system and the encoding of your Linux file system. The former seems to be UTF-16 for NTFS and the latter is probably UTF-8. Try it on your Äpfel file.
-
The Wavelength almost 11 years
Thanks a lot! Well, the question is now which is the preferred encoding: the one displayed correctly in Putty, or the one which is displayed correctly in FTP and can be accessed via a web server? I don't really know.
-
LSerni almost 11 years
If you can check the configuration of the various "channels", you should be able to make it work everywhere. There almost seems to be an ISO-8859-1 hidden somewhere. I'd try to run convmv on a small sample of files.