Extracting files with umlauts from a 7-Zip archive created under Windows on Linux

Solution 1

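For "äpfel" to become a garbled name such as "Ã¤pfel", the UTF-8 bytes of äpfel would have to be misread as a single-byte charset (ISO-8859-1 or ISO-8859-15) and converted to UTF-8 a second time. The result is a double-encoded name: still valid UTF-8, but every umlaut shows up as two characters.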
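The double conversion can be reproduced in a few lines of Python. This is only a sketch of the suspected mechanism, not a statement about what 7z actually does. It assumes ISO-8859-1 as the wrongly applied charset. ISO-8859-15 behaves the same way except for a handful of positions.

```python
# Simulate a filename whose UTF-8 bytes are misread as ISO-8859-1
# and then stored as UTF-8 again (the "wrong direction" conversion).
original = "äpfel"

utf8_bytes = original.encode("utf-8")       # b'\xc3\xa4pfel'
garbled = utf8_bytes.decode("iso-8859-1")   # every byte becomes one character

# Reversing the two steps recovers the original name.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled)    # the mojibake form
print(repaired)   # the original name again
```

If the garbled names on disk can be turned back into readable ones this way, a stray Latin-1 conversion somewhere in the chain is the likely cause.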

So how can this happen? (There appears to be no ISO-8859-1[5] (Latin1) in your workflow).

I believe I could reproduce this on a VFAT or NTFS partition using the iocharset=value mount option. If I set it to ISO-8859-15 while the locale is UTF-8, the system might be tricked into converting filenames "in the wrong direction".

But here, your Wheezy installation is most likely ext3, and I'm not aware of an NLS option for ext3.

Another possibility is that the files are actually correctly created, and you're just seeing them wrong:

  • Is PuTTY set to use UTF-8?
  • Are your FTP server (and client) set to UTF-8?
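One way to tell "stored wrong" from "displayed wrong" is to look at the raw bytes of the filenames and bypass the terminal and the FTP client entirely. The following is a minimal sketch. The helper names `classify_name` and `report` are made up for illustration, and the "possibly double-encoded" check assumes ISO-8859-1 as the stray charset.

```python
import os

def classify_name(raw: bytes) -> str:
    """Roughly classify the encoding of one raw filename."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"            # e.g. a plain Latin-1 name
    if text.isascii():
        return "ascii"
    try:
        # If the characters map back to bytes that are valid UTF-8 again,
        # the name was probably encoded twice (assumption: via ISO-8859-1).
        text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf-8"
    return "possibly double-encoded"

def report(directory: bytes = b".") -> None:
    """Print the classification and raw bytes of every entry in a directory."""
    for raw in sorted(os.listdir(directory)):
        print(classify_name(raw), raw)
```

Calling `report()` in the directory with the two apple files should show whether they really differ on disk or only in how each client renders them.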

I notice another strange thing: your two apple files, the one at 16:10 and the one at 16:34, appear to be displayed by ls using two different date formats. In one case, the year is specified.

It might be that 7z is creating a slightly unusual inode entry?

However, the convmv utility might be of help: it renames files from one filename encoding to another.
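For readers who want to see what such a rename pass does, here is a rough Python sketch of the same idea. It is not convmv itself, and the function names `repaired_name` and `repair_tree` are hypothetical. It assumes the names were double-encoded via ISO-8859-1, and it defaults to a dry run so that nothing is renamed until the planned changes have been checked on a small sample.

```python
import os

def repaired_name(name, wrong_charset="iso-8859-1"):
    """Return the un-garbled name, or None if the name needs no repair."""
    try:
        fixed = name.encode(wrong_charset).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None                   # not double-encoded with this charset
    return fixed if fixed != name else None

def repair_tree(root, wrong_charset="iso-8859-1", dry_run=True):
    """Collect (and optionally apply) renames below root; returns the plan."""
    planned = []
    # Bottom-up walk: directory contents are handled before the directory itself.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = repaired_name(name, wrong_charset)
            if fixed is None:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, fixed)
            if os.path.exists(dst):
                continue              # never overwrite an existing entry
            planned.append((src, dst))
            if not dry_run:
                os.rename(src, dst)
    return planned
```

Running `repair_tree("sample-dir")` first and inspecting the returned pairs is the cautious route. Only pass `dry_run=False` once the plan looks right.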

Solution 2

This issue with zips has been fixed in the most recent version of the far2l file and archive manager. For far2l's legacy charset detection for zip files to work properly, your system language setting should match the one on the system where the archive was created (Windows' built-in "zip folders" tool uses the same logic). So if your installation is German and the zip file was created on a PC with a German language setting as well, everything should work out of the box.

Solution 3

The -scs option only applies to @listfiles, which seem to be files containing a list of file names. It won't affect the charset of the file names inside the archive.

One possible solution would be to run iconv with appropriate options against all your files after you have extracted them.


Author by

The Wavelength

I'm just a curious programmer.

Updated on September 18, 2022

Comments

  • The Wavelength
    The Wavelength over 1 year

    I want to extract a large backup of my hard drive compressed with 7zip under Windows to my Debian Wheezy installation. I'm using the following command line:

    7z x -pmypasswordhere file.7z
    

    If there's now a file or a folder called Äpfel (German for apples), the result on the Linux hard drive is a garbled name in which the umlaut is no longer displayed correctly.

    How can I solve this issue? I tried using the following, but this says that the command line is invalid:

    7z x -scsWIN -pmypasswordhere file.7z
    

    ...where the -scs switch is explained as: "-scs{UTF-8 | WIN | DOS}: set charset for list files".

    I compressed the file on Windows 8 on an NTFS partition with 7z 9.30 64-bit. The compression level was Ultra. I encrypted the file names and their contents with AES-256. My Debian Wheezy installation is German, so echo $LANG prints "de_DE.UTF-8".

    • mpy
      mpy almost 11 years
      It seems that there is something special about your setup. I just tried to compress a file Äpfel.txt with the current 7-Zip (9.20) under Windows; you can get it from http://download.mpy.de/apples.7z. I have no problem with either an ancient Linux version of 7z (4.57, dated 2007) or with version 9.04 (from Debian squeeze?). In both cases Äpfel.txt gets extracted correctly. Is the locale string (locale=de_DE.UTF-8) that 7z prints when started without arguments correct in your case? What does echo $LANG say?
    • The Wavelength
      The Wavelength almost 11 years
      Please look at the most recent edit I've made to the original post. When I use your example, I get the same result. There's something interesting: img.xn--mg-eka.de/fe997.png. On the left side is PuTTY, on the right side is my FTP client. The first "Äpfel.txt" in PuTTY is the file from your 7z archive. The "?pfel.txt" is the file created with the FTP client. The interesting part: if I do the same in another directory, it works as expected... I think it's a more general problem I have to look into, nothing that is related to 7z. Thanks anyway!
    • mpy
      mpy almost 11 years
      I rechecked with AES-256 encryption; this does not break anything. (However, I forgot to mention that I can only test with Win XP right now.) I cannot follow your story with FTP completely, but in my experience (graphical) FTP or SSH clients are always a pain in the neck when it comes to uncommon characters. Can you use scp instead or mount a Windows share?
    • The Wavelength
      The Wavelength almost 11 years
      I've tried it. Everything that looks okay in my FTP client looks okay in WinSCP and vice versa. Everything that looks okay in PuTTY looks wrong in both WinSCP and my FTP client.
    • mpy
      mpy almost 11 years
      I'm sure the graphical client is the problem. E.g. for WinSCP the FAQ (winscp.net/eng/docs/ui_login_environment#utf) states: "UTF-8 is not supported with SCP protocol."
    • The Wavelength
      The Wavelength almost 11 years
      You're right. I think it's okay then :)
  • The Wavelength
    The Wavelength almost 11 years
    The backup is more than 100 GB and contains more than 100,000 files...
  • The Wavelength
    The Wavelength almost 11 years
    What would be the appropriate options?
  • scai
    scai almost 11 years
    The encoding of your Windows file system and the encoding of your Linux file system. The former seems to be UTF-16 for NTFS and the latter is probably UTF-8. Try it on your Äpfel file.
  • The Wavelength
    The Wavelength almost 11 years
    Thanks a lot! Well, the question is now which encoding is the preferred one: the one displayed correctly in PuTTY, or the one that is displayed correctly in FTP and can be accessed via a web server? I don't really know.
  • LSerni
    LSerni almost 11 years
    If you can check the configuration of the various "channels", you should be able to make it work everywhere. It almost seems as if there is an ISO-8859-1 hidden somewhere. I'd try running convmv on a small sample of files.