SVN Error: Can't convert string from native encoding to 'UTF-8'

Solution 1

  1. It does not change the encoding of the file. It changes the encoding of the filename (to something that every client can hopefully understand).
  2. Allowed by whom? NTFS stores file names as 16-bit code units, and Windows can expose them in various encodings depending on how you ask (it will try to convert them to the encoding you request). How you ask depends on the specific svn client you use. It sounds to me like a bug in TortoiseSVN.
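
To make "the encoding of the filename" concrete, here is a small sketch that prints the bytes of the name fragment "Süd" in two encodings. The helper name hex_bytes is made up for this example, and ISO-8859-1 is only an assumed stand-in for a non-UTF-8 native encoding. The UTF-8 bytes c3 bc are 195 and 188 in decimal, which matches the `?\195?\188` in the error message from the question.

```shell
# hex_bytes: hypothetical helper that prints stdin as a compact hex string.
hex_bytes() { od -An -tx1 | tr -d ' \n'; }

# "Süd" written as UTF-8 bytes (ü is the two bytes \303\274, i.e. c3 bc).
utf8=$(printf 'S\303\274d' | hex_bytes)

# The same text converted to ISO-8859-1, where ü is the single byte fc.
latin1=$(printf 'S\303\274d' | iconv -f UTF-8 -t ISO-8859-1 | hex_bytes)

echo "UTF-8:      $utf8"
echo "ISO-8859-1: $latin1"
```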

Edit to add:

Ugh. I misunderstood the symptoms. The svn server stores everything in UTF-8 (and it seems that it did that successfully).

The post-commit hook is the part that fails to convert from UTF-8. If I understand you correctly, the post-commit hook on the server triggers an svn update of a working copy on a shared drive (so the svn server starts an svn client against itself)? That means the configuration that needs fixing is the one for the client on the server. Check LANG / LC_ALL in the environment that executes the svn server. As it happens, hooks are run in an empty environment (the Subversion book has a tip about this), so you should set the variable in the hook itself.
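
As a rough sketch of what that could look like, the hook below sets the locale before calling svn. The locale name en_US.UTF-8 is an assumption (use whichever UTF-8 locale is actually installed on the server), and the WC path is simply the one that appears in the error message in the question.

```shell
#!/bin/sh
# Sketch of a post-commit hook. Subversion runs hooks with an empty
# environment, so the locale has to be set here, not in a login profile.

# Assumption: en_US.UTF-8 is installed on the server (check `locale -a`).
LC_ALL=en_US.UTF-8
export LC_ALL

# Assumed working-copy path, taken from the error message in the question.
WC=/home/websites/devel/website

# Only run the update when the path really is a working copy.
if [ -d "$WC/.svn" ]; then
    svn update --non-interactive --quiet "$WC"
fi
```

Setting LC_ALL overrides LANG and every other LC_* variable, which is why a single export is enough in this sketch.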

See also this page for info on how svn handles localisation

Solution 2

Yet another example:

$ svn update
svn: Error converting entry in directory '.' to UTF-8
svn: Can't convert string from native encoding to 'UTF-8':

$ export LC_CTYPE=en_US.UTF-8

$ svn update

(... and all is fine now)

Solution 3

If the error is:

[abc@288832-web3 public_html]$ svn update
svn: Error converting entry in directory 'images' to UTF-8
svn: Valid UTF-8 data
(hex: 46 65 6e 65 72 62 61 68)
followed by invalid UTF-8 sequence
(hex: e7 65 2b 46)

Then do this.

[abc@288832-web3 public_html]$ printf "\x46\x65\x6e\x65\x72\x62\x61\x68\n"
Fenerbah

(This means that the system has some file name starting with "Fenerbah" in that folder.)

[abc@288832-web3 public_html]$ cd  images
[abc@288832-web3 images]$ rm -rf Fenerbahçe+Forma+2.jpg

So the name contains a special character: the byte e7 is "ç" in ISO-8859-1, which is not a valid UTF-8 sequence, so svn cannot convert the file name.
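
If more than one file is affected, decoding the hex dump by hand gets tedious. The following is a minimal sketch, not part of the original answer, that lists every entry under a directory whose name is not valid UTF-8. The function name list_non_utf8_names is made up for this example, and it assumes file names do not contain newlines.

```shell
# list_non_utf8_names DIR: print entries under DIR whose path is not valid UTF-8.
list_non_utf8_names() {
    find "$1" | while IFS= read -r name; do
        # iconv exits non-zero when the input is not valid in the source encoding.
        if ! printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, `list_non_utf8_names images` would print the offending path so it can be renamed with mv rather than deleted.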

Solution 4

Just add the following line to your script before executing any svn command. Use the appropriate language code; in the following example I used Japanese.

export LC_ALL=ja_JP.UTF8
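
Exporting a locale that is not installed does not help, so it can be worth checking first. Here is a small sketch under the assumption that the system provides the `locale` command; have_locale is a made-up helper name. It ignores case and the "UTF-8" versus "utf8" spelling difference, since `locale -a` often prints the latter.

```shell
# have_locale NAME: succeed if NAME is listed by `locale -a`.
have_locale() {
    # Normalise to lower case without hyphens, e.g. ja_JP.UTF-8 -> ja_jp.utf8
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | tr -d '-' | grep -qxF "$want"
}

if have_locale ja_JP.UTF-8; then
    echo "ja_JP.UTF-8 is installed"
else
    echo "ja_JP.UTF-8 is not installed; pick another UTF-8 locale from 'locale -a'"
fi
```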

Solution 5

Put this in your post-commit hook: export LANG=xxxxx (your locale)

Author by

Camsoft

Updated on August 20, 2020

Comments

  • Camsoft
    Camsoft over 3 years

    I've got a post-commit hook script that performs a SVN update of a working copy when commits are made to the repository.

    When users commit to the repository from their Windows machines using TortoiseSVN they get the following error:

    post-commit hook failed (exit code 1) with output:
    svn: Error converting entry in directory '/home/websites/devel/website/guides/Images' to UTF-8
    svn: Can't convert string from native encoding to 'UTF-8':
    svn: Teneriffa-S?\195?\188d.jpg
    

    The file in question is Teneriffa-Süd.jpg (note the u with an umlaut). The site is German, so the file names are spelled in German.

    When executing an update on the working copy at the Linux command line, no errors are encountered. The above error only occurs when the post-commit hook is triggered by a commit from a Windows SVN client.

    Questions:

    1. Why would SVN try to change the encoding of a file?
    2. Are filenames allowed to contain chars that are outside the Windows standard ASCII ones?

    Update:

    It turns out that the file name displays correctly as Teneriffa-Süd.jpg when viewed from a Windows machine (via Samba), but when I view it on the Linux server where the file resides (using SSH and PuTTY) I get Teneriffa-SÃ¼d.jpg

    • Josh Kelley
      Josh Kelley almost 11 years
      A quick note: The discrepancy in filename between Samba + Windows and SSH +PuTTY is probably the result of PuTTY's configuration rather than anything to do with your problem. Under PuTTY's Window, Translation, the "Remote Character Set" option probably needs to be changed to UTF-8.
    • Flimm
      Flimm almost 11 years
      For me, the problem was with non-ASCII characters in my commit message.
  • Camsoft
    Camsoft over 14 years
    The file name Teneriffa-Süd.jpg is displayed correctly in my working copy on my Windows machine, and also in the working copy that the post-commit hook is trying to update, which resides on a Linux server (the same server as the repositories), when that folder is viewed in Windows through a Samba share. But when I do an ls in the folder at the Linux command line I get: Teneriffa-SÃ¼d.jpg
  • Camsoft
    Camsoft over 14 years
    echoed that system variable in Linux and it returned en_GB.UTF-8 which implies that it is using UTF-8
  • Ignacio Vazquez-Abrams
    Ignacio Vazquez-Abrams over 14 years
    I meant that it should be echoed on your local system, but it doesn't apply if you're running Windows, so never mind.
  • Bahbar
    Bahbar over 14 years
    That probably just means that the filename holds data that is directly UTF-8 encoded (not surprising since the conversion failed), and Windows parses that fine, while your Linux box is not configured to see UTF-8 filenames, so it reads it as whatever codepage it wants.
  • Camsoft
    Camsoft over 14 years
    Yes, you are correct that the SVN client that fails is the client on the server itself. I'll have a look at the links you sent me and get back to you.
  • TraderJoeChicago
    TraderJoeChicago over 12 years
    One line and problem solved. I was trying LANG without the export for half an hour. :-( You do need to install your locale doing this.
  • jperelli
    jperelli over 12 years
    +1 for saying that the hooks are run in a vacuum environment; export LANG=xxxxx then does the trick
  • Ben Voigt
    Ben Voigt over 11 years
    You should at least mention what system these commands are meant for; they are not standard commands.
  • Keith
    Keith about 11 years
    I added the export statement to the top of my pre-commit file and it works. export LC_CTYPE=en_US.UTF-8
  • Marcelo Amorim
    Marcelo Amorim over 9 years
    Example for Brazilian Portuguese (cedilla, a acute, etc): export LC_CTYPE=pt_BR.UTF-8
  • vicenteherrera
    vicenteherrera almost 9 years
    Spanish from Spain, worked using: export LC_CTYPE=es_ES.UTF-8
  • Trebor Rude
    Trebor Rude over 8 years
    I also had to unset LC_ALL, or set it to en_US.UTF-8.
  • georg
    georg over 8 years
    some more tips and tricks in this post arjuna.deltoso.net/index.html%3Fp=334.html