UTF-8 percent-encoding and Python

Solution 1

You have to encode your string as UTF-8 before quoting it, and the string should be a unicode object. You also have to declare the source file's encoding in the coding line at the top of the file:

# -*- coding: utf-8 -*-

import urllib

s = u'î'
print urllib.quote(s.encode('utf-8'))

Gives me the output:

%C3%AE
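To see why the encoding step matters, here is a minimal sketch that quotes the same character after encoding it three different ways. It assumes Python 3 (where quote lives in urllib.parse), which is not what the answer above uses, and the lossy ASCII case is only one illustrative way a %3F can arise, not necessarily what happened in the asker's setup:

```python
from urllib.parse import quote

text = "î"

# The same character becomes different bytes depending on the encoding.
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xAE
latin1_bytes = text.encode("latin-1")  # one byte: 0xEE
# A lossy conversion replaces the character with a literal question mark.
ascii_bytes = text.encode("ascii", errors="replace")

# quote() percent-encodes whatever bytes it is handed.
utf8_quoted = quote(utf8_bytes)
latin1_quoted = quote(latin1_bytes)
ascii_quoted = quote(ascii_bytes)

print(utf8_quoted)    # %C3%AE
print(latin1_quoted)  # %EE
print(ascii_quoted)   # %3F
```

Only the UTF-8 bytes produce the %C3%AE form the API expects.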

Solution 2

That is because you're not declaring the encoding your file is using, so Python is inferring it from your current locale configuration. I suggest you do this:

# -*- coding: utf-8 -*-
import urllib

mystring = "î"
print urllib.quote(mystring)
print urllib.quote_plus(mystring)

Also make sure your .py file is saved to disk with UTF-8 encoding.

For me that yields:

$ python ex.py
%C3%AE
%C3%AE

A couple of caveats. If you're trying this from the interpreter, the # -*- coding: utf-8 -*- line won't work if your console encoding isn't UTF-8. Instead, you should change it to whatever encoding your console is using: # -*- coding: (encoding here) -*-.

Then, decode your string into Unicode using the decode method, passing it the name of the encoding your console is using as the argument:

mystring = "î".decode('<your encoding>')

And later pass it to urllib encoded as utf-8:

print urllib.quote(mystring.encode('utf-8'))
print urllib.quote_plus(mystring.encode('utf-8'))
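The decode-then-encode steps above can be sketched as a single helper. This is a rough illustration written for Python 3 (an assumption, since the answer targets Python 2); the function name quote_from_console_bytes and the choice of cp1252 as the console encoding are hypothetical and only there for the example:

```python
from urllib.parse import quote

def quote_from_console_bytes(raw, console_encoding):
    """Percent-encode bytes that arrived in some non-UTF-8 encoding."""
    # Step 1: decode the raw bytes into text using the console's encoding.
    text = raw.decode(console_encoding)
    # Step 2: re-encode as UTF-8 and percent-encode the resulting bytes.
    return quote(text.encode("utf-8"))

# Simulate input typed in a cp1252 console (hypothetical encoding choice).
raw = "î".encode("cp1252")
result = quote_from_console_bytes(raw, "cp1252")
print(result)  # %C3%AE
```

If the wrong encoding name is passed, the decode step either raises UnicodeDecodeError or yields the wrong character, so the name has to match what the console actually uses.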

Hope this helps!

Author: user1379351

Updated on June 12, 2022

Comments

  • user1379351 almost 2 years

    I'm trying to get Python to give me percent-encoded strings. The API I'm interacting with (which I think uses percent-encoded UTF-8) gives %c3%ae for î. However, Python's urllib.quote gives %3F.

    import urllib
    
    mystring = "î"
    print urllib.quote(mystring)
    print urllib.quote_plus(mystring)
    print urllib.quote(mystring.encode('utf-8'))
    

    Any help appreciated.

  • Viktor Kerkez over 10 years
    Are you sure that your script is saved as a UTF-8 encoded file?
  • user1379351 over 10 years
    Nope, was in ANSI. Thanks!
  • user1379351 over 10 years
    Thanks. File was ANSI, hence the problem.
  • LucasB over 6 years
    Note that in Python 3 this becomes import urllib.parse followed by urllib.parse.quote.
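
For anyone landing here on Python 3, a minimal sketch of the equivalent calls follows. It relies on str being Unicode in Python 3 and on quote defaulting to UTF-8 when given a str; the sample string "î café" is made up for illustration:

```python
from urllib.parse import quote, quote_plus, unquote

# In Python 3, str is already Unicode and quote() encodes it as UTF-8 by default.
quoted = quote("î")
print(quoted)  # %C3%AE

# quote_plus() additionally turns spaces into plus signs.
quoted_plus = quote_plus("î café")
print(quoted_plus)  # %C3%AE+caf%C3%A9

# Decoding accepts lowercase hex digits, like the %c3%ae the API returns.
decoded = unquote("%c3%ae")
print(decoded)  # î
```

Note that quote emits uppercase hex digits while the API in the question returns lowercase ones; both decode to the same character.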