subprocess.Popen with a unicode path


Solution 1

It looks like you're using Windows and Python 2.X. Use os.startfile:

>>> import os
>>> os.startfile(u'Pokémon.mp3')

Non-intuitively, getting the command shell to do the same thing is:

>>> import subprocess
>>> import locale
>>> subprocess.Popen(u'Pokémon.mp3'.encode(locale.getpreferredencoding()),shell=True)

On my system, the command shell (cmd.exe) uses the cp437 encoding, but Windows programs use cp1252. Popen wanted shell commands encoded as cp1252. This seems like a bug, and it also seems to be fixed in Python 3.X:
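Encoding the same character both ways makes the mismatch between the two code pages easy to see. The following sketch is in Python 3 and uses only the standard codecs, so it runs on any platform. It shows that "é" maps to different bytes in cp437 (the code page reported for cmd.exe above) and in cp1252 (the one Popen wanted), so bytes built for one code page are misread under the other.

```python
# The same file name encoded with two Windows code pages.
name = "Pokémon.mp3"

oem_bytes = name.encode("cp437")    # code page the answer reports for cmd.exe
ansi_bytes = name.encode("cp1252")  # code page Popen expected on that system

print(oem_bytes)   # b'Pok\x82mon.mp3'
print(ansi_bytes)  # b'Pok\xe9mon.mp3'

# Reading cp1252 bytes as if they were cp437 yields the wrong character.
print(ansi_bytes.decode("cp437"))  # PokΘmon.mp3
```

This is why a single `.encode(...)` call only works when it targets the code page that the receiving API actually decodes with.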

>>> import subprocess
>>> subprocess.Popen('Pokémon.mp3',shell=True)

Solution 2

Your problem can be solved with the smart_str function from Django's django.utils.encoding module.

Use this code:

import subprocess
from django.utils.encoding import smart_str

cmd = u'cmd /c "C:\\Pok\xe9mon.mp3"'
smart_cmd = smart_str(cmd)  # on Python 2, returns a UTF-8-encoded byte string by default
subprocess.Popen(smart_cmd)

Django's documentation includes instructions for installing it on Windows. You can first install pip, then install Django by opening a command shell with administrator privileges and running this command:

pip install Django

This will install Django in your Python installation's site-packages directory.

Author: iTayb


Updated on June 17, 2022

Comments

  • iTayb
    iTayb almost 2 years

    I have a unicode filename that I would like to open. The following code:

    cmd = u'cmd /c "C:\\Pok\xe9mon.mp3"'
    cmd = cmd.encode('utf-8')
    subprocess.Popen(cmd)
    

    returns

    >>> 'C:\Pokיmon.mp3' is not recognized as an internal or external command, operable program or batch file.
    

    even though the file does exist. Why is this happening?

  • iTayb
    iTayb about 12 years
    Setting it to 'utf-16' raises TypeError: must be string without null bytes or None, not str, so I guess that's wrong.
  • iTayb
    iTayb about 12 years
    I won't install a whole new framework just to encode unicode correctly. A fix should be one or two lines long, not 1000+ lines of complex code.
  • Thanasis Petsas
    Thanasis Petsas about 12 years
    OK, I'm sorry; I have updated my answer. Maybe it is more helpful now.
  • iTayb
    iTayb about 12 years
    First, the latin-1 encoding is not unicode, so it won't work in all unicode cases. Second, it still doesn't work. Try it yourself.
  • Thanasis Petsas
    Thanasis Petsas about 12 years
    OK, I work on Linux, and when I tested it with os.popen it worked. Maybe it doesn't work on Windows. :( I've removed the updated part of my answer.
  • iTayb
    iTayb about 12 years
    Thanks for the side note, but this still doesn't fix the unicode problem. It works on your system because your locale's MBCS code page has the ó character. This code won't work on computers that have Hebrew or Japanese as their locale language.
  • iTayb
    iTayb about 12 years
    Thanks! I didn't know about os.startfile.
  • jfs
    jfs about 10 years
    On Windows on Python 2, Popen(u'Pokémon.mp3'.encode(encoding)) works iff Popen(u'Pokémon.mp3'.encode('mbcs')) works i.e., it should succeed with cp1252 and it should fail with cp437 in your case. Does shell=True change it? What are values for sys.getfilesystemencoding() and locale.getpreferredencoding()? In general, u"é" might be unrepresentable using mbcs. Python 3 uses Unicode API directly.
  • vaab
    vaab about 7 years
    On windows on python 2, if you want to use unicode command line (as python 3), you can use this workaround leveraging ctypes to patch subprocess.Popen(..).
  • Eryk Sun
    Eryk Sun over 6 years
    os.startfile works, but u'Pokémon.mp3'.encode(locale.getpreferredencoding()) will of course fail in any locale in which the ANSI codepage doesn't map "é". In 2.x subprocess.Popen calls CreateProcessA, which decodes the command line as ANSI, so it is limited to commands that can be encoded as such. If you need a command line that can't be encoded as ANSI, then you must do something else via ctypes, cffi, or an extension module, such as call CreateProcessW or a CRT function such as _wsystem.
  • Eryk Sun
    Eryk Sun over 6 years
    CMD is a Unicode application. It only uses codepages to decode bytes when working with files and pipes, such as reading a line of a batch script or a for /f loop that reads stdout from a command. In this case its default codepage is ANSI if it isn't attached to a console. Otherwise it uses the console's input or output codepage (CMD is not the console), which defaults to OEM unless changed via chcp.com. In any case, the encoding CMD uses for files is irrelevant. By the time CMD sees its command line, it's already decoded as Unicode by Windows.
  • gaborous
    gaborous over 5 years
    Someone made a standalone module out of Django smart_str: smartencoding
  • Thanasis Petsas
    Thanasis Petsas over 5 years
    @gaborous that's really helpful! Good idea to isolate and include that functionality in a module. thanks!
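You can check two points from the comments using only the standard codecs. First, UTF-16 encodes ASCII characters as two bytes, one of them zero, which is consistent with the "string without null bytes" TypeError reported above. Second, "é" has no mapping in ANSI code pages such as cp1255 (Hebrew) or cp932 (Japanese), so encoding with locale.getpreferredencoding() cannot succeed in those locales. The following is a Python 3 sketch. The helper `encodable_in` is hypothetical and exists only for illustration.

```python
# Hypothetical helper: can `text` be encoded in `codepage` without loss?
def encodable_in(text, codepage):
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

cmd = 'cmd /c "C:\\Pokémon.mp3"'

# UTF-16 output contains zero bytes for ASCII text, so it cannot be passed
# where a null-free byte string is required.
utf16 = cmd.encode("utf-16")
print(b"\x00" in utf16)  # True

# "é" exists in the Western ANSI code page but not in Hebrew or Japanese ones.
for cp in ("cp1252", "cp1255", "cp932"):
    print(cp, encodable_in(cmd, cp))
# cp1252 True
# cp1255 False
# cp932 False
```

As the comments point out, the only general fix on Python 2 is to bypass the ANSI path entirely, for example via the wide-character Windows APIs. Python 3 does this automatically.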