Python 3 print() function with Farsi/Arabic characters

19,987

Solution 1

Your code is correct as it works on my computer with both Python 2 and 3 (I'm on OS X):

~$ python -c 'print "تست"'
تست
~$ python3 -c 'print("تست")'
تست

The problem is your terminal, which cannot output Unicode characters. You can verify this by redirecting the output to a file, e.g. python3 my_file.py > test.txt, and opening the file in an editor.
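
A related check that takes the terminal out of the picture entirely is to write the text to a file with an explicit encoding and read it back. This is only a minimal sketch: the temporary directory and the file name test.txt are chosen for illustration.

```python
import os
import tempfile

text = "چرا کار نمیکنی؟"

# Write to a temporary file with an explicit encoding, so the result does
# not depend on the terminal or on the platform's default encoding.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read it back the same way; the round trip should be lossless.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == text)
```

If this prints True while print(text) still fails in the console, the string itself is fine and the console encoding is the likely culprit.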

If you are on Windows, you could use a terminal like Console2 or ConEmu, which render Unicode better than the Windows command prompt.

You may encounter errors with these terminals too, because of wrong code pages/encodings on Windows. There is a small Python package that sets them correctly:

1- Install it: pip install win-unicode-console

2- Put this at the top of your python file:

try:
    # Fix UTF8 output issues on Windows console.
    # Does nothing if package is not installed
    from win_unicode_console import enable
    enable()
except ImportError:
    pass

If you get errors when redirecting to a file, you can fix them by setting the I/O encoding:

On Windows command line:

SET PYTHONIOENCODING=utf-8

On Linux/OS X terminal:

export PYTHONIOENCODING=utf-8
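
If changing the environment is not an option, a similar effect can probably be had from inside the script on Python 3.7 or newer, where text streams have a reconfigure() method. The sketch below is an assumption-laden illustration rather than part of the original answer: force_utf8 is a hypothetical helper name, and an in-memory stream stands in for a misconfigured stdout.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 in place, if it supports it (Python 3.7+)."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream


# Demonstration on an in-memory stream that starts out ASCII-only,
# standing in for a stdout with the wrong encoding.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii")
force_utf8(stream)
stream.write("تست")
stream.flush()
written = raw.getvalue()
```

In a real script the call would be force_utf8(sys.stdout) near the top of the file. Note that this only changes the bytes Python emits; the terminal still has to interpret them as UTF-8.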

Some points

  • There is no need to use the u"aaa" syntax in Python 3. String literals are Unicode by default.
  • The default source encoding is UTF-8 in Python 3, so a coding declaration comment (e.g. # -*- coding: utf-8 -*-) is not needed.
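
A quick way to see the first point for yourself, as a small sketch using the same test string as above:

```python
plain = "تست"
prefixed = u"تست"  # the u prefix is still accepted in Python 3.3+, but changes nothing

print(type(plain).__name__)        # str
print(plain == prefixed)           # True
print(len(plain))                  # 3 (code points)
print(len(plain.encode("utf-8")))  # 6 (bytes, two per character here)
```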

Solution 2

The output depends on the platform and terminal you run your code in. Let's examine the snippet below on different Windows terminals, running under both 2.x and 3.x:

# -*- coding: utf-8 -*-
import sys

def case1(text):
    print(text)

def case2(text):
    print(text.encode("utf-8"))

def case3(text):
    sys.stdout.buffer.write(text.encode("utf-8"))

if __name__ == "__main__":
    text = "چرا کار نمیکنی؟"

    for case in [case1, case2, case3]:
        try:
            print("Running {0}".format(case.__name__))
            case(text)
        except Exception as e:
            print(e)

        print('-'*80)

Results

Python 2.x

Sublime Text 3 3122

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    --------------------------------------------------------------------------------
    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------
    Running case3
    چرا کار نمیکنی؟--------------------------------------------------------------------------------

ConEmu v151205

    Running case1
    ┌åÏ▒Ϻ ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îσ
    --------------------------------------------------------------------------------
    Running case2
    'ascii' codec can't decode byte 0xda in position 0: ordinal not in range(128)
    --------------------------------------------------------------------------------
    Running case3
    'file' object has no attribute 'buffer'
    --------------------------------------------------------------------------------

Windows Command Prompt

    Running case1
    ┌åÏ▒Ϻ ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îσ
    --------------------------------------------------------------------------------

    Running case2
    'ascii' codec can't decode byte 0xda in position 0: ordinal not in range(128)
    --------------------------------------------------------------------------------

    Running case3
    'file' object has no attribute 'buffer'
    --------------------------------------------------------------------------------

Python 3.x

Sublime Text 3 3122

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    --------------------------------------------------------------------------------
    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------
    Running case3
    چرا کار نمیکنی؟--------------------------------------------------------------------------------

ConEmu v151205

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    --------------------------------------------------------------------------------
    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------
    Running case3
    ┌åÏ▒Ϻ ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îσ--------------------------------------------------------------------------------

Windows Command Prompt

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <unde
    fined>
    --------------------------------------------------------------------------------

    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda
    \xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------

    Running case3
    ┌åÏ▒Ϻ ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îσ----------------------------------------------------
    ----------------------------

As you can see, only the Sublime Text 3 console (case3) worked. The other terminals did not display Persian correctly. The main point is that the result depends on which terminal and platform you are using.
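
The 'charmap' error in case1 can be reproduced without any terminal by encoding the text to a legacy single-byte code page by hand. The sketch below assumes cp1252 purely for illustration; the code page actually used by the terminals above is not stated.

```python
text = "چرا کار نمیکنی؟"

# print() has to encode the string to the console's encoding. With a
# single-byte code page that has no Persian letters, that step fails.
try:
    text.encode("cp1252")  # cp1252 is an assumed stand-in for the console code page
except UnicodeEncodeError as exc:
    error = exc
    # The first unencodable run is the first three letters,
    # reported in the message as "position 0-2".
    print(error.start, error.end)

# With errors="replace" the call succeeds, but every unencodable
# character is replaced by a question mark.
fallback = text.encode("cp1252", errors="replace")
print(fallback)
```

This suggests the string itself is fine: the failure happens at the boundary between Python and an output encoding that cannot represent the characters.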

Solution (ConEmu specific)

Modern terminals like ConEmu allow you to work with UTF-8 encoding, as explained here, so let's try:

chcp 65001 & cmd

Then run the script again against 2.x and 3.x:

Python2.x

Running case1
��را کار نمیکنی؟[Errno 0] Error
--------------------------------------------------------------------------------
Running case2
'ascii' codec can't decode byte 0xda in position 0: ordinal not in range(128)
--------------------------------------------------------------------------------
Running case3
'file' object has no attribute 'buffer'
--------------------------------------------------------------------------------

Python3.x

Running case1
چرا کار نمیکنی؟
--------------------------------------------------------------------------------
Running case2
b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
--------------------------------------------------------------------------------
Running case3
چرا کار نمیکنی؟--------------------------------------------------------------------------------

As you can see, the output is now successful with Python 3 case1 (print). So, the moral of the story: learn more about your tools and how to configure them properly for your use cases ;-)

Solution 3

I can't reproduce the problem. Here is my script p.py:

text = "چرا کار نمیکنی؟"
print(text)

And the result of python3 p.py:

چرا کار نمیکنی؟

Are you sure you're using Python 3? With python2 p.py:

SyntaxError: Non-ASCII character '\xda' in file p.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Author by

Soorena

Updated on June 06, 2022

Comments

  • Soorena
    Soorena almost 2 years

    I simplified my code for better understanding. Here is the problem:

    case 1:

    # -*- coding: utf-8 -*-
    
    text = "چرا کار نمیکنی؟" # also using u"...." results the same
    print(text)
    

    output:

    UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    

    case 2:

    text = "چرا کار نمیکنی؟".encode("utf-8") 
    print(text)
    

    there is no output.

    case 3:

    import sys
    
    text = "چرا کار نمیکنی؟".encode("utf-8")
    sys.stdout.buffer.write(text)
    

    output:

    چرا کار نمیکنی؟
    

    I know that case 3 works somehow, but I want to use other functions like print(), write(str()), ...

    I also read the Python 3 documentation on Unicode here.

    I have also read dozens of Q&As on Stack Overflow.

    And here is a long article explaining the problem and the answer for Python 2.x.

    The simple question is:

    How do I print non-ASCII characters like Farsi or Arabic using the Python print() function?

    Update 1: since many people suggested that the problem lies with the terminal, I tested this case:

    case 4 :

    text = "چرا کار نمیکنی؟".encode("utf-8")  # also using u"...." gives the same result
    print(text)
    

    terminal :

    python persian_encoding.py > test.txt
    

    test.txt :

    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
    

    Very important update:

    After playing around with this issue for a while, I finally found another workaround that makes cmd.exe do the job (without needing third-party software like ConEmu):

    A little explanation first:

    Our main problem does not concern Python. It is a problem with the Command Prompt character set in Windows (for a complete explanation, check out Arman's answer). If you change the Command Prompt code page to UTF-8 instead of the default code page, the Command Prompt will be able to handle UTF-8 characters (like Farsi or Arabic). This does not guarantee that the characters are displayed correctly (they may be printed as little squares), but it is a good solution if you want to do file I/O in Python with UTF-8 characters.

    Steps:

    Before starting Python from the command line, type:

    chcp 65001
    

    Now run your Python code as usual:

    python testcode.py
    

    Result in case 1:

    ?????? ??? ??????
    

    It runs without errors.

    For more information about how to set 65001 as the default character set, check this out.

  • Soorena
    Soorena over 7 years
    Sorry, that's not the case for me.
  • Soorena
    Soorena over 7 years
    Agreed. Here text is interpreted as bytes.
  • Soorena
    Soorena over 7 years
    I wrote in the question: I know that case 3 works somehow, but I want to use other functions like print(), write(str()), ... For example, suppose I want to write these characters to a text file; again the output will be bytes, no matter which text editor is used to open the file.
  • Soorena
    Soorena over 7 years
    Given this, your answer will not help me, but I really appreciate your efforts and the time spent on my question.
  • BPL
    BPL over 7 years
    @Soorena I don't know why it's not helping you. I just want you to understand that the output depends on the terminal you're using. It wouldn't be a problem if you were using a multiplatform GUI application written in Qt. As for printing to a file, it's the same story; I'll edit my answer very soon regarding that. Which platform/terminal/editor are you using?
  • Soorena
    Soorena over 7 years
    Windows 10 64-bit, IntelliJ IDEA 9, Python 3.5.2
  • Soorena
    Soorena over 7 years
    I updated my question; check it out.
  • BPL
    BPL over 7 years
    @Soorena I've edited my answer again, hope it helps; I've given the solution for ConEmu. I've used cygwin, msysgit, and vs_cmd prompt and I don't like any of them... ConEmu is a great choice which reminds me of the powerful Unix terminals (though it is not as powerful). I recommend it to you ;)
  • BPL
    BPL over 7 years
    @Arman Ordookhani: Just for the record, win-unicode-console worked for me on ConEmu (case1) but not on the command prompt. I also had problems installing it on py2.x.
  • Soorena
    Soorena over 7 years
    @arman-ordookhani worked like a charm!!!!!!
  • Soorena
    Soorena over 7 years
    Thanks, I used Console2 instead of ConEmu; both of them work fine.
  • Arman Ordookhani
    Arman Ordookhani over 7 years
    @BPL I think the command prompt cannot show Persian characters (it only supports a very small subset of Unicode), or maybe there is some cmd.exe setting that I'm not aware of.
  • Soorena
    Soorena over 7 years
    The point about the ConEmu console was a great help, and it's a very time-saving console trick. Thanks again.
  • Jawad
    Jawad about 2 years
    It doesn't work when I try to write it to a file.
  • ali reza
    ali reza almost 2 years
    Thanks a lot. This method worked for me: sys.stdout.buffer.write(text.encode("utf-8"))