Suppress the u'prefix indicating unicode' in python strings

Solution 1

You could use Python 3. Its default string type is Unicode, so the u'' prefix no longer appears.

In short, no. You cannot turn this off.

The u comes from the unicode.__repr__ method, which the REPL uses to display values:

>>> print repr(unicode('a'))
u'a'
>>> unicode('a')
u'a'

If I'm not mistaken, you cannot override this without recompiling Python.

The simplest way around this is to print the string:

>>> print unicode('a')
a
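
The same idea extends to containers, whose default display calls repr() on every element. A minimal sketch, assuming a hypothetical list called `names` (not from the original answer): join the items into one string and print that, rather than printing the list itself.

```python
# Hypothetical sample data; u'' literals are also accepted by Python 3.3+.
names = [u"alpha", u"beta", u"gamma"]

# Printing the list directly shows each element's repr(), which on
# Python 2 includes the u prefix. Joining first yields one plain string.
joined = u", ".join(names)
print(joined)
```

This only changes how the values are displayed; the strings themselves are still unicode objects.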

If you use the unicode() builtin to construct all your strings, you could do something like:

>>> class unicode(unicode):
...     def __repr__(self):
...             return __builtins__.unicode.__repr__(self).lstrip("u")
... 
>>> unicode('a')
'a'

...but don't do that; it's horrible.

Solution 2

I had a case where I needed to drop the u prefix because I was generating some JavaScript with Python as part of an HTML template. Naive output left the u prefix on the dict keys, e.g.:

var turns = [{u'armies':2...];

which breaks the JavaScript.

To get the output JavaScript needed, I used Python's json module to encode the data for me:

turns = json.dumps(turns)

This does the trick in my particular case and as the keys are all ascii there is no worry about the encoding. You could probably use this trick for your debug output.
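
A slightly fuller sketch of that approach follows. The sample data and the names `js_literal` and `js_line` are assumptions made for illustration, not the original template code.

```python
import json

# Hypothetical data standing in for the template variable from the answer.
turns = [{u"armies": 2, u"player": u"red"}]

# json.dumps emits double-quoted JSON with no u prefix; sort_keys makes
# the key order deterministic.
js_literal = json.dumps(turns, sort_keys=True)

# Build the line that would be written into the template.
js_line = "var turns = %s;" % js_literal
print(js_line)
```

If the data can contain user-supplied text, the result may still need escaping for the surrounding HTML context, since json.dumps only produces valid JSON.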

Solution 3

from __future__ import unicode_literals

is available since Python 2.6 (released on October 1, 2008). It is default in Python 3.

It lets you omit the u'' prefix in source code, though it does not change repr(unicode_string); changing that would be misleading.

You could override sys.displayhook() in a Python REPL to display objects however you like. You could also override __repr__ for your own custom objects.
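
A minimal sketch of the sys.displayhook idea, assuming you only care about top-level strings. The helper names `plain_repr` and `quiet_displayhook` are hypothetical. Strings nested inside lists or dicts would still show their prefix on Python 2, because the container's own repr() is used unchanged.

```python
import sys

try:
    import builtins  # Python 3
except ImportError:
    import __builtin__ as builtins  # Python 2

text_type = type(u"")  # unicode on Python 2, str on Python 3


def plain_repr(value):
    """Return repr(value), minus a leading u on a text string's repr."""
    text = repr(value)
    if isinstance(value, text_type) and text[:2] in ("u'", 'u"'):
        return text[1:]
    return text


def quiet_displayhook(value):
    """Replacement for sys.displayhook that hides the u prefix."""
    if value is None:
        return
    builtins._ = value  # keep the REPL's "_" shortcut working
    sys.stdout.write(plain_repr(value) + "\n")


# Install the hook for the current interactive session.
sys.displayhook = quiet_displayhook
```

To apply it in every interactive session, the same code could go in the file named by the PYTHONSTARTUP environment variable.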

Solution 4

using str( text ) is in fact a somewhat bad idea whenever you cannot be 100% sure about both your python's default encoding and the exact content of the string---the latter would be typical for a text fetched from the internet. also, depending on what you want to do, using print text.encode( 'utf-8' ) or print repr( text.encode( 'utf-8' ) ) may yield disappointing results, as you might get a rendering full of unreadable byte escapes like \xc3\xa9.
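
a small sketch of the failure mode described above. in python 2, str( text ) implicitly encodes with the default encoding, which is normally ascii; the explicit encode( 'ascii' ) call below stands in for that step so the example behaves the same on python 3. the helper name `to_ascii_or_none` and the sample text are assumptions for illustration.

```python
text = u"caf\u00e9"  # hypothetical text containing one non-ASCII character


def to_ascii_or_none(value):
    """Encode to ASCII, or return None when a character does not fit."""
    try:
        return value.encode("ascii")
    except UnicodeEncodeError:
        return None


# The ASCII attempt fails for the accented character.
ascii_attempt = to_ascii_or_none(text)

# UTF-8 can represent the text, but the repr of the resulting bytes
# shows the escapes the answer calls unreadable.
utf8_bytes = text.encode("utf-8")
print(repr(utf8_bytes))
```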

i think the optimum is really to avail yourself of a unicode-capable command line (difficult under windows, easy under linux) and switch from python 2.x to python 3.x. the ease and clarity of text vs bytes handling afforded by the new python 3 series is really one of the big gains you can expect. it does mean you'll have to spend a little time learning the distinction between 'bytes' and 'text' and grasp the concept of character encodings, but then that time is much better spent in a python 3 environment as python's new approach to these vexing problems is much clearer and much less error-prone than what python 2 had to offer. i'd go so far as to call python 2's approach to unicode problematic in retrospect, although i used to think of it as superior---when i compared it to the way this issue is handled in php.

edit i just stopped by a related discussion here on SO and found this comment on the way that php these days appears to tackle unicode / encoding issues:

It's like a mouse trying to eat an elephant. By framing Unicode as an extension of ASCII (we have normal strings and we have mb_strings) it gets things the wrong way around, and gets hung up on what special cases are required to deal with characters with funny squiggles that need more than one byte. If you treat Unicode as providing an abstract space for any character you need, ASCII is accommodated in that without any need to treat it as a special case.

i quote this here because in my experience 90% of all SO python+unicode topics seem to come from people who used to be fine with ascii or maybe latin-1, got bitten by the occasional character that was not supported in their usual settings, and then basically just want to get rid of it. what you do when switching to python 3 is exactly what the commenter above suggests to do: instead of viewing unicode as a vexing extension of ascii, you start to view ascii (and almost any other encoding you'll ever meet) as subset(s) of unicode.

to be fair, unicode v6 is certainly not the last word in encodings, but it is as close to being universal as you can get in 2011. get used to it.
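
the "ascii as a subset" view can be checked directly. this is only an illustrative sketch with made-up sample values: pure ascii bytes decode to the same text under ascii, latin-1 and utf-8, while a character outside ascii gets a different byte representation in each encoding.

```python
raw = b"plain ascii text"

# The same bytes decode to the same text under all three codecs.
decoded = [raw.decode(codec) for codec in ("ascii", "latin-1", "utf-8")]
same_everywhere = len(set(decoded)) == 1

# Outside ASCII, the byte representation depends on the encoding.
accented = u"\u00e9"
as_latin1 = accented.encode("latin-1")
as_utf8 = accented.encode("utf-8")
print(same_everywhere, repr(as_latin1), repr(as_utf8))
```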

Solution 5

What seems to be working for me:

import ast
import json
j = json.loads('{"one" : "two"}')
j
dd = {u'one': u'two'}
dd
# to get double quotes
json.dumps(j,  encoding='ascii')
json.dumps(dd, encoding='ascii')
# to get single quotes
str(ast.literal_eval(json.dumps(j,  encoding='ascii')))
str(ast.literal_eval(json.dumps(dd, encoding='ascii')))

Output:

{u'one': u'two'}
{u'one': u'two'}
'{"one": "two"}'
'{"one": "two"}'
"{'one': 'two'}"
"{'one': 'two'}"

The above works for dictionaries and for objects parsed from JSON.
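
A hedged sketch of the same round trip, with one limitation made explicit. The encoding argument is left out so that the sketch also runs on Python 3, and the helper name `round_trips` is hypothetical. ast.literal_eval parses Python literals, not JSON, so it rejects JSON's true, false and null.

```python
import ast
import json

data = {u"one": u"two"}

# Double-quoted JSON text, then a single-quoted Python-style rendering.
double_quoted = json.dumps(data)
single_quoted = str(ast.literal_eval(double_quoted))


def round_trips(value):
    """Report whether json.dumps output can be read by ast.literal_eval."""
    try:
        ast.literal_eval(json.dumps(value))
        return True
    except (ValueError, SyntaxError):
        return False


print(double_quoted)
print(single_quoted)
print(round_trips({u"ok": True}))  # JSON "true" is not a Python literal
```

For data containing booleans or None, the ast.literal_eval step would therefore need a different approach.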

For just a string, wrapping it in str() seems to work for me, at least for ASCII-only text like the example below.

s=u'test string'
s
str(s)

Output:

u'test string'
'test string'

Python version: 2.7.12

Author by

Ryan

Updated on July 09, 2022

Comments

  • Ryan
    Ryan almost 2 years

    Is there a way to globally suppress the unicode string indicator in python? I'm working exclusively with unicode in an application, and do a lot of interactive stuff. Having the u'prefix' show up in all of my debug output is unnecessary and obnoxious. Can it be turned off?