Python 2.7 lowercase

Solution 1

Use unicode strings:

drostie@signy:~$ python
Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print "ŠČŽ"
ŠČŽ
>>> print "ŠČŽ".lower()
ŠČŽ
>>> print u"ŠČŽ".lower()
ščž

See that little u? That means that it's created as a unicode object rather than a str object.
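
A small sketch of why the plain string comes back unchanged. It assumes the terminal sends UTF-8 and that Python 2 runs in its default C locale. The variable names are illustrative, and the letters are written as escapes so the snippet does not depend on the source file's encoding:

```python
# The three letters from the question (S, C and Z with caron),
# spelled as escapes.
text = u"\u0160\u010c\u017d"

# Assumption: the terminal uses UTF-8, so the plain (non-u) literal in
# Python 2.7 is really this sequence of six bytes.
raw = text.encode("utf-8")

# Lowercasing the byte string only touches ASCII letters, so these
# non-ASCII bytes come back exactly as they went in.
raw_lowered = raw.lower()

# Lowercasing the unicode object works on characters, not bytes.
text_lowered = text.lower()

print(raw_lowered == raw)
print(text_lowered == u"\u0161\u010d\u017e")
```

Both comparisons should hold: the bytes are untouched, while the unicode object is lowercased.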

Solution 2

Use unicode:

>>> print u'ŠČŽ'.lower().encode('utf8')
ščž
>>>

You need to convert your text to unicode as soon as it enters your programme from the outside world, rather than merely at the point at which you notice an issue.

Accordingly, either use the codecs module to read in decoded text, or use 'bytestring'.decode('latin2') (where in place of latin2 you should use whatever the actual encoding is).
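
A hedged sketch of decoding at the boundary. The dictionary tt and its "code" key come from the question; the byte values, the latin2 encoding, and the temporary file are assumptions made up for illustration. io.open is used here in place of codecs.open because it also accepts an encoding argument and is available on both Python 2.7 and Python 3:

```python
import io
import os
import tempfile

# Assumption: the value arrived as latin2-encoded bytes
# (S, C and Z with caron).
tt = {"code": b"\xa9\xc8\xae"}

# Decode once, as soon as the data enters the program...
code = tt["code"].decode("latin2")
# ...then work with unicode text everywhere else.
code_lower = code.lower()

# Reading from a file: let the file object do the decoding.
path = os.path.join(tempfile.mkdtemp(), "codes.txt")
with io.open(path, "wb") as f:
    f.write(tt["code"])
with io.open(path, "r", encoding="latin2") as f:
    file_lower = f.read().lower()

# Encode again only when the text leaves the program.
out_bytes = code_lower.encode("utf-8")
```

If the real data uses a different encoding, only the encoding name changes; the decode-early, encode-late shape stays the same.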

Author: Yebach

Updated on February 14, 2020

Comments

  • Yebach
    Yebach over 4 years

    When I use .lower() in Python 2.7, the string is not converted to lowercase for the letters ŠČŽ. I read the data from a dictionary.

    I tried str(tt["code"]).lower() and tt["code"].lower().

    Any suggestions?

  • Yebach
    Yebach about 12 years
    I am reading from a dict, so how do I convert tt["code"] to u"ŠČŽ"? I cannot use ustr(tt["code"]).lower().encode('utf8') or str(tt[u"code"]).lower().encode('utf8').
  • Tupteq
    Tupteq about 12 years
    Use unicode(tt["code"], 'latin2'), where 'latin2' is the encoding in use; you may need a different one.
  • Sven Marnach
    Sven Marnach about 12 years
    Also note that unicode.lower() is locale-dependent. It might give different results depending on the environment it runs in.
  • jsbueno
    jsbueno about 12 years
    @SvenMarnach: indeed, it is locale-dependent, but the differences due to locale are minimal, close to the differences due to not using Unicode, since in this case lower and upper will only understand ASCII anyway.
  • jsbueno
    jsbueno about 12 years
    @Yebach: read this piece, it will help you a lot: joelonsoftware.com/articles/Unicode.html. After that, use the "decode" string method to convert your strings to unicode.
  • jsbueno
    jsbueno about 12 years
    @Chrisdrost: I think it would be nice if you would add to your answer the bit about using the "decode" string method to get unicode out of string literals. That is the way to go.