How do I lowercase a string in Python?
Solution 1
Use str.lower():
"Kilometer".lower()
Solution 2
The canonical Pythonic way of doing this is:
>>> 'Kilometers'.lower()
'kilometers'
However, if the purpose is to do case-insensitive matching, you should use case-folding:
>>> 'Kilometers'.casefold()
'kilometers'
Here's why:
>>> "Maße".casefold()
'masse'
>>> "Maße".lower()
'maße'
>>> "MASSE" == "Maße"
False
>>> "MASSE".lower() == "Maße".lower()
False
>>> "MASSE".casefold() == "Maße".casefold()
True
casefold is a str method in Python 3. In Python 2, you'll want to look at PyICU or py2casefold; several answers address this here.
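As a minimal sketch of the case-insensitive matching described above (Python 3 only), the comparison can be wrapped in a small helper. The name equal_ignore_case is hypothetical and not part of the standard library; it only folds case and does not handle accents or Unicode normalization.

```python
def equal_ignore_case(a: str, b: str) -> bool:
    """Return True if a and b match after Unicode case folding."""
    # casefold() is more aggressive than lower(): for example, it maps
    # the German sharp s to "ss", which lower() leaves untouched.
    return a.casefold() == b.casefold()


print(equal_ignore_case("MASSE", "Maße"))    # True
print("MASSE".lower() == "Maße".lower())     # False
```

For plain ASCII text, lower() and casefold() give the same result, so the helper is only needed when the input may contain non-ASCII letters.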
Unicode Python 3
Python 3 handles plain string literals as unicode:
>>> string = 'Километр'
>>> string
'Километр'
>>> string.lower()
'километр'
Python 2, plain string literals are bytes
In Python 2, pasting the code below into a shell encodes the literal as a string of bytes, using utf-8. And lower doesn't map any changes that bytes would be aware of, so we get the same string.
>>> string = 'Километр'
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.lower()
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.lower()
Километр
In scripts, Python will object to non-ASCII bytes in a string with no encoding given (an error as of Python 2.5, and a warning in Python 2.4), since the intended encoding would be ambiguous. For more on that, see the Unicode how-to in the docs and PEP 263.
Use Unicode literals, not str literals
So we need a unicode string to handle this conversion, accomplished easily with a unicode string literal, which disambiguates with a u prefix (and note the u prefix also works in Python 3):
>>> unicode_literal = u'Километр'
>>> print(unicode_literal.lower())
километр
Note that the representation is completely different from the str bytes: the escape sequence is '\u' followed by the 16-bit (2-byte) code point of each of these unicode letters:
>>> unicode_literal
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> unicode_literal.lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
Now if we only have it in the form of a str, we need to convert it to unicode. Python's Unicode type is a universal encoding format that has many advantages relative to most other encodings. We can either use the unicode constructor or the str.decode method with the codec to convert the str to unicode:
>>> unicode_from_string = unicode(string, 'utf-8') # decodes the utf-8 bytes into unicode
>>> print(unicode_from_string.lower())
километр
>>> string_to_unicode = string.decode('utf-8')
>>> print(string_to_unicode.lower())
километр
>>> unicode_from_string == string_to_unicode == unicode_literal
True
Both methods convert to the unicode type, and the result is the same as unicode_literal.
Best Practice, use Unicode
It is recommended that you always work with text in Unicode.
Software should only work with Unicode strings internally, converting to a particular encoding on output.
Can encode back when necessary
However, to get the lowercase back in type str, encode the unicode string to utf-8 again:
>>> print string
Километр
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.decode('utf-8')
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower().encode('utf-8')
'\xd0\xba\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.decode('utf-8').lower().encode('utf-8')
километр
So in Python 2, Unicode can encode into Python strings, and Python strings can decode into the Unicode type.
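For readers on Python 3, the same decode, lower, encode round trip applies to bytes objects. The following is a minimal sketch under the assumption that the input is valid UTF-8; the helper name lower_utf8 is hypothetical.

```python
def lower_utf8(data: bytes) -> bytes:
    """Lowercase UTF-8 encoded bytes by going through str."""
    text = data.decode("utf-8")         # bytes -> str (Unicode text)
    return text.lower().encode("utf-8")  # lowercase, then str -> bytes


raw = "Километр".encode("utf-8")

# bytes.lower() only changes ASCII letters, so the Cyrillic bytes stay as they are.
print(raw.lower() == raw)                # True

print(lower_utf8(raw).decode("utf-8"))   # километр
```

This mirrors the best practice quoted above: work with Unicode text internally and convert to a particular encoding only at the boundaries.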
Solution 3
With Python 2, str.lower() doesn't work for non-English words encoded in UTF-8. In this case, decode('utf-8') can help:
>>> s='Километр'
>>> print s.lower()
Километр
>>> print s.decode('utf-8').lower()
километр
Solution 4
Also, you can assign the result to a variable:
s = input('UPPER CASE')
lower = s.lower()
If you use it like this:
s = "Kilometer"
print(s.lower())  # kilometer
print(s)  # Kilometer
the string is lowercase only in the value returned by the call; s itself is unchanged.
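Because Python strings are immutable, keeping the lowercase form means binding the returned value to a name. A short sketch (the variable names are only illustrative):

```python
s = "Kilometer"

lowered = s.lower()   # new string; s still refers to the original
print(s)              # Kilometer
print(lowered)        # kilometer

s = s.lower()         # rebind s to the lowercase copy
print(s)              # kilometer
```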
Solution 5
You can use the built-in string method lower to do that, e.g.:
>>> s = "Kilometres"
>>> s.lower()
'kilometres'
Benjamin Didur
Updated on March 30, 2022
Comments
-
Benjamin Didur about 2 years
Is there a way to convert a string to lowercase?
"Kilometers" → "kilometers"
-
Munim Munna almost 6 years
The question is how to transform a string to lowercase. How did this answer get so many up-votes?
-
bballdave025 almost 6 years
Perhaps we should be a bit more explicit by saying that decode('utf-8') is not only unnecessary in Python 3, but causes an error (ref). Example:
$ python3
>>> s = 'Километр'
>>> print(s.lower())
километр
>>> s.decode('utf-8').lower()
...AttributeError: 'str' object has no attribute 'decode'
We can see a second way to do this, referencing the excellent answer of @AaronHall:
>>> s.casefold()
'километр'
-
bballdave025 almost 6 years
I have one note that doesn't necessarily apply to the OP's question, but which is important for portability (internationalization) when doing case-insensitive matching. With case-insensitive matching, diacritics (accent marks) may become a concern. Example:
>>> "raison d'être".casefold()
"raison d'être"
Check out this answer about unidecode.
-
vossmalte over 5 years
s=s.lower() is the way to go.
-
Ekrem Dinçel over 3 years
This only works well with ASCII characters; you may want to use str.maketrans and str.translate if you are not getting the expected string.
-
Ekrem Dinçel over 3 years
@m00lti Why s? What does the variable name have to do with the question?
-
ergo almost 3 years
@EkremDinçel s like string, I think.
-
lolesque about 2 years
Not only ASCII: it works for many diacritics, for example ÀÇÐÊĞİŃÓŒŘŠŤÚŻ, but there is a problem for dotless i: "ı".upper().lower() becomes i, while upper dotted İ is conserved thanks to a Combining dot above (0x307).
-
vossmalte about 2 years
s like it's used in the answer