Case-insensitive string startswith in Python
Solution 1
You could use a regular expression as follows:
In [33]: bool(re.match('he', 'Hello', re.I))
Out[33]: True
In [34]: bool(re.match('el', 'Hello', re.I))
Out[34]: False
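If the prefix is not a fixed literal, it may contain regex metacharacters, so it should be escaped before matching. A minimal sketch, assuming a hypothetical helper name `istartswith_re` that is not part of the original answer:

```python
import re

def istartswith_re(text, prefix):
    """Return True if text starts with prefix, ignoring case (re.IGNORECASE rules)."""
    # re.escape makes characters such as '.' or '+' match literally.
    return re.match(re.escape(prefix), text, re.IGNORECASE) is not None

print(istartswith_re('Hello', 'he'))   # True
print(istartswith_re('Hello', 'el'))   # False
print(istartswith_re('a+b=c', 'A+'))   # True
```

Without `re.escape`, a prefix such as `'A+'` would be read as "one or more A" rather than as the two literal characters.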
On a 2000-character string this is about 20x faster than lower():
In [38]: s = 'A' * 2000
In [39]: %timeit s.lower().startswith('he')
10000 loops, best of 3: 41.3 us per loop
In [40]: %timeit bool(re.match('el', s, re.I))
100000 loops, best of 3: 2.06 us per loop
If you are matching the same prefix repeatedly, pre-compiling the regex can make a large difference:
In [41]: p = re.compile('he', re.I)
In [42]: %timeit p.match(s)
1000000 loops, best of 3: 351 ns per loop
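As a sketch of how the pre-compiled pattern might be reused across many strings (the names `PREFIX_RE` and `lines` are illustrative assumptions, not from the original answer):

```python
import re

# Compile once, outside any loop, and reuse the pattern object.
PREFIX_RE = re.compile(re.escape('he'), re.IGNORECASE)

lines = ['Hello world', 'HEAD', 'other', 'he said']
matching = [line for line in lines if PREFIX_RE.match(line)]
print(matching)  # ['Hello world', 'HEAD', 'he said']
```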
For short prefixes, slicing the prefix out of the string before converting it to lowercase could be even faster:
In [43]: %timeit s[:2].lower() == 'he'
1000000 loops, best of 3: 287 ns per loop
Relative timings of these approaches will of course depend on the length of the prefix. On my machine the breakeven point seems to be about six characters, which is when the pre-compiled regex becomes the fastest method.
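Since the breakeven point depends on the machine and the Python version, it is probably worth measuring it yourself. The following is a rough sketch using the standard `timeit` module; the function name `compare` and the sample prefixes are assumptions made for illustration, and the lambda wrappers add a small constant overhead to every measurement:

```python
import re
import timeit

def compare(prefix, text, number=100_000):
    """Return seconds per call for three case-insensitive prefix checks."""
    pattern = re.compile(re.escape(prefix), re.IGNORECASE)
    lowered = prefix.lower()
    n = len(prefix)
    candidates = {
        'lower().startswith': lambda: text.lower().startswith(lowered),
        'compiled regex': lambda: pattern.match(text) is not None,
        'slice + lower': lambda: text[:n].lower() == lowered,
    }
    # timeit.timeit accepts a callable and returns the total time in seconds.
    return {name: timeit.timeit(fn, number=number) / number
            for name, fn in candidates.items()}

if __name__ == '__main__':
    text = 'A' * 2000
    for prefix in ('he', 'hello!', 'hello, world'):
        print(prefix, compare(prefix, text))
```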
In my experiments, checking every character separately could be even faster:
In [44]: %timeit (s[0] == 'h' or s[0] == 'H') and (s[1] == 'e' or s[1] == 'E')
1000000 loops, best of 3: 189 ns per loop
However, this method only works for prefixes that are known when you're writing the code, and doesn't lend itself to longer prefixes.
Solution 2
How about this:
prefix = 'he'
if myVeryLongStr[:len(prefix)].lower() == prefix.lower():
    ...  # the string starts with the prefix, ignoring case
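The same idea can be wrapped in a small function so that only `len(prefix)` characters are ever lowercased. This is a sketch; the name `istartswith` is an assumption, not something from the original answer:

```python
def istartswith(text, prefix):
    """Case-insensitive startswith that only lowercases len(prefix) characters."""
    return text[:len(prefix)].lower() == prefix.lower()

my_very_long_str = 'HE' + 'x' * 5000
print(istartswith(my_very_long_str, 'he'))  # True
print(istartswith('H', 'he'))               # False (string shorter than prefix)
```

Because it relies on `lower()`, it shares the non-ASCII limitations of the other `lower()`-based approaches.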
Solution 3
Another simple solution is to pass a tuple to startswith() for all the cases needed to match, e.g. .startswith(('case1', 'case2', ..)).
For example:
>>> 'Hello'.startswith(('He', 'HE'))
True
>>> 'HEllo'.startswith(('He', 'HE'))
True
>>>
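Note that the tuple above does not cover 'he' or 'hE'. If every spelling should match, the tuple could be generated rather than typed by hand. A sketch, assuming a hypothetical helper `case_variants`; the tuple has up to 2**len(prefix) entries, so this is only reasonable for short prefixes:

```python
from itertools import product

def case_variants(prefix):
    """Return a tuple of every upper/lower-case spelling of prefix."""
    # For each character, offer its lower and upper form
    # (the set removes duplicates for characters without case, e.g. digits).
    choices = [sorted({ch.lower(), ch.upper()}) for ch in prefix]
    return tuple(''.join(combo) for combo in product(*choices))

variants = case_variants('he')
print(variants)                       # ('HE', 'He', 'hE', 'he')
print('hEllo'.startswith(variants))   # True
```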
Solution 4
None of the given answers is actually correct, as soon as you consider anything outside the ASCII range.
For example, in a case-insensitive comparison ß should be considered equal to SS if you're following Unicode's case mapping rules.
To get correct results, the easiest solution is to install the third-party regex module (pip install regex), which follows the standard:
import re
import regex
# enable new improved engine instead of backwards compatible v0
regex.DEFAULT_VERSION = regex.VERSION1
print(re.match('ß', 'SS', re.IGNORECASE)) # None
print(regex.match('ß', 'SS', regex.IGNORECASE)) # matches
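If installing a third-party module is not an option, the standard library's `str.casefold()` applies Unicode full case folding and can be combined with slicing. This is a swapped-in technique, not the regex-based approach described above, and the helper name `istartswith_casefold` is an assumption. It performs no Unicode normalization, and unlike a regex it can accept a prefix that ends in the middle of a folded character (a prefix of 's' matches a text starting with 'ß'):

```python
def istartswith_casefold(text, prefix):
    """Prefix test using Unicode full case folding (str.casefold)."""
    folded_prefix = prefix.casefold()
    n = len(folded_prefix)
    # Every character folds to at least one character, so the first n
    # characters of text always yield at least n folded characters.
    return text[:n].casefold().startswith(folded_prefix)

print(istartswith_casefold('SS' + 'x' * 5000, 'ß'))   # True
print(istartswith_casefold('Straße', 'STRASSE'))      # True
print(istartswith_casefold('Hello', 'el'))            # False
```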
Solution 5
Depending on the performance of .lower(), if the prefix is small enough it might be faster to check equality multiple times:
s = 'A' * 2000
prefix = 'he'
ch0 = s[0]
ch1 = s[1]
# Parentheses are needed because "and" binds more tightly than "or".
substr = (ch0 == 'h' or ch0 == 'H') and (ch1 == 'e' or ch1 == 'E')
Timing (using the same string as NPE):
>>> timeit.timeit("ch0 = s[0]; ch1 = s[1]; ch0 == 'h' or ch0 == 'H' and ch1 == 'e' or ch1 == 'E'", "s = 'A' * 2000")
0.2509511683747405
= 0.25 us per loop
Compared to existing method:
>>> timeit.timeit("s.lower().startswith('he')", "s = 'A' * 2000", number=10000)
0.6162763703208611
= 61.63 us per loop
(This is horrible, of course, but if the code is extremely performance critical then it might be worth it)
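For prefixes that are not known in advance, the per-character comparison could be built at runtime. The following is only a sketch with a hypothetical factory name `make_prefix_checker`; the loop overhead will likely make it slower than the hand-written expression, so it should be timed before being relied on:

```python
def make_prefix_checker(prefix):
    """Build a checker that compares each leading character against both cases."""
    pairs = [(ch.lower(), ch.upper()) for ch in prefix]
    n = len(pairs)

    def check(text):
        # A text shorter than the prefix can never match.
        if len(text) < n:
            return False
        return all(text[i] in pair for i, pair in enumerate(pairs))

    return check

starts_with_he = make_prefix_checker('he')
print(starts_with_he('A' * 2000))  # False
print(starts_with_he('HEllo'))     # True
print(starts_with_he('hx'))        # False
```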
Nicolas Raoul
I am Nicolas Raoul, IT consultant in Tokyo. Feel free to copy/paste the source code from my StackExchange answers, I release it to the public domain.
Updated on July 09, 2022
Nicolas Raoul almost 2 years
Here is how I check whether mystring begins with some string:
>>> mystring.lower().startswith("he")
True
The problem is that mystring is very long (thousands of characters), so the lower() operation takes a lot of time.
QUESTION: Is there a more efficient way?
My unsuccessful attempt:
>>> import re
>>> mystring.startswith("he", re.I)
False