Why is str.translate much faster in Python 3.5 compared to Python 3.4?

15,524

TL;DR - ISSUE 21118


The long Story

Josh Rosenberg found out that the str.translate() function is very slow compared to the bytes.translate, he raised an issue, stating that:

In Python 3, str.translate() is usually a performance pessimization, not optimization.

Why was str.translate() slow?

The main reason for str.translate() to be very slow was that the lookup used to be in a Python dictionary.

The usage of maketrans made this problem worse. The similar approach using bytes builds a C array of 256 items to fast table lookup. Hence the usage of higher level Python dict makes the str.translate() in Python 3.4 very slow.

What happened now?

The first approach was to add a small patch, translate_writer, However the speed increase was not that pleasing. Soon another patch fast_translate was tested and it yielded very nice results of up to 55% speedup.

The main change as can be seen from the file is that the Python dictionary lookup is changed into a C level lookup.

The speeds now are almost the same as bytes

                                unpatched           patched

str.translate                   4.55125927699919    0.7898181750006188
str.translate from bytes trans  1.8910855210015143  0.779950579000797

A small note here is that the performance enhancement is only prominent in ASCII strings.

As J.F.Sebastian mentions in a comment below, Before 3.5, translate used to work in the same way for both ASCII and non-ASCII cases. However from 3.5 ASCII case is much faster.

Earlier ASCII vs non-ascii used to be almost same, however now we can see a great change in the performance.

It can be an improvement from 71.6μs to 2.33μs as seen in this answer.

The following code demonstrates this

python3.5 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
100000 loops, best of 3: 2.3 usec per loop
python3.5 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 117 usec per loop

python3 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 91.2 usec per loop
python3 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
10000 loops, best of 3: 101 usec per loop

Tabulation of the results:

         Python 3.4    Python 3.5  
Ascii     91.2          2.3 
Unicode   101           117
Share:
15,524

Related videos on Youtube

Bhargav Rao
Author by

Bhargav Rao

Programmer from Bangalore ಬೆಂಗಳೂರು who is #SOreadytohelp! Remember that this user rocks and is the best person on earth. Check out a few of my selected set of answers. ಎಲ್ಲಾದರು ಇರು ಎಂತಾದರು ಇರು ಎಂದೆಂದಿಗು ನೀ ಕನ್ನಡವಾಗಿರು! ಕನ್ನಡವೇ ಸತ್ಯ ಕನ್ನಡವೇ ನಿತ್ಯ. Translation: Wherever you are, However you are, don't forget the Kannada language

Updated on June 27, 2022

Comments

  • Bhargav Rao
    Bhargav Rao almost 2 years

    I was trying to remove unwanted characters from a given string using text.translate() in Python 3.4.

    The minimal code is:

    import sys 
    s = 'abcde12345@#@$#%$'
    mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$')
    print(s.translate(mapper))
    

    It works as expected. However the same program when executed in Python 3.4 and Python 3.5 gives a large difference.

    The code to calculate timings is

    python3 -m timeit -s "import sys;s = 'abcde12345@#@$#%$'*1000 ; mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$'); "   "s.translate(mapper)"
    

    The Python 3.4 program takes 1.3ms whereas the same program in Python 3.5 takes only 26.4μs.

    What has improved in Python 3.5 that makes it faster compared to Python 3.4?

    • Thomas K
      Thomas K over 8 years
      While we're talking about performance, wouldn't it be better to generate your mapper like this: dict.fromkeys(ord(c) for c in '@#$')?
    • Bhargav Rao
      Bhargav Rao over 8 years
      @ThomasK I found out that this made a significant difference. Yep your way is better.
    • assylias
      assylias over 8 years
      Did you mean 50x faster?
    • Bhargav Rao
      Bhargav Rao over 8 years
      @assylias I did 1300 - 26.4 and then divided by 1300. I got nearly 95%, so I wrote :) It is actually more than 50x faster... But is my calculation wrong? I'm bit weak in math. I'll learn math soon. :)
    • assylias
      assylias over 8 years
      you should do it the way round: 26 / 1300 = 2% so the faster version takes only 2% of the time taken by the slower version => it is 50x faster.
  • filmor
    filmor over 8 years
    This is one of the commits: github.com/python/cpython/commit/…
  • jfs
    jfs over 8 years
    note: ascii vs. non-ascii case may differ significantly in performance. It is not about 55%: as your answer shows, the speed up can be 1000s%.
  • jfs
    jfs over 8 years
    compare: python3.5 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)" (ascii) vs. python3.5 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)" (non-ascii). The latter is much (10x) slower.
  • Bhargav Rao
    Bhargav Rao over 8 years
    @J.F. Oh, I understood it now. I ran your code for both 3.4 and 3.5. I am getting Py3.4 faster for non-ascii stuff. Is it by coincidence? The results dpaste.com/15FKSDQ
  • jfs
    jfs over 8 years
    Before 3.5, both ascii and non-ascii cases are probably the same for Unicode .translate() i.e., ascii case is much faster in Python 3.5 only (you don't need bytes.translate() for performance there).
  • jfs
    jfs over 8 years
    I didn't mean to imply that bytes.translate() is suitable for a non-ascii case e.g., '\U0001F602' is a multi-byte sequence (in most encodings) and therefore you can't map it using bytes.translate() (it is not the only reason but it is enough). Note: the initial case is ascii-only.
  • David Z
    David Z over 8 years
    To my ear 1000% improvement suggests the code is not only 9 times as fast but actually runs backward in time and finishes before it starts :-P Just kidding, of course, but I do see it as somewhat unclear what 1000% improvement means. If you find yourself making another edit anyway, I'd recommend altering that wording. (The timing results you posted are what I'd consider a roughly 90% improvement: the improved version takes (100%-90%)=10% as long as the original.)
  • Bhargav Rao
    Bhargav Rao over 8 years
    @DavidZ I am really confused with the calculations! :P Can you please tell me the percentage.. From 91.2 to 2.3? (I am poor in math :( )
  • David Z
    David Z over 8 years
    @BhargavRao oh, sure. For percent changes, always divide by the original value, not the new one. In this case, (91.2 - 2.3) / 91.2 x 100% = 97.5% improvement.
  • Bhargav Rao
    Bhargav Rao over 8 years
    @DavidZ Thanks a lot. I was dividing by the wrong value (the latter). o wonder I got 3000% improvement when the devs claimed 55%.... Thanks again. :)
  • jfs
    jfs over 8 years
    @DavidZ: 97.5% is not human-readable -- there are multiple formulas possible. 1000s% is from my comment above (Python 3.5 is the baseline 100%). The purpose is to communicate the order of magnitude (tens of times -- thousands percents unlike fifty percents that were in the answer before -- otherwise I wouldn't use percents at all here). The point is unicode .translate() was dog-slow in the ascii case before Python 3.5. Also, 97.5% promises too much (unlike 1000s%) -- benchmarking is a tricky endeavor to get right if we stop talking about order of magnitudes numbers.
  • Bhargav Rao
    Bhargav Rao over 8 years
    @J.F. I believe in your last line. I think the important thing is to get the main point that .translate() has been improved. I have for the present removed the comparisons. Thanks a lot.