Why is Python 3 considerably slower than Python 2?

The difference is in the implementation of the int type. Python 3.x uses the arbitrary-sized integer type exclusively (what was long in 2.x), while Python 2.x uses a simpler int type, backed by a plain C long, for values up to sys.maxint.
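For example, the split is visible directly in the interactive interpreter:

>>> import sys
>>> # Python 2: two integer types; values past sys.maxint promote to long
>>> type(sys.maxint), type(sys.maxint + 1)
(<type 'int'>, <type 'long'>)
>>> # Python 3: a single arbitrary-sized int type
>>> type(sys.maxsize + 1)
<class 'int'>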

Once you force your loops to use long integers, Python 3.x is faster:

>>> import sys
>>> from timeit import timeit
>>> MAX_NUM = 3*10**3
>>> def bar():
...     i = MAX_NUM + sys.maxsize
...     while i > sys.maxsize:
...         i -= 1
... 

Python 2:

>>> timeit(bar, number=10000)
5.704327821731567

Python 3:

>>> timeit(bar, number=10000)
3.7299320790334605

I used sys.maxsize because sys.maxint was dropped in Python 3; the integer value is essentially the same.

Python 2's speed advantage is thus limited to the first (2 ** 63) - 1 integers on 64-bit systems, and the first (2 ** 31) - 1 integers on 32-bit systems.
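You can check the boundary on your own platform:

>>> import sys
>>> sys.maxsize == 2 ** 63 - 1  # True on a 64-bit CPython build
True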

Since you cannot use the long type with xrange() on Python 2, I did not include a comparison for that function.
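For example, on Python 2 (the exact error message may vary between builds):

>>> import sys
>>> xrange(sys.maxint + 1)
Traceback (most recent call last):
  ...
OverflowError: Python int too large to convert to C long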

Comments

  • gsb-eng
    gsb-eng almost 2 years

    I've been trying to understand why Python 3 actually takes much more time than Python 2 in certain situations; below are a few cases I've verified, comparing Python 3.4 with Python 2.7.

    Note: I've gone through some of the related questions, like Why is there no xrange function in Python3?, loop in python3 much slower than python2, and Same code slower in Python3 as compared to Python2, but I feel I still haven't found the actual reason behind this issue.

    I've tried this piece of code to show the difference:

    MAX_NUM = 3*10**7
    
    # This is to make it compatible with py3.4.
    try:
        xrange
    except NameError:
        xrange = range
    
    
    def foo():
        i = MAX_NUM
        while i > 0:
            i -= 1
    
    def foo_for():
        for i in xrange(MAX_NUM):
            pass
    

    When I ran this program with py3.4 and py2.7, I got the results below.

    Note: These stats came from a 64-bit machine with a 2.6 GHz processor; the time was calculated using time.time() in a single loop.

    Output : Python 3.4
    -----------------
    2.6392083168029785
    0.9724123477935791
    
    Output: Python 2.7
    ------------------
    1.5131521225
    0.475143909454
    
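    For reference, the measurement was done with a simple time.time() harness along these lines (a sketch; the exact script may have differed slightly):

    import time

    for func in (foo, foo_for):
        start = time.time()
        func()
        print(time.time() - start)  # wall-clock seconds for a single run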

    I really don't think that there have been changes applied to while or xrange from 2.7 to 3.4. I know range started acting like xrange in py3.4, but, as the documentation says,

    range() now behaves like xrange() used to behave, except it works with values of arbitrary size. The latter no longer exists.

    this means the change from xrange to range is essentially just a name change, except that range now works with values of arbitrary size.
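    A quick check of the arbitrary-size part on py3.4:

    >>> r = range(2 ** 100)  # works on py3.4; xrange(2 ** 100) overflows on py2.7
    >>> r[0], r[-1] == 2 ** 100 - 1
    (0, True)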

    I've verified the disassembled byte code as well.
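    The listings below can be regenerated with the dis module:

    import dis

    dis.dis(foo)      # disassemble the while-loop version
    dis.dis(foo_for)  # disassemble the for-loop version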

    Below is the disassembled byte code for function foo():

    Python 3.4:
    --------------- 
    
     13           0 LOAD_GLOBAL              0 (MAX_NUM)
                  3 STORE_FAST               0 (i)
    
     14           6 SETUP_LOOP              26 (to 35)
            >>    9 LOAD_FAST                0 (i)
                 12 LOAD_CONST               1 (0)
                 15 COMPARE_OP               4 (>)
                 18 POP_JUMP_IF_FALSE       34
    
     15          21 LOAD_FAST                0 (i)
                 24 LOAD_CONST               2 (1)
                 27 INPLACE_SUBTRACT
                 28 STORE_FAST               0 (i)
                 31 JUMP_ABSOLUTE            9
            >>   34 POP_BLOCK
            >>   35 LOAD_CONST               0 (None)
                 38 RETURN_VALUE
    
    Python 2.7
    -------------
    
     13           0 LOAD_GLOBAL              0 (MAX_NUM)
                  3 STORE_FAST               0 (i)
    
     14           6 SETUP_LOOP              26 (to 35)
            >>    9 LOAD_FAST                0 (i)
                 12 LOAD_CONST               1 (0)
                 15 COMPARE_OP               4 (>)
                 18 POP_JUMP_IF_FALSE       34
    
     15          21 LOAD_FAST                0 (i)
                 24 LOAD_CONST               2 (1)
                 27 INPLACE_SUBTRACT    
                 28 STORE_FAST               0 (i)
                 31 JUMP_ABSOLUTE            9
            >>   34 POP_BLOCK           
            >>   35 LOAD_CONST               0 (None)
                 38 RETURN_VALUE        
    

    And below is the disassembled byte code for function foo_for():

    Python: 3.4
    -------------
    
     19           0 SETUP_LOOP              20 (to 23)
                  3 LOAD_GLOBAL              0 (xrange)
                  6 LOAD_GLOBAL              1 (MAX_NUM)
                  9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 12 GET_ITER
            >>   13 FOR_ITER                 6 (to 22)
                 16 STORE_FAST               0 (i)
    
     20          19 JUMP_ABSOLUTE           13
            >>   22 POP_BLOCK
            >>   23 LOAD_CONST               0 (None)
                 26 RETURN_VALUE
    
    
    Python: 2.7
    -------------
    
     19           0 SETUP_LOOP              20 (to 23)
                  3 LOAD_GLOBAL              0 (xrange)
                  6 LOAD_GLOBAL              1 (MAX_NUM)
                  9 CALL_FUNCTION            1
                 12 GET_ITER            
            >>   13 FOR_ITER                 6 (to 22)
                 16 STORE_FAST               0 (i)
    
     20          19 JUMP_ABSOLUTE           13
            >>   22 POP_BLOCK           
            >>   23 LOAD_CONST               0 (None)
                 26 RETURN_VALUE        
    

    If we compare them, both versions produce the same disassembled byte code.

    Now I'm wondering what change from 2.7 to 3.4 is really causing this huge difference in execution time for the given piece of code.

  • atelcikti1
    atelcikti1 almost 9 years
    Is there any reason you would not want to optimize for integers below 2**63? They seem to be the most frequently used...
  • Martijn Pieters
    Martijn Pieters almost 9 years
    @thebjorn: the simplification of using one int type was more important. Besides, if you are doing for loops over that large a range you are probably doing something wrong anyway.
  • atelcikti1
    atelcikti1 almost 9 years
    But doesn't this choice make all integer calculations on e.g. array indexes etc. slower too? It seems that other languages (Smalltalk, Lisp, Haskell, Java) go to some lengths in order to optimize the boxing/unboxing of integers, are those optimizations superfluous in a language like Python?
  • Martijn Pieters
    Martijn Pieters almost 9 years
    @thebjorn: sequences are explicitly limited to sys.maxsize anyway, and for values up to 2**30 (a single digit) the code is highly optimised to just return the first digit from the Python int object.
  • Luis Vito
    Luis Vito over 8 years
    @MartijnPieters It still doesn't give the same performance, as the OP's tests show.
  • Martijn Pieters
    Martijn Pieters over 8 years
    @quant_dev: Of course those tests don't give the same performance. I never said they could be made to give the same performance. When you are using large integers past sys.maxint you get the same performance in Python 2 because the same codepaths are involved then.