Why is Python 3 considerably slower than Python 2?
The difference is in the implementation of the int type. Python 3.x uses the arbitrary-sized integer type (long in 2.x) exclusively, while in Python 2.x for values up to sys.maxint a simpler int type is used that uses a simple C long under the hood.
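To make the distinction concrete, here is a minimal sketch (not part of the original measurements) that labels which Python 2 type a given value would have used. It assumes a platform where sys.maxint equals sys.maxsize, which is typical on 64-bit Unix systems but not on 64-bit Windows, and the helper name python2_int_kind is made up for illustration.

```python
import sys


def python2_int_kind(n):
    """Return the name of the Python 2 type that would have held n.

    Assumption: sys.maxint == sys.maxsize, i.e. the C long behind
    Python 2's int is as wide as Py_ssize_t. Hypothetical helper,
    for illustration only.
    """
    if -sys.maxsize - 1 <= n <= sys.maxsize:
        return "int"   # fixed-width C long in Python 2
    return "long"      # arbitrary-sized type, the only int in Python 3


if __name__ == "__main__":
    for value in (0, sys.maxsize, sys.maxsize + 1):
        print(value, python2_int_kind(value))
```

In Python 3 all three values share the single int type, so the cheaper fixed-width path described above is no longer available as a separate type.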
Once you limit your loops to long integers, Python 3.x is faster:
>>> import sys
>>> from timeit import timeit
>>> MAX_NUM = 3*10**3
>>> def bar():
...     i = MAX_NUM + sys.maxsize
...     while i > sys.maxsize:
...         i -= 1
...
Python 2:
>>> timeit(bar, number=10000)
5.704327821731567
Python 3:
>>> timeit(bar, number=10000)
3.7299320790334605
I used sys.maxsize as sys.maxint was dropped from Python 3, but the integer value is basically the same.
The speed difference in Python 2 is thus limited to the first (2 ** 63) - 1 integers on 64-bit systems, or the first (2 ** 31) - 1 integers on 32-bit systems.
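As a quick check of those bounds, the sketch below derives the expected limit from the pointer width of the running interpreter. It assumes that sys.maxsize tracks the pointer size, which holds on mainstream platforms; expected_maxsize is a name chosen here for illustration.

```python
import struct
import sys


def expected_maxsize():
    """Largest signed value that fits in a pointer-sized integer."""
    # struct.calcsize("P") is the size of a C pointer in bytes:
    # 8 on 64-bit builds, 4 on 32-bit builds.
    pointer_bits = struct.calcsize("P") * 8
    return 2 ** (pointer_bits - 1) - 1


if __name__ == "__main__":
    print(expected_maxsize() == sys.maxsize)
```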
Since you cannot use the long type with xrange() on Python 2, I did not include a comparison for that function.
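If you want to repeat the comparison yourself, a script along these lines should run unchanged on both versions. It is only a sketch: the function names and the much smaller number of repetitions are my own choices for illustration, and the absolute timings will depend on your machine.

```python
import sys
from timeit import timeit

MAX_NUM = 3 * 10**3


def count_down_small():
    # Values stay below sys.maxsize: Python 2 uses its simple int here.
    i = MAX_NUM
    while i > 0:
        i -= 1


def count_down_large():
    # Values stay above sys.maxsize: both versions use the arbitrary-sized type.
    i = MAX_NUM + sys.maxsize
    while i > sys.maxsize:
        i -= 1


def compare(number=1000):
    """Return the total time in seconds for each loop, keyed by label."""
    return {
        "small": timeit(count_down_small, number=number),
        "large": timeit(count_down_large, number=number),
    }


if __name__ == "__main__":
    for label, seconds in sorted(compare().items()):
        print(label, seconds)
```

Run it once under each interpreter and compare the "small" and "large" rows between the two runs.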
Comments
-
gsb-eng almost 2 years
I've been trying to understand why Python 3 takes so much longer than Python 2 in certain situations. Below are a few cases I've compared between Python 3.4 and Python 2.7.
Note: I've gone through some related questions such as "Why is there no xrange function in Python3?", "loop in python3 much slower than python2" and "Same code slower in Python3 as compared to Python2", but I feel that I didn't get the actual reason behind this issue.
I used this piece of code to show the difference:
MAX_NUM = 3*10**7

# This is to make it compatible with py3.4.
try:
    xrange
except NameError:
    xrange = range

def foo():
    i = MAX_NUM
    while i > 0:
        i -= 1

def foo_for():
    for i in xrange(MAX_NUM):
        pass
When I ran this program with py3.4 and py2.7, I got the results below.
Note: These stats came from a 64-bit machine with a 2.6 GHz processor, and the time was calculated using time.time() in a single loop.
Output: Python 3.4
-----------------
2.6392083168029785
0.9724123477935791
Output: Python 2.7
------------------
1.5131521225
0.475143909454
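For reference, a timing harness of the kind described above could look like the sketch below. It is an assumption about the setup, not the exact script used: the helper time_once is hypothetical, the loop bound is passed as an argument instead of being read from the MAX_NUM global, and range is used directly so the block runs on Python 3.

```python
import time


def foo(max_num):
    i = max_num
    while i > 0:
        i -= 1


def foo_for(max_num):
    for i in range(max_num):
        pass


def time_once(func, max_num):
    """Run func(max_num) a single time and return the elapsed seconds."""
    start = time.time()
    func(max_num)
    return time.time() - start


if __name__ == "__main__":
    # A smaller bound than 3*10**7 keeps the demonstration short.
    for func in (foo, foo_for):
        print(func.__name__, time_once(func, 10**6))
```

A single time.time() measurement is noisy; timeit, which repeats the call, generally gives steadier numbers.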
I really don't think that any changes were applied to while or xrange from 2.7 to 3.4. I know range started acting like xrange in py3.4, but as the documentation says:
range() now behaves like xrange() used to behave, except it works with values of arbitrary size. The latter no longer exists.
This means the change from xrange to range amounts to a name change, apart from working with arbitrary values.
I've verified the disassembled byte code as well.
Below is the disassembled byte code for function foo():
Python 3.4:
---------------
 13           0 LOAD_GLOBAL              0 (MAX_NUM)
              3 STORE_FAST               0 (i)

 14           6 SETUP_LOOP              26 (to 35)
        >>    9 LOAD_FAST                0 (i)
             12 LOAD_CONST               1 (0)
             15 COMPARE_OP               4 (>)
             18 POP_JUMP_IF_FALSE       34

 15          21 LOAD_FAST                0 (i)
             24 LOAD_CONST               2 (1)
             27 INPLACE_SUBTRACT
             28 STORE_FAST               0 (i)
             31 JUMP_ABSOLUTE            9
        >>   34 POP_BLOCK
        >>   35 LOAD_CONST               0 (None)
             38 RETURN_VALUE
python 2.7
-------------
 13           0 LOAD_GLOBAL              0 (MAX_NUM)
              3 STORE_FAST               0 (i)

 14           6 SETUP_LOOP              26 (to 35)
        >>    9 LOAD_FAST                0 (i)
             12 LOAD_CONST               1 (0)
             15 COMPARE_OP               4 (>)
             18 POP_JUMP_IF_FALSE       34

 15          21 LOAD_FAST                0 (i)
             24 LOAD_CONST               2 (1)
             27 INPLACE_SUBTRACT
             28 STORE_FAST               0 (i)
             31 JUMP_ABSOLUTE            9
        >>   34 POP_BLOCK
        >>   35 LOAD_CONST               0 (None)
             38 RETURN_VALUE
And below is the disassembled byte code for function foo_for():
Python: 3.4
 19           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (MAX_NUM)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

 20          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE
Python: 2.7
-------------
 19           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (MAX_NUM)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

 20          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE
If we compare the two, both versions produce the same disassembled byte code.
Now I'm wondering what change from 2.7 to 3.4 is really causing this huge change in execution time in the given piece of code.
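The byte code comparison above can be reproduced with the standard dis module. The sketch below is illustrative only: opnames is a helper name made up here, dis.get_instructions requires Python 3.4 or later, and newer interpreters emit different opcodes than the 3.4 and 2.7 listings shown.

```python
import dis


def foo():
    i = 1000
    while i > 0:
        i -= 1


def opnames(func):
    """Return the opcode names of func in order (Python 3.4+)."""
    return [instr.opname for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    dis.dis(foo)          # human-readable listing, like the ones above
    print(opnames(foo))   # compact form that is easy to diff between runs
```

Identical byte code only shows that the compiler output matches; the time is spent in how the interpreter executes each instruction on the operand types involved.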
-
atelcikti1 almost 9 years
Is there any reason you would not want to optimize for integers below 2**63? They seem to be the most frequently used...
-
Martijn Pieters almost 9 years
@thebjorn: the simplification of using one int type was more important. Besides, if you are doing for loops over that large a range you are probably doing something wrong anyway.
-
atelcikti1 almost 9 years
But doesn't this choice make all integer calculations on e.g. array indexes etc. slower too? It seems that other languages (Smalltalk, Lisp, Haskell, Java) go to some lengths in order to optimize the boxing/unboxing of integers, are those optimizations superfluous in a language like Python?
-
Martijn Pieters almost 9 years
@thebjorn: sequences are explicitly limited to sys.maxsize anyway and for up to 2**30 (one digit) the code is highly optimised to just return the first digit from the Python int object.
-
Luis Vito over 8 years
@MartijnPieters It still doesn't give the same performance, as the OP's tests show.
-
Martijn Pieters over 8 years
@quant_dev: Of course those tests don't give the same performance. I never said they could be made to give the same performance. When you are using large integers past sys.maxint you get the same performance in Python 2 because the same codepaths are involved then.