List comprehension, map, and numpy.vectorize performance

Solution 1

First comment: don't mix xrange() and range() in your samples... doing so invalidates your question because you're comparing apples and oranges.

I second @Gabe's notion that if you have many large data structures, numpy should win overall... just keep in mind that most of the time C is faster than Python, but then again, most of the time PyPy is faster than CPython. :-)

As far as listcomps vs. map() calls go... one makes 101 function calls while the other makes 102, meaning you won't see a significant difference in timing, as shown below using the timeit module as @Mike has suggested:

  • List Comprehension

    $ python -m timeit "def foo(x):pass; [foo(i) for i in range(100)]"
    1000000 loops, best of 3: 0.216 usec per loop
    $ python -m timeit "def foo(x):pass; [foo(i) for i in range(100)]"
    1000000 loops, best of 3: 0.21 usec per loop
    $ python -m timeit "def foo(x):pass; [foo(i) for i in range(100)]"
    1000000 loops, best of 3: 0.212 usec per loop

  • map() function call

    $ python -m timeit "def foo(x):pass; map(foo, range(100))"
    1000000 loops, best of 3: 0.216 usec per loop
    $ python -m timeit "def foo(x):pass; map(foo, range(100))"
    1000000 loops, best of 3: 0.214 usec per loop
    $ python -m timeit "def foo(x):pass; map(foo, range(100))"
    1000000 loops, best of 3: 0.215 usec per loop
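
If you prefer keeping the setup separate from the timed statement, the same comparison can be written with timeit as a library. A minimal sketch (Python 2 semantics, where map() builds a list; the trivial foo is a stand-in, and absolute numbers will vary by machine):

import timeit

setup = "def foo(x): pass"  # trivial stand-in function, as in the runs above

# Time each technique with the function definition kept out of the timed code.
print(timeit.timeit("[foo(i) for i in range(100)]", setup=setup, number=10000))
print(timeit.timeit("map(foo, range(100))", setup=setup, number=10000))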

With that said, however, unless you plan to use the lists you create with either of these techniques, try to avoid them (using lists) completely. IOW, if all you're doing is iterating over them, it's not worth the memory consumption (and the possible creation of a potentially massive list in memory) when you only care to look at each element one at a time; discard the list as soon as you're done.

In such cases, I highly recommend using generator expressions instead: they don't create the entire list in memory, giving you a lazy, memory-friendly way to iterate through the elements you want to process without building a largish array. The best part is that their syntax is nearly identical to that of listcomps:

a = (foo(i) for i in range(100))
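
Consuming it pulls results one at a time. For example, a tiny sketch reusing the a above, summing the results without ever materializing a list:

total = sum(a)  # each foo(i) is computed on demand; no list is ever built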

2.x users only: in the same spirit of lazy iteration, change all the range() calls to xrange() in any older 2.x code, then switch to range() when porting to Python 3, where xrange() is gone and range() itself is lazy (effectively, xrange() was renamed to range()).

Solution 2

  • Why are you optimizing this? Have you written working, tested code, then examined your algorithm, profiled your code, and found that optimizing this will have an effect? Are you doing this in a deep inner loop where you found you are spending your time? If not, don't bother.

  • You'll only know which works fastest for you by timing it. To time it in a useful way, you'll have to specialize it to your actual use case. For example, you can get noticeable performance differences between a function call in a list comprehension versus an inline expression; it isn't clear whether you really wanted the former or whether you reduced it to that to make your cases similar.

  • You say that it doesn't matter whether you end up with a numpy array or a list, but if you're doing this kind of micro-optimization it does matter, since those will perform differently when you use them afterward. Putting your finger on that can be tricky, so hopefully the whole problem will turn out to be moot because the optimization was premature.

  • It is typically better to simply use the right tool for the job for clarity, readability, and so forth. It is rare that I would have a hard time deciding between these things.

    • If I needed numpy arrays, I would use them. I would use these for storing large homogeneous arrays or multidimensional data. I use them a lot, but rarely where I think I'd want to use a list.
      • If I was using these, I'd do my best to write my functions already vectorized so I didn't have to use numpy.vectorize. For example, times_five below can be used on a numpy array with no decoration.
    • If I didn't have cause to use numpy, that is to say if I wasn't solving numerical math problems or using special numpy features or storing multidimensional arrays or whatever...
      • If I had an already-existing function, I would use map. That's what it's for.
      • If I had an operation that fit inside a small expression and I didn't need a function, I'd use a list comprehension.
      • If I just wanted to do the operation for all the cases but didn't actually need to store the result, I'd use a plain for loop.
      • In many cases, I'd actually use map and list comprehensions' lazy equivalents: itertools.imap and generator expressions. These can reduce memory usage by a factor of n in some cases and can sometimes avoid performing unnecessary operations (see the sketch after this list).
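
A quick sketch of those lazy equivalents (Python 2, where map() is eager; foo and x here are illustrative stand-ins):

from itertools import imap  # Python 2 only; in Python 3, map() itself is lazy

def foo(item):  # hypothetical stand-in for your real per-element function
    return item * 2

x = xrange(1000)

lazy_mapped = imap(foo, x)             # lazy map(): no intermediate list
lazy_comp = (foo(item) for item in x)  # lazy listcomp: generator expression

Either one yields results one at a time as you iterate, so nothing of size n is ever held in memory.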

If it does turn out this is where the performance problems lie, getting this sort of thing right is tricky. It is very common for people to time a toy case that differs from their actual problem. Worse, it is extremely common for people to derive dumb general rules from it.

Consider the following cases (timeme.py is posted below):

python -m timeit "from timeme import x, times_five; from numpy import vectorize" "vectorize(times_five)(x)"
1000 loops, best of 3: 924 usec per loop

python -m timeit "from timeme import x, times_five" "[times_five(item) for item in x]"
1000 loops, best of 3: 510 usec per loop

python -m timeit "from timeme import x, times_five" "map(times_five, x)"
1000 loops, best of 3: 484 usec per loop

A naïve observer would conclude that map is the best-performing of these options, but the answer is still "it depends". Consider the strengths of the tools you are using: list comprehensions let you avoid defining simple functions; numpy lets you vectorize things in C if you're doing the right things.

python -m timeit "from timeme import x, times_five" "[item + item + item + item + item for item in x]"
1000 loops, best of 3: 285 usec per loop

python -m timeit "import numpy; x = numpy.arange(1000)" "x + x + x + x + x"
10000 loops, best of 3: 39.5 usec per loop

But that's not all; there's more. Consider the power of an algorithm change; it can be even more dramatic:

python -m timeit "from timeme import x, times_five" "[5 * item for item in x]"
10000 loops, best of 3: 147 usec per loop

python -m timeit "import numpy; x = numpy.arange(1000)" "5 * x"
100000 loops, best of 3: 16.6 usec per loop

Sometimes an algorithm change can be even more effective, and it becomes more and more effective as the numbers get bigger:

python -m timeit "from timeme import square, x" "map(square, x)"
10 loops, best of 3: 41.8 msec per loop

python -m timeit "from timeme import good_square, x" "map(good_square, x)"
1000 loops, best of 3: 370 usec per loop

And even now, all of this may have little bearing on your actual problem. It looks like numpy is great if you can use it right, but it has its limitations: none of these numpy examples used actual Python objects in the arrays. That complicates what must be done, a lot even. And what if we do get to use C datatypes? They are less robust than Python objects: they aren't nullable, the integers overflow, you have to do extra work to retrieve them, and they're statically typed. Sometimes these things prove to be problems, even unexpected ones.
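
A quick illustration of the overflow point (a sketch; the wrap-around value depends on the dtype you pick):

import numpy

a = numpy.array([2147483647], dtype=numpy.int32)  # the maximum int32 value
print(a + 1)           # wraps around to [-2147483648]; no exception is raised
print(2147483647 + 1)  # a plain Python int simply grows: 2147483648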

So there you go: a definitive answer. "It depends."


# timeme.py

x = xrange(1000)  # lazy sequence of test inputs (Python 2)

def times_five(a):
    # Uses only addition, so it also works unchanged on numpy arrays.
    return a + a + a + a + a

def square(a):
    # Deliberately bad algorithm: O(a) squaring by repeated addition.
    if a == 0:
        return 0

    value = a
    for i in xrange(a - 1):
        value += a
    return value

def good_square(a):
    # The algorithm change: constant-time squaring.
    return a ** 2
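
As noted above, times_five can be used on a numpy array with no decoration, because it only uses operations that numpy arrays already support elementwise. A quick demonstration (a sketch; the function is repeated so the snippet stands alone):

import numpy

def times_five(a):  # same function as in timeme.py
    return a + a + a + a + a

x = numpy.arange(1000)
print(times_five(x))  # elementwise adds run in C; no numpy.vectorize needed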

Solution 3

If the function itself takes a significant amount of time to execute, how you collect its outputs into a list or array is irrelevant. Once you start getting into arrays of millions of numbers, though, numpy can save you a significant amount of memory.
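
A sketch of that effect, using a deliberately expensive stand-in for foo (all names and sizes here are illustrative): the dispatch mechanism barely registers next to the per-call cost, so the two timings come out nearly identical.

import timeit

setup = """
def foo(i):
    # deliberately expensive stand-in: the per-call work dominates
    return sum(j * j for j in xrange(2000))
"""

print(timeit.timeit("[foo(i) for i in xrange(100)]", setup=setup, number=10))
print(timeit.timeit("map(foo, xrange(100))", setup=setup, number=10))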

Solution 4

The list comprehension is the fastest on my machine, then map, then numpy. The numpy code is actually quite a bit slower than the other two, but the difference is much smaller if you use numpy.arange instead of range (or xrange), which is what I did for the times listed below. Also, with psyco the list comprehension sped up while the other two slowed down for me. I used larger arrays of numbers than in your code, and my foo function just computed the square root. Here are some typical times.

Without psyco:

list comprehension: 47.5581952455 ms
map: 51.9082732582 ms
numpy.vectorize: 57.9601876775 ms

With psyco:

list comprehension: 30.4318844993 ms
map: 96.4504427239 ms
numpy.vectorize: 99.5858691538 ms

I used Python 2.6.4 and the timeit module.

Based on these results, I would say that it probably doesn't make much difference which one you choose for the initialization. Going by speed, I would probably pick the numpy version or the list comprehension, but ultimately you should let what you do with the array afterwards guide your choice.
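
For reference, a sketch of the kind of harness behind numbers like these. My assumptions: foo is math.sqrt, the input is much larger than the question's 100 elements, and the sizes and repeat counts are illustrative:

import timeit

n = 100000  # illustrative size, larger than the question's 100 elements
setup = "import math, numpy; foo = math.sqrt; vfoo = numpy.vectorize(foo)"

print(timeit.timeit("[foo(i) for i in xrange(%d)]" % n, setup=setup, number=10))
print(timeit.timeit("map(foo, xrange(%d))" % n, setup=setup, number=10))
print(timeit.timeit("vfoo(numpy.arange(%d))" % n, setup=setup, number=10))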

Comments

  • mcstrother
    mcstrother over 4 years

    I have a function foo(i) that takes an integer and requires a significant amount of time to execute. Will there be a significant performance difference between any of the following ways of initializing a:

    a = [foo(i) for i in xrange(100)]
    
    a = map(foo, range(100))
    
    vfoo = numpy.vectorize(foo)
    a = vfoo(range(100))
    

    (I don't care whether the output is a list or a numpy array.)

    Is there a better way?

  • wescpy
    wescpy about 14 years
    agreed... large numbers of data structures are fastest when processed in C vs. pure Python
  • Justin Peel
    Justin Peel about 14 years
    Hmm, sorry this is late. I've been trying to post for some time now and getting errors.
  • Mike Graham
    Mike Graham about 14 years
    Note that using numpy.vectorize doesn't actually effectively move things into C like using real numpy operations does.
  • Mike Graham
    Mike Graham about 14 years
    Note that map requires a function to be defined but list comprehensions do not, which can be a benefit. A realistic list comprehension use that does the same thing as your code might inline f. python -m timeit "def foo(x):pass; [None for i in range(100)]" gives results on my machine that take about 2/3 the time of your list comprehension use. Is this actually what OP wants? I don't know that it is, but it does show that these questions are nuanced and that the conclusions can strongly reflect more about how we contrive our example than anything about a real use.
  • wescpy
    wescpy about 14 years
    not quite. map() doesn't require a function either, as in map(None, range(100)). i have a longer spiel about performance and listcomps vs. map(), but the OP didn't ask that question, so i can't answer it here. what i can say is that to really speed up listcomps, you have to reduce that function down to an expression and use that (instead of the function). making function calls has a performance penalty which is magnified in a tight loop.
  • Mike Graham
    Mike Graham over 12 years
    "keep in mind that C is always faster than Python": note that this isn't actually true.
  • wescpy
    wescpy over 12 years
    i've reworded the language in that sentence.