List comprehension vs. lambda + filter


Solution 1

It is strange how much beauty varies for different people. I find the list comprehension much clearer than filter+lambda, but use whichever you find easier.

There are two things that may slow down your use of filter.

The first is the function-call overhead: as soon as you use a Python function (whether created by def or lambda), filter is likely to be slower than the list comprehension. It is almost certainly not enough to matter, and you shouldn't think much about performance until you've timed your code and found it to be a bottleneck, but the difference will be there.

The other overhead that might apply is that the lambda is forced to access a scoped variable (value). That is slower than accessing a local variable, and in Python 2.x the list comprehension accesses only local variables. If you are using Python 3.x, the list comprehension runs in a separate function, so it will also access value through a closure and this difference won't apply.
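As a rough illustration, you can measure the gap yourself with timeit. The setup below is hypothetical (plain integers compared for equality stand in for the attribute check in the question); absolute numbers will vary by interpreter and version:

```python
import timeit

# Hypothetical stand-in for the question's attribute check:
# filter a list of integers by equality with a value.
setup = "xs = list(range(1000)); value = 500"

comp = timeit.timeit("[x for x in xs if x == value]", setup=setup, number=1000)
filt = timeit.timeit("list(filter(lambda x: x == value, xs))", setup=setup, number=1000)

print(f"comprehension: {comp:.4f}s, filter+lambda: {filt:.4f}s")
```

Whichever comes out ahead on your machine, the difference is per-element function-call overhead, which only matters for hot loops.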

The other option to consider is to use a generator instead of a list comprehension:

def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el

Then in your main code (which is where readability really matters) you've replaced both list comprehension and filter with a hopefully meaningful function name.
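For instance (the generator is repeated here so the snippet is self-contained, and a throwaway Item class stands in for the question's objects):

```python
from dataclasses import dataclass

def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el

@dataclass
class Item:
    attribute: str

items = [Item("a"), Item("b"), Item("a")]

# The call site now reads like a sentence; list() materialises the generator.
matches = list(filterbyvalue(items, "a"))
print(matches)
```

The loop condition lives in one named place, and the main code only says what is being filtered and by which value.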

Solution 2

This is a somewhat religious issue in Python. Even though Guido considered removing map, filter and reduce from Python 3, there was enough of a backlash that in the end only reduce was moved from built-ins to functools.reduce.
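In Python 3, reduce therefore has to be imported explicitly, while map and filter remain built-ins:

```python
from functools import reduce  # no longer a builtin in Python 3

# reduce folds a sequence into a single value with a binary function
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4])
print(total)  # 10
```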

Personally I find list comprehensions easier to read. It is more explicit what is happening in the expression [x for x in xs if x.attribute == value], as all the behaviour is on the surface, not hidden inside the filter function.

I would not worry too much about the performance difference between the two approaches, as it is marginal. I would really only optimise this if it proved to be the bottleneck in your application, which is unlikely.

Also since the BDFL wanted filter gone from the language then surely that automatically makes list comprehensions more Pythonic ;-)

Solution 3

Since any speed difference is bound to be minuscule, whether to use filter or a list comprehension comes down to a matter of taste. In general I'm inclined to use comprehensions (which seems to agree with most other answers here), but there is one case where I prefer filter.

A very frequent use case is pulling out the values of some iterable X subject to a predicate P(x):

[x for x in X if P(x)]

but sometimes you want to apply some function to the values first:

[f(x) for x in X if P(f(x))]


As a specific example, consider

primes_cubed = [x*x*x for x in range(1000) if prime(x)]

I think this looks slightly better than using filter. But now consider

prime_cubes = [x*x*x for x in range(1000) if prime(x*x*x)]

In this case we want to filter against the post-computed value. Besides the issue of computing the cube twice (imagine a more expensive calculation), there is the issue of writing the expression twice, violating the DRY aesthetic. In this case I'd be apt to use

prime_cubes = filter(prime, [x*x*x for x in range(1000)])
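Note that in Python 3 filter returns a lazy iterator, so you would wrap it in list() to get an actual list; and since Python 3.8 an assignment expression avoids the repeated computation inside a single comprehension. The prime helper below is a hypothetical minimal sketch, included only so the snippet runs (as it happens, no cube of an integer x ≥ 2 is prime, so both results are empty, but the shape of the code is what matters):

```python
def prime(n):
    # Minimal trial-division primality test, just for illustration
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Python 3: filter is lazy, so materialise it with list();
# a generator expression avoids building the intermediate list of cubes
prime_cubes = list(filter(prime, (x * x * x for x in range(1000))))

# Python 3.8+: the walrus operator computes each cube exactly once
prime_cubes_walrus = [cubed for x in range(1000) if prime(cubed := x * x * x)]

assert prime_cubes == prime_cubes_walrus
```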

Solution 4

Although filter may be the "faster way", the "Pythonic way" would be not to care about such things unless performance is absolutely critical (in which case you wouldn't be using Python!).

Solution 5

I thought I'd just add that in Python 3, filter() actually returns an iterator, so you'd have to pass the filter call to list() in order to build the filtered list. So in Python 2:

lst_a = range(25)  # arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = filter(lambda num: num % 2 == 0, lst_a)

lst_b and lst_c hold the same values and are built in about the same time, since filter() is roughly equivalent to [x for x in y if z]. In Python 3, however, the same code would leave lst_c holding a filter object, not a filtered list. To produce the same values in Python 3:

lst_a = range(25)  # arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = list(filter(lambda num: num % 2 == 0, lst_a))

The problem is that list() takes an iterable as its argument and creates a new list from it. As a result, using filter this way in Python 3 can take up to twice as long as the [x for x in y if z] method, because you have to iterate over the output from filter() as well as the original list.
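The flip side is that laziness can be an advantage when you don't need the whole list: consuming the filter object with next() or a for loop does only as much work as is needed.

```python
lst_a = range(25)

# Python 3: filter is lazy; no list is built up front
evens = filter(lambda num: num % 2 == 0, lst_a)

first = next(evens)  # pulls only as many elements as needed
print(first)         # 0
```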

Author: Agos

Updated on July 20, 2022

Comments

  • Agos
    Agos almost 2 years

    I have a list that I want to filter by an attribute of the items.

    Which of the following is preferred (readability, performance, other reasons)?

    xs = [x for x in xs if x.attribute == value]
    
    xs = filter(lambda x: x.attribute == value, xs)
    
    • abarnert
      abarnert almost 11 years
      A better example would be a case where you already had a nicely named function to use as your predicate. In that case, I think a lot more people would agree that filter was more readable. When you have a simple expression that can be used as-is in a listcomp, but has to be wrapped in a lambda (or similarly constructed out of partial or operator functions, etc.) to pass to filter, that's when listcomps win.
    • Matteo Ferla
      Matteo Ferla almost 5 years
      It should be said that in Python3 at least, the return of filter is a filter generator object not a list.
    • Sal Borrelli
      Sal Borrelli about 3 years
      More readable? I guess it is a matter of personal taste but to me, the list comprehension solution looks like plain English: "for each element in my_list, take it only if it's attribute equals value" (!?). I guess even a non programmer might try to understand what's going on, more or less. In the second solution... well... what's that strange "lamba" word, to start with? Again, it is probably a matter of personal taste but I would go for the list comprehension solution all the time, regardless of potential tiny differences in performance that are basically only of interest to researchers.
  • Wayne Werner
    Wayne Werner about 14 years
    +1 for the generator. I have a link at home to a presentation that shows how amazing generators can be. You can also replace the list comprehension with a generator expression just by changing [] to (). Also, I agree that the list comp is more beautiful.
  • dashesy
    dashesy about 11 years
    Thanks for the links to Guido's input, if nothing else for me it means I will try not to use them any more, so that I won't get the habit, and I won't become supportive of that religion :)
  • njzk2
    njzk2 about 10 years
    but reduce is the most complex to do with simple tools! map and filter are trivial to replace with comprehensions!
  • giaosudau
    giaosudau over 9 years
    python -m timeit 'filter(lambda x: x in [1,2,3,4,5], range(10000000))' 10 loops, best of 3: 1.44 sec per loop python -m timeit '[x for x in range(10000000) if x in [1,2,3,4,5]]' 10 loops, best of 3: 860 msec per loop Not really?!
  • John La Rooy
    John La Rooy over 9 years
    @sepdau, lambda functions are not builtins. List comprehensions have improved over the past 4 years - now the difference is negligible anyway even with builtin functions
  • thiruvenkadam
    thiruvenkadam over 9 years
    filter returns a list and we can use len on it. At least in my Python 2.7.6.
  • thiruvenkadam
    thiruvenkadam over 9 years
    I will be happy to know the reason for down voting so that I will not repeat it again anywhere in the future.
  • Adeynack
    Adeynack over 9 years
    It is not the case in Python 3. a = [1, 2, 3, 4, 5, 6, 7, 8] f = filter(lambda x: x % 2 == 0, a) lc = [i for i in a if i % 2 == 0] >>> type(f) <class 'filter'> >>> type(lc) <class 'list'>
  • Agos
    Agos over 9 years
    the definition of filter and list comprehension were not necessary, as their meaning was not being debated. That a list comprehension should be used only for “new” lists is presented but not argued for.
  • thiruvenkadam
    thiruvenkadam over 9 years
    I used the definition to say that filter gives you list with same elements which are true for a case but with list comprehension we can modify the elements themselves, like converting int to str. But point taken :-)
  • viki.omega9
    viki.omega9 over 9 years
    Would you not consider using the prime via another list comprehension? Such as [prime(i) for i in [x**3 for x in range(1000)]]
  • Anton
    Anton about 9 years
    How would one go about getting [f(x) for x in X if f(x)] with one call to f(x) per element? If f(x) can return None, I would like to filter out those values in the first pass.
  • skqr
    skqr about 9 years
    Actually, no - filter is faster. Just run a couple of quick benchmarks using something like stackoverflow.com/questions/5998245/…
  • Duncan
    Duncan about 9 years
    @skqr better to just use timeit for benchmarks, but please give an example where you find filter to be faster using a Python callback function.
  • Tagar
    Tagar almost 9 years
    didn't know reduce was demoted in Python3. thanks for the insight! reduce() is still quite helpful in distributed computing, like PySpark. I think that was a mistake..
  • Alf47
    Alf47 almost 9 years
I have a question about this generator - according to this link python.net/~goodger/projects/pycon/2007/idiomatic/handout.html the comprehension mentioned in the question performs much better and is "idiomatic". So shouldn't the generator not be the preferred method? (I'm new at Python)
  • Duncan
    Duncan almost 9 years
    @Alf47 nothing when programming (Python or any other language) is absolute. The link you reference points out the list comprehension is best when kept simple and if too complex you should use an explicit for loop. A full-blown generator allows you to abstract some complex looping conditions and keep the loop construct separate from the body that processes whatever it yields. At the level of the example given the list comprehension is just fine, but the generator is a useful tool to know about for when the list comprehension would be over complicated.
  • Zelphir Kaltstahl
    Zelphir Kaltstahl almost 9 years
    This achieves a lot in very little code indeed. I think it might be a bit too much logic in one line to easily understand and readability is what counts though.
  • Zelphir Kaltstahl
    Zelphir Kaltstahl almost 9 years
    x*x*x cannot be a prime number, as it has x^2 and x as a factor, the example doesn't really make sense in a mathematical way, but maybe it's still helpul. (Maybe we could find something better though?)
  • Steve Jessop
    Steve Jessop about 8 years
    "if there is a way to have the resulting list ... I am curious to know it". Just call list() on the result: list(filter(my_func, my_iterable)). And of course you could replace list with set, or tuple, or anything else that takes an iterable. But to anyone other than functional programmers, the case is even stronger to use a list comprehension rather than filter plus explicit conversion to list.
  • Steve Jessop
    Steve Jessop about 8 years
    You could write this as file_contents = list(filter(None, (s.partition('#')[0].strip() for s in lines)))
  • tnq177
    tnq177 about 8 years
    @WayneWerner do you mind share the presentation please?
  • Wayne Werner
    Wayne Werner about 8 years
@tnq177 It's David Beazley's presentation on generators - dabeaz.com/generators
  • Mateen Ulhaq
    Mateen Ulhaq almost 8 years
    Note that we may use a generator expression instead for the last example if we don't want to eat up memory: prime_cubes = filter(prime, (x*x*x for x in range(1000)))
  • bli
    bli over 7 years
    Late comment to an often-seen argument: Sometimes it makes a difference to have an analysis run in 5 hours instead of 10, and if that can be achieved by taking one hour optimizing python code, it can be worth it (especially if one is comfortable with python and not with faster languages).
  • icc97
    icc97 over 6 years
    @Tagar you can still use reduce you just have to import it from functools
  • Dennis Krupenik
    Dennis Krupenik over 6 years
    @MateenUlhaq this can be optimized to prime_cubes = [1] to save both memory and cpu cycles ;-)
  • Mateen Ulhaq
    Mateen Ulhaq over 6 years
    @DennisKrupenik Or rather, []
  • Dennis Krupenik
    Dennis Krupenik over 6 years
    @MateenUlhaq indeed
  • François Leblanc
    François Leblanc about 6 years
    To look at it from another angle, this can also be written as [x for x in map(f, X) if P(x)] and we only apply f() once. Changing the square brackets to parentheses makes it a generator comprehension.
  • Victor Schröder
    Victor Schröder over 5 years
    "...which is where readability really matters...". Sorry, but readability always matters, even in the (rare) cases when you -- crying -- have to give up of it.
  • Victor Schröder
    Victor Schröder over 5 years
    Invalid comparison. First, you are not passing a lambda function to the filter version, which makes it default to the identity function. When defining if not None in the list comprehension you are defining a lambda function (notice the MAKE_FUNCTION statement). Second, the results are different, as the list comprehension version will remove only None value, whereas the filter version will remove all "falsy" values. Having that said, the whole purpose of microbenchmarking is useless. Those are one million iterations, times 1k items! The difference is negligible.
  • GeeTransit
    GeeTransit over 4 years
    Now that Python 3.8 is almost out, you can store the result of prime(x*x*x) in an assignment expression (walrus boi). Here: prime_cubes = [x_cubed for x in range(1000) if prime(x_cubed := x*x*x)]
  • thoni56
    thoni56 over 4 years
    But more important is how much the source code slows us down trying to read and understand it!
  • Thomas Grainger
    Thomas Grainger over 4 years
    all lists are unhashable >>> hash(list()) # TypeError: unhashable type: 'list' secondly this works fine: processed_data = [s for s in data_from_db if 'abc' in s.field1 or s.StartTime >= start_date_time]
  • polvoazul
    polvoazul over 4 years
Yeah, you can now use the walrus, but please don't, it is so unreadable it hurts a bit
  • juanpa.arrivillaga
    juanpa.arrivillaga about 4 years
    "If the list is unhashable you cannot directly process it with a list comprehension." This is not true, and all lists are unhashable anyway.
  • Sole Sensei
    Sole Sensei over 3 years
    list(filter(None, seq)) is equal to [i for i in seq if i] not i is not None. docs.python.org/3/library/functions.html#filter
  • Qian Chen
    Qian Chen over 2 years
    Basically, Pythonic way is a secret weapon that you can use when you want to say my idea is better than yours.
  • Thomas
    Thomas over 2 years
    +1 for "I would really only optimise this if it proved to be the bottleneck in your application which is unlikely." – It may be off-topic but there is so much unreadable code out there just because developers want to safe a few microseconds or 20 KB of memory. Unless the marginal higher memory consumption or the 2 or 5 microseconds are really an issue, clean code should always be preferred. (In this scenario, using filter is as much clean code as using list comprehension. Personally, I consider list comprehension more pythonic.)