Length of longest word in a list

Solution 1

I think both are OK, but unless speed is a big consideration, max(len(w) for w in words) is the most readable.

When I was looking at them, it took me longer to figure out what len(max(words, key=len)) was doing, and even then my first reading was wrong. Code should be immediately obvious unless there's a good reason for it not to be.
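For reference, here's a minimal sketch of what each spelling computes (the sample list is purely illustrative):

# Hypothetical sample data, just for illustration.
words = ['now', 'is', 'the', 'winter']

# Genexp version: compute every length, then take the max of the numbers.
max(len(w) for w in words)   # -> 6

# key version: find the longest *word*, then take its length.
len(max(words, key=len))     # -> 6 (the longest word is 'winter')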

It's clear from the other posts (and my own tests) that the less readable one is faster. But neither of them is dog slow, and unless the code is on a critical path it's not worth worrying about.

Ultimately, I think more readable is more Pythonic.

As an aside, this is one of the few cases in which Python 2 is notably faster than Python 3 for the same task.

Solution 2

Although:

max(len(w) for w in words)

does kind of "read" more easily, you've got the overhead of a generator.

While:

len(max(words, key=len))

keeps all the work in builtins - and since len is normally a very efficient operation for strings, it is going to be faster...
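If you want to check this on your own machine, here's a minimal sketch using the standard timeit module (the sample list is an assumption, echoing the benchmark in the next answer):

import timeit

# Hypothetical sample data matching the timings below.
setup = "words = ['now', 'is', 'the', 'winter', 'of', 'our', 'partyhat'] * 100"

# Time each variant over 10000 runs.
print(timeit.timeit('max(len(w) for w in words)', setup=setup, number=10000))
print(timeit.timeit('len(max(words, key=len))', setup=setup, number=10000))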

Solution 3

If you rewrite the generator expression as a map call (or, for 2.x, imap):

max(map(len, words))

… it's actually a bit faster than the key version, not slower.

python.org 64-bit 3.3.0:

In [186]: words = ['now', 'is', 'the', 'winter', 'of', 'our', 'partyhat'] * 100
In [188]: %timeit max(len(w) for w in words)
10000 loops, best of 3: 90.1 us per loop
In [189]: %timeit len(max(words, key=len))
10000 loops, best of 3: 57.3 us per loop
In [190]: %timeit max(map(len, words))
10000 loops, best of 3: 53.4 us per loop

Apple 64-bit 2.7.2:

In [298]: words = ['now', 'is', 'the', 'winter', 'of', 'our', 'partyhat'] * 100
In [299]: %timeit max(len(w) for w in words)
10000 loops, best of 3: 99 us per loop
In [300]: %timeit len(max(words, key=len))
10000 loops, best of 3: 64.1 us per loop
In [301]: %timeit max(map(len, words))
10000 loops, best of 3: 67 us per loop
In [303]: %timeit max(itertools.imap(len, words))
10000 loops, best of 3: 63.4 us per loop

I think it's more pythonic than the key version, for the same reason the genexp is.

It's arguable whether it's as pythonic as the genexp version. Some people love map/filter/reduce/etc.; some hate them; my personal feeling is that when you're trying to map a function that already exists and has a nice name (that is, something you don't have to lambda or partial up), map is nicer, but YMMV (especially if your name is Guido).

One last point:

the redundancy of len being called twice seems not to matter - does more happen in C code in this form?

Think about it like this: You're already calling len N times. Calling it N+1 times instead is hardly likely to make a difference, compared to anything you have to do N times, unless you have a tiny number of huge strings.
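To see that concretely, here's a minimal sketch that counts the calls via an illustrative wrapper (counted_len and the sample list are not from the original post):

calls = 0

def counted_len(s):
    # Illustrative wrapper: same result as len, but counts invocations.
    global calls
    calls += 1
    return len(s)

words = ['now', 'is', 'the', 'winter', 'of', 'our', 'partyhat']
longest = len(max(words, key=counted_len))
print(longest, calls)   # 8 7 - the key ran once per word; the outer len is the +1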

Solution 4

I'd say

len(max(x, key=len))

looks quite good because you utilize a keyword argument (key) of a built-in (max) with a built-in (len). So basically max(x, key=len) gets you almost the answer. But none of your code variants look particularly un-pythonic to me.
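A minimal sketch of that, using a hypothetical sample list:

x = ['now', 'is', 'the', 'winter']
print(max(x, key=len))        # 'winter' - almost the answer: the longest word itself
print(len(max(x, key=len)))   # 6 - its length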

Comments

  • wim
    wim almost 2 years

    What is the more pythonic way of getting the length of the longest word:

    len(max(words, key=len))

    Or:

    max(len(w) for w in words)

    Or.. something else? words is a list of strings. I am finding I need to do this often and after timing with a few different sample sizes the first way seems to be consistently faster, despite seeming less efficient at face value (the redundancy of len being called twice seems not to matter - does more happen in C code in this form?).

  • arshajii
    arshajii over 11 years
    But why? Is there any reason?
  • miku
    miku over 11 years
    @A.R.S.: Added a short, well, ... subjective reason.
  • Jon Clements
    Jon Clements over 11 years
    That being said - I can't say which is more "Pythonic" - I like both, but for someone unfamiliar with the use of max with key, perhaps the former is going to be more immediately grokkable.
  • Omnifarious
    Omnifarious over 11 years
    As the list gets longer the discrepancy gets bigger.
  • Omnifarious
    Omnifarious over 11 years
    max(map(len, words)) is also pretty readable and obvious. So it gets my vote.
  • abarnert
    abarnert over 11 years
    @Omnifarious: It's readable and obvious to me, and to you… but maybe not to everyone. I added a paragraph about that.
  • abarnert
    abarnert over 11 years
    In my tests, 3.3.0 beat 2.7.2 for every version I could come up with. (See my answer for the obvious ones.)
  • abarnert
    abarnert over 11 years
    Update: Actually, if I run them both in 32-bit mode, 3.3.0 is significantly slower. But then almost everything seems slow in 32-bit 3.2 or 3.3, at least on Macs, so I don't think there's anything specific to this case.
  • Omnifarious
    Omnifarious over 11 years
    @abarnert: Interesting. I ran them both in 64-bit mode on a Linux system. One was Python 2.7.3 and the other 3.3.0. I was using /usr/share/dict/words as the word list. I was getting speeds of 88ms vs 66ms. Perhaps it's my choice of a long word list that made the difference.
  • abarnert
    abarnert over 11 years
    @Omnifarious: Using 70000 words instead of 700, I get almost the exact same performance numbers multiplied by 100. Basically, 64-bit 3.3 is 8-14% faster than 64-bit 2.7, but 32-bit 3.3 is 0-10% slower than 32-bit 2.7 (and 32-bit and 64-bit 2.7 are within 2% of each other). (However, PyPy seems to do a lot better with 70000 than 700, as you might expect… I didn't include it in my answer because the machine I was testing on doesn't have ipython for pypy.)