Average of a list of numbers, stored as strings in a Python list


Solution 1

num = ['1', '2', '', '6']
L = [int(n) for n in num if n]  # skip empty strings (missing values)
ave = sum(L)/float(len(L)) if L else '-'  # float() avoids integer division in Python 2

or, converting to float up front so the division needs no float() call:

num = ['1', '2', '', '6']
L = [float(n) for n in num if n]
avg = sum(L)/len(L) if L else '-'
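Wrapped as a reusable function, Solution 1 might look like the sketch below. It assumes Python 3 (where / is true division), keeps the '-' placeholder for the all-missing case, and uses a hypothetical function name, average_of_strings. Converting with float also accepts decimal strings such as '2.5', which int would reject with a ValueError.

```python
def average_of_strings(values, missing='-'):
    """Average the non-empty numeric strings in `values`.

    Empty strings are treated as missing values and skipped.
    Returns `missing` when no values are present.
    """
    numbers = [float(v) for v in values if v]  # skip '' (missing)
    return sum(numbers) / len(numbers) if numbers else missing


print(average_of_strings(['1', '2', '', '6']))  # 3.0
print(average_of_strings(['', '']))             # -
print(average_of_strings(['2.5', '', '3.5']))   # 3.0
```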

Solution 2

In Python 3.4 and later, use the statistics module (note that mean() raises statistics.StatisticsError if every value is missing):

from statistics import mean
num = ['1', '2', '', '6']
ave = mean(int(n) for n in num if n)
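Because statistics.mean raises StatisticsError when it receives no data, a list in which every value is missing needs separate handling. A minimal sketch, assuming Python 3.8+ for statistics.fmean (a faster, float-returning variant of mean); the helper name mean_or_placeholder is hypothetical.

```python
from statistics import StatisticsError, fmean


def mean_or_placeholder(values, missing='-'):
    """Mean of the non-empty numeric strings, or `missing` if there are none."""
    try:
        # fmean accepts any iterable, including a generator expression,
        # and always returns a float
        return fmean(float(v) for v in values if v)
    except StatisticsError:
        # raised when the generator yields nothing (all values missing)
        return missing


print(mean_or_placeholder(['1', '2', '', '6']))  # 3.0
print(mean_or_placeholder(['', '']))             # -
```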

Solution 3

You can drop the square brackets; sum accepts generator expressions too:

total  = sum(int(n) if n else 0 for n in num)
length = sum(1 if n else 0 for n in num)

Since a generator yields each value only when it is needed, this avoids building an intermediate list in memory, which matters most when you're dealing with large inputs.
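The two generator expressions above still walk the list twice. If that matters, a single loop can accumulate the total and the count together. This is a sketch assuming Python 3, with a hypothetical helper name, average_single_pass; it keeps memory use constant, though it is not necessarily faster than the list-based versions.

```python
def average_single_pass(values, missing='-'):
    """Average the non-empty numeric strings in one pass over `values`."""
    total = 0
    count = 0
    for v in values:
        if v:  # '' marks a missing value
            total += int(v)
            count += 1
    return total / count if count else missing


# Works with any iterable, e.g. an iterator that can only be consumed once
print(average_single_pass(['1', '2', '', '6']))      # 3.0
print(average_single_pass(iter(['', '4', '', ''])))  # 4.0
```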

Solution 4

Here are some timings of the OP's solution, aIKid's solution and gnibbler's two solutions, using a list of 100,000 strings drawn from '1' to '9' plus the empty string, with 10 runs each:

import timeit

setup = '''
from __main__ import f1, f2, f3, f4
import random


random.seed(0)
choices = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '']
num = [random.choice(choices) for _ in range(10**5)]
'''

def f1(num): # OP
    total  = sum([int(n) if n else 0 for n in num])
    length = sum([1 if n else 0 for n in num])
    ave    = float(total)/length if length > 0 else '-'
    return ave

def f2(num): # aIKid
    total = sum(int(n) if n else 0 for n in num)
    length = sum(1 if n else 0 for n in num)
    ave = float(total)/length if length > 0 else '-'
    return ave

def f3(num): # gnibbler 1
    L = [int(n) for n in num if n]
    ave = sum(L)/float(len(L)) if L else '-'
    return ave

def f4(num): # gnibbler 2
    L = [float(n) for n in num if n]
    ave = sum(L)/float(len(L)) if L else '-'
    return ave

number = 10
things = ['f1(num)', 'f2(num)', 'f3(num)', 'f4(num)']
for thing in things:
    print(thing, timeit.timeit(thing, setup=setup, number=number))

Results (total seconds for 10 runs):

f1(num) 1.8177659461490339 # OP
f2(num) 2.0769015213241513 # aIKid
f3(num) 1.6350571199344595 # gnibbler 1
f4(num) 0.807052779158564  # gnibbler 2

It looks like gnibbler's float-based solution (f4) is the fastest here.

Author: user

Updated on June 04, 2022

Comments

  • user
    user almost 2 years

    I want to calculate the average value of several lists in Python. These lists contain numbers as strings. An empty string isn't zero; it means a missing value.

    The best I could come up with is this. Is there a more elegant, succinct & efficient way of writing this?

    num    = ['1', '2', '', '6']
    total  = sum([int(n) if n else 0 for n in num])
    length = sum([1 if n else 0 for n in num])
    ave    = float(total)/length if length > 0 else '-'
    

    P.S. I'm using Python 2.7.x, but recipes for Python 3.x are welcome.

  • user
    user over 10 years
    Is that more efficient?
  • aIKid
    aIKid over 10 years
    Far more efficient when you're dealing with huge lists.
  • user
    user over 10 years
    So it's less memory-intensive but not really faster, right? Since the comprehension is done twice, I feel it's better to use lists, because a list comprehension stays in memory.
  • aIKid
    aIKid over 10 years
    @buffer Why? That means you're storing two different lists in memory.
  • aIKid
    aIKid over 10 years
    Speed-wise, I think so. Creating a generator is generally more expensive than creating a list. Generator expressions are more useful when you're handling bigger lists, as I mentioned in my answer.
  • user
    user over 10 years
    See this stackoverflow.com/questions/47789/… & answer from gnibbler + senshin
  • Jon Clements
    Jon Clements over 10 years
    I'd go for sum(L, 0.0) / len(L)
  • senshin
    senshin over 10 years
    @alKid What counts as big here (order-of-magnitude-wise)?
  • John La Rooy
    John La Rooy over 10 years
    @JonClements, might as well just add them up as floats in the first place
  • user
    user over 10 years
    Adding floats would be more expensive
  • user
    user over 10 years
    Good to know, I should have mentioned I'm using Python 2.7.x
  • John La Rooy
    John La Rooy over 10 years
    @buffer, ok I put both versions
  • user
    user over 10 years
    Great comparison. If you add 'f4' with summation as float, it's surprisingly the fastest
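Jon Clements' suggestion in the comments, sum(L, 0.0) / len(L), starts the sum at a float so the division is floating-point even under Python 2.7's integer /; John La Rooy's reply is to convert each value to float up front instead. A small sketch comparing the two on the question's sample list (it should run unchanged on Python 2.7 and 3):

```python
num = ['1', '2', '', '6']

# Keep ints, but start the sum at 0.0 so the result is a float
ints = [int(n) for n in num if n]
ave_start_float = sum(ints, 0.0) / len(ints) if ints else '-'

# Convert each value to float up front instead
floats = [float(n) for n in num if n]
ave_floats = sum(floats) / len(floats) if floats else '-'

print(ave_start_float)  # 3.0
print(ave_floats)       # 3.0
```

Both give the same result here; which one is faster is an empirical question, as the timings in Solution 4 show.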