Python : easy way to do geometric mean in python?


Solution 1

The formula for the geometric mean is:

$$\left( \prod_{i=1}^{n} x_i \right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$$

So you can easily write an algorithm like:

import numpy as np

def geo_mean(iterable):
    a = np.array(iterable)
    return a.prod()**(1.0/len(a))

You do not have to use numpy for that, but it tends to perform operations on arrays faster than pure Python loops. See this answer for why.
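For readers who want to avoid numpy entirely, here is a minimal pure-Python sketch of the same product-then-root idea. The name geo_mean_pure is made up for illustration, and the sketch assumes Python 3.8 or later, where math.prod is available.

```python
import math

def geo_mean_pure(iterable):
    # Hypothetical helper, not part of any library.
    xs = list(iterable)  # materialize so we can take len()
    # n-th root of the product of all values
    return math.prod(xs) ** (1.0 / len(xs))

print(geo_mean_pure([1, 3, 5, 7, 10]))
```

Like the numpy version, this multiplies everything first, so it is only suitable when the product stays within the float range.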

In case the chances of overflow are high, you can map the numbers to the log domain first, calculate the sum of these logs, then multiply by 1/n and finally apply the exponential function, like:

import numpy as np

def geo_mean_overflow(iterable):
    return np.exp(np.log(iterable).mean())
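To see why the log domain helps, the following sketch compares both approaches on 400 floats that are all equal to 10, so the true geometric mean is 10. It uses only the standard library (math.prod needs Python 3.8+), and the variable names are chosen here for illustration.

```python
import math

values = [10.0] * 400  # the product would be 1e400, beyond the float range

# Naive approach: the float product overflows to inf, and so does its root.
naive = math.prod(values) ** (1.0 / len(values))

# Log-domain approach: the sum of logs is only about 921, so nothing overflows.
log_domain = math.exp(math.fsum(math.log(v) for v in values) / len(values))

print(naive)       # inf
print(log_domain)  # approximately 10.0
```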

Solution 2

In case someone is looking here for a library implementation, there is gmean() in scipy, possibly faster and numerically more stable than a custom implementation:

>>> from scipy.stats import gmean
>>> gmean([1.0, 0.00001, 10000000000.])
46.415888336127786

Compatible with both Python 2 and 3.*

Solution 3

Starting with Python 3.8, the standard library comes with the geometric_mean function as part of the statistics module:

from statistics import geometric_mean

geometric_mean([1.0, 0.00001, 10000000000.]) # 46.415888336127786
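If the code also has to run on interpreters older than 3.8, one possible pattern is to try the standard-library import and fall back to a small log-domain implementation. This is only a sketch: the fallback below is an assumption of this example and does not reproduce the error handling of statistics.geometric_mean.

```python
import math

try:
    from statistics import geometric_mean  # available on Python 3.8+
except ImportError:
    def geometric_mean(xs):
        # Illustrative fallback for older Pythons, computed in the log domain.
        xs = list(xs)
        return math.exp(math.fsum(math.log(x) for x in xs) / len(xs))

print(geometric_mean([2.0, 8.0]))  # close to 4.0
```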

Solution 4

Here's an overflow-resistant version in pure Python, basically the same as the accepted answer.

import math

def geomean(xs):
    return math.exp(math.fsum(math.log(x) for x in xs) / len(xs))
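Note that geomean above calls len(xs), so it needs a sized sequence and will fail on a generator. A possible variant that accepts any iterable and rejects empty input explicitly could look like this; the name geomean_any_iterable and the choice of ValueError are assumptions of this sketch.

```python
import math

def geomean_any_iterable(xs):
    # Collect the logs once so that generators and other one-shot iterables work.
    logs = [math.log(x) for x in xs]
    if not logs:
        raise ValueError("geometric mean of an empty input is undefined")
    # Mean of the logs, mapped back with exp.
    return math.exp(math.fsum(logs) / len(logs))

print(geomean_any_iterable(x for x in [2, 8]))  # close to 4.0
```

As with the original, math.log raises ValueError for zero or negative inputs, so this only covers strictly positive numbers.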

Solution 5

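Just do this:

from functools import reduce  # reduce is a builtin on Python 2, but must be imported on Python 3

numbers = [1, 3, 5, 7, 10]

# multiply all numbers together, then take the n-th root
print(reduce(lambda x, y: x*y, numbers)**(1.0/len(numbers)))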
Admin
Author by

Admin

Updated on February 11, 2022

Comments

  • Admin
    Admin over 2 years

    I wonder whether there is an easy way to compute the geometric mean in Python without using a Python package. If there is not, is there a simple package that computes the geometric mean?

  • Willem Van Onsem
    Willem Van Onsem about 7 years
    Now it is correct. Note however that by using reduce(..) you will introduce some computational overhead.
  • Pablo Maurin
    Pablo Maurin about 7 years
    Good job with using logs for this. People often forget about overflow.
  • WaterRocket8236
    WaterRocket8236 about 6 years
    What actually is overflow ?
  • Willem Van Onsem
    Willem Van Onsem about 6 years
    @BhabaniMohapatra: a floating-point number has a fixed number of bits. Hence it can represent only a fixed number of values. Overflow is a situation in which you calculate a number that can no longer be represented. Python uses a 64-bit float, so the maximum value is 1.7976931348623157e+308. Although this is rather large, if we do not work with logs and we have, for example, 310 numbers that are each around 10, then overflow can already occur.
  • Willem Van Onsem
    Willem Van Onsem about 6 years
    @BhabaniMohapatra: see for example here stackoverflow.com/questions/40082459/… (this is indeed more specific to JavaScript, but this phenomenon occurs in all programming languages with floating-point numbers).
  • PatrickT
    PatrickT over 5 years
    Can you comment on the difference between a.sum() and sum(a) as it relates to efficiency or overflow? And why not write np.exp(a.mean()) (last line)? Thanks.
  • Willem Van Onsem
    Willem Van Onsem over 5 years
    a.sum() performs the sum inside numpy, which is faster than Python's built-in sum over an iterable. As for the mean, if you do this with numpy, you get a NaN, whereas using len(a) will raise a division by 0. Personally I prefer the latter, but this is of course more a matter of "taste".
  • GratefulGuest
    GratefulGuest about 3 years
    If the array contains negative numbers then you can do the following n = len(a), m = len(a[a<0]), logs = np.log(np.abs(a)), return np.exp(np.mean(logs)) * ((-1)**m)**(1/n). This can return a complex number.
  • Greg Glockner
    Greg Glockner over 2 years
    Nice - this will work on any Python >= 3.8, including systems where it is not possible/practical to install other packages like numpy.