How to calculate mean, median, mode and range from a set of numbers

Solution 1

Yes, there do seem to be third-party libraries (there is nothing in the core Java Math class). Two that have come up are:

http://opsresearch.com/app/

http://www.iro.umontreal.ca/~simardr/ssj/indexe.html

However, it is not that difficult to write your own methods to calculate mean, median, mode and range.

MEAN

public static double mean(double[] m) {
    double sum = 0;
    for (int i = 0; i < m.length; i++) {
        sum += m[i];
    }
    return sum / m.length;
}

MEDIAN

// the array double[] m MUST BE SORTED
public static double median(double[] m) {
    int middle = m.length/2;
    if (m.length%2 == 1) {
        return m[middle];
    } else {
        return (m[middle-1] + m[middle]) / 2.0;
    }
}
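
The method above expects an already sorted array, and sorting costs O(n log n). If only the median is needed, a selection algorithm can avoid the full sort. The following is a rough sketch of quickselect with a random pivot (expected linear time). The class and method names are made up for illustration, and the code assumes a non-empty input without NaN values.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

class QuickSelectMedian {

    // Returns the k-th smallest element (0-based) of a, reordering a in place.
    // Afterwards every element left of index k is <= a[k].
    static double select(double[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            // A random pivot keeps the expected running time linear.
            double pivot = a[ThreadLocalRandom.current().nextInt(lo, hi + 1)];
            int i = lo, j = hi;
            // Hoare-style partition around the pivot value.
            while (i <= j) {
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j) {
                    double t = a[i]; a[i] = a[j]; a[j] = t;
                    i++;
                    j--;
                }
            }
            if (k <= j) {
                hi = j;          // k lies in the left part
            } else if (k >= i) {
                lo = i;          // k lies in the right part
            } else {
                return a[k];     // k sits between the parts, already in place
            }
        }
        return a[k];
    }

    // Median of unsorted data; works on a copy so the caller's array is untouched.
    static double median(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("median of an empty array is undefined");
        }
        double[] copy = Arrays.copyOf(data, data.length);
        int n = copy.length;
        double upper = select(copy, n / 2);
        if (n % 2 == 1) {
            return upper;
        }
        // For even n the lower middle value is the largest element left of n/2.
        double lower = copy[0];
        for (int i = 1; i < n / 2; i++) {
            lower = Math.max(lower, copy[i]);
        }
        return (lower + upper) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(median(new double[] {5, 3, 1, 4, 2})); // 3.0
        System.out.println(median(new double[] {4, 1, 3, 2}));    // 2.5
    }
}
```

For small arrays the sort-based version is simpler and usually fast enough; selection only pays off when the input is large.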

MODE

public static int mode(int a[]) {
    // Both locals must be initialised, otherwise the method does not compile.
    int maxValue = 0, maxCount = 0;

    for (int i = 0; i < a.length; ++i) {
        int count = 0;
        for (int j = 0; j < a.length; ++j) {
            if (a[j] == a[i]) ++count;
        }
        if (count > maxCount) {
            maxCount = count;
            maxValue = a[i];
        }
    }

    return maxValue;
}

UPDATE

As has been pointed out by Neelesh Salpe, the above does not cater for multi-modal collections. We can fix this quite easily:

// Imports needed: java.util.ArrayList, java.util.HashMap, java.util.List, java.util.Map
public static List<Integer> mode(final int[] numbers) {
    final List<Integer> modes = new ArrayList<Integer>();
    final Map<Integer, Integer> countMap = new HashMap<Integer, Integer>();

    int max = -1;

    for (final int n : numbers) {
        int count = 0;

        if (countMap.containsKey(n)) {
            count = countMap.get(n) + 1;
        } else {
            count = 1;
        }

        countMap.put(n, count);

        if (count > max) {
            max = count;
        }
    }

    for (final Map.Entry<Integer, Integer> tuple : countMap.entrySet()) {
        if (tuple.getValue() == max) {
            modes.add(tuple.getKey());
        }
    }

    return modes;
}

ADDITION

If you are using Java 8 or higher, you can also determine the modes like this:

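// Imports needed: java.util.List, java.util.Map, java.util.function.Function, java.util.stream.Collectors
public static List<Integer> getModes(final List<Integer> numbers) {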
    final Map<Integer, Long> countFrequencies = numbers.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    final long maxFrequency = countFrequencies.values().stream()
            .mapToLong(count -> count)
            .max().orElse(-1);

    return countFrequencies.entrySet().stream()
            .filter(tuple -> tuple.getValue() == maxFrequency)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
}
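
The question also asks for the range, which the methods above do not cover. As a rough sketch, assuming range means maximum minus minimum, the JDK's own DoubleSummaryStatistics (Java 8 or higher) yields min, max and average in a single pass, so no third-party library is needed for those values. RangeExample and range are illustrative names, not part of the original answer.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

class RangeExample {

    // Range is taken here to mean max - min (an assumption about the intended definition).
    static double range(double[] m) {
        if (m.length == 0) {
            throw new IllegalArgumentException("range of an empty array is undefined");
        }
        DoubleSummaryStatistics stats = Arrays.stream(m).summaryStatistics();
        return stats.getMax() - stats.getMin();
    }

    public static void main(String[] args) {
        double[] data = {4, 8, 15, 16, 23, 42};
        DoubleSummaryStatistics stats = Arrays.stream(data).summaryStatistics();
        System.out.println("mean  = " + stats.getAverage()); // mean  = 18.0
        System.out.println("range = " + range(data));        // range = 38.0
    }
}
```

DoubleSummaryStatistics does not provide median or mode, so those still need one of the hand-written methods or a library.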

Solution 2

Check out Apache Commons Math. There is quite a lot there.

Solution 3

// Imports needed: java.util.Arrays, java.util.Iterator, java.util.Set, java.util.TreeMap, java.util.TreeSet
public static Set<Double> getMode(double[] data) {
    if (data.length == 0) {
        return new TreeSet<>();
    }
    // Map keys are the array values; map values are how many times each key appears in the array.
    TreeMap<Double, Integer> map = new TreeMap<>();
    for (int index = 0; index != data.length; ++index) {
        double value = data[index];
        if (!map.containsKey(value)) {
            map.put(value, 1); // first time, put one
        } else {
            map.put(value, map.get(value) + 1); // seen it again, increment count
        }
    }
    Set<Double> modes = new TreeSet<>(); // result set of modes, sorted min to max
    int maxCount = 1;
    Iterator<Integer> modeAppearance = map.values().iterator();
    while (modeAppearance.hasNext()) {
        maxCount = Math.max(maxCount, modeAppearance.next()); // go through all the value counts
    }
    for (double key : map.keySet()) {
        if (map.get(key) == maxCount) { // if this key's count is the maximum
            modes.add(key); // keep it
        }
    }
    return modes;
}

// std dev function for good measure (population form: divides by n)
public static double getStandardDeviation(double[] data) {
    final double mean = getMean(data);
    double sum = 0;
    for (int index = 0; index != data.length; ++index) {
        sum += Math.pow(Math.abs(mean - data[index]), 2);
    }
    return Math.sqrt(sum / data.length);
}

public static double getMean(double[] data) {
    if (data.length == 0) {
        return 0;
    }
    double sum = 0.0;
    for (int index = 0; index != data.length; ++index) {
        sum += data[index];
    }
    return sum / data.length;
}

// By creating a copy of the array and sorting it, this function can take any data, sorted or not.
public static double getMedian(double[] data) {
    double[] copy = Arrays.copyOf(data, data.length);
    Arrays.sort(copy);
    return (copy.length % 2 != 0) ? copy[copy.length / 2] : (copy[copy.length / 2] + copy[(copy.length / 2) - 1]) / 2;
}
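
The methods above need the whole array in memory. For mean and standard deviation that is not strictly required: both can be updated one value at a time. Below is a minimal sketch using Welford's online algorithm, which is a different technique from the two-pass code above. RunningStats and its method names are hypothetical, and the standard deviation is the population form (dividing by n) to match getStandardDeviation.

```java
class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the current mean

    // Welford's update: adjusts mean and m2 for one new value without storing it.
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    // Returns 0 for no data, mirroring getMean above.
    double getMean() {
        return count == 0 ? 0.0 : mean;
    }

    // Population standard deviation; NaN when no values have been added.
    double getStandardDeviation() {
        return count == 0 ? Double.NaN : Math.sqrt(m2 / count);
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(x);
        }
        // Prints values close to 5.0 and 2.0, up to floating-point rounding.
        System.out.println(stats.getMean());
        System.out.println(stats.getStandardDeviation());
    }
}
```

Median and mode cannot be maintained this cheaply in general; an exact answer needs the values, or at least their counts, to be kept.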

Solution 4

If you only care about unimodal distributions, consider something like this:

public static Optional<Integer> mode(Stream<Integer> stream) {
    Map<Integer, Long> frequencies = stream
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    return frequencies.entrySet().stream()
        .max(Comparator.comparingLong(Map.Entry::getValue))
        .map(Map.Entry::getKey);
}
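
A short usage sketch for the method above. The method is repeated so that the example compiles on its own; the wrapper class name StreamModeExample is made up for illustration.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamModeExample {

    // Same method as above: counts occurrences, then picks an entry with the highest count.
    static Optional<Integer> mode(Stream<Integer> stream) {
        Map<Integer, Long> frequencies = stream
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return frequencies.entrySet().stream()
            .max(Comparator.comparingLong(Map.Entry::getValue))
            .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        System.out.println(mode(Stream.of(1, 2, 2, 3, 2)));  // Optional[2]
        System.out.println(mode(Stream.<Integer>empty()));   // Optional.empty
    }
}
```

If several values share the highest count, only one of them is returned, and which one depends on the map's iteration order, so this approach is only suitable when a single mode is expected.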
Author by

user339108

Updated on July 16, 2020

Comments

  • user339108
    user339108 almost 4 years

    Are there any functions (as part of a math library) that will calculate mean, median, mode and range from a set of numbers?

  • user339108
    user339108 over 13 years
    thanks, but I would prefer to use something out of the box if possible
  • duffymo
    duffymo over 13 years
    This class will have issues if you have a very large array or have to calculate values on the fly. It can be written without an array for mean and standard deviation; not as certain for median and mode.
  • Fernando Ghisi
    Fernando Ghisi over 13 years
    The MODE algorithm is not considering cases with more than one mode (bimodal, trimodal, ...) - it happens when there is more than one number appearing in the same number of times as maxCount. Considering this, it should return an array instead of a single int value.
  • Chinasaur
    Chinasaur almost 12 years
    As mentioned in my comment on Adeel's answer, sorting the whole array to get the median is pretty inefficient.
  • Chinasaur
    Chinasaur almost 12 years
    See comment on Adeel's answer: Apache Commons Math appears to use a pretty inefficient median algorithm.
  • Nico Huysamen
    Nico Huysamen over 10 years
    @NeeleshSalpe - Thanks for pointing that out. Updated my answer.
  • Sascha Vetter
    Sascha Vetter almost 10 years
    The median function throws an ArrayIndexOutOfBoundsException if the array has only one entry
  • pedram bashiri
    pedram bashiri about 7 years
    Add "Arrays.sort(m);" to beginning of your median method, so it doesn't require a sorted array.