How to get mean, median, and other statistics over entire matrix, array or dataframe?


Solution 1

Since this comes up a fair bit, I'm going to treat this a little more comprehensively, to include the 'etc.' piece in addition to mean and median.

  1. For a matrix or array, as the others have stated, mean and median return a single value. However, var computes the covariances between the columns of a two-dimensional matrix. Interestingly, for a multi-dimensional array, var goes back to returning a single value. sd on a 2-d matrix used to return the standard deviations of the columns (deprecated, and defunct since R 3.0.0); in current R it returns a single value over all elements. Even better, mad returns a single value on both a 2-d matrix and a multi-dimensional array. If you want a single value returned, the safest route is to coerce using as.vector() first. Having fun yet?

  2. For a data.frame, mean used to act on the columns separately (deprecated, and defunct since R 3.0.0; it now returns NA with a warning). median requires that you flatten to a vector first with unlist(). As before, var returns the covariances between the columns. sd used to return the standard deviations of the columns (also deprecated, and now defunct, so it raises an error). mad, like median, requires unlist() first. In general, if you want a function to act on all values of a data.frame, just unlist it first.
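A quick sketch of the points above, run on small made-up objects (the names m, a and df and their contents are only illustrative). The comments describe what current R versions return; very old versions behaved differently, as the edit note explains.

```r
# Hypothetical example data
m  <- matrix(1:12, nrow = 3)          # 3 x 4 numeric matrix
a  <- array(1:12, dim = c(2, 3, 2))   # 3-d array
df <- as.data.frame(m)                # data frame with 4 numeric columns

# mean, median and mad collapse a matrix or array to one value
mean(m)     # 6.5
median(m)   # 6.5
mad(a)      # a single value

# var on a 2-d matrix gives the 4 x 4 column covariance matrix ...
dim(var(m))
# ... but a 3-d array is not a matrix, so it is treated as one long vector
length(var(a))   # 1

# Flattening first always gives a single value
var(as.vector(m))   # 13

# For a data frame, flatten with unlist() before median or mad
median(unlist(df))  # 6.5
mad(unlist(df))
```

Flattening with as.vector() or unlist() is the most predictable choice because it does not depend on how each function treats dimensions.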

Edit: Late-breaking news(): in R 3.0.0, mean.data.frame was made defunct. From the NEWS:

o   mean() for data frames and sd() for data frames and matrices are defunct.
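A small sketch of what this means in practice for R 3.0.0 and later, using a made-up 4 x 2 matrix m and a data frame df built from it (both hypothetical).

```r
# Hypothetical numeric matrix and data frame
m  <- matrix(c(2, 4, 4, 4, 5, 5, 7, 9), nrow = 4)
df <- as.data.frame(m)

# mean() on a data frame no longer works column-wise: it warns and returns NA
res <- suppressWarnings(mean(df))
is.na(res)                   # TRUE

# sd() on a matrix now treats it as one long vector of all elements
sd(m) == sd(as.vector(m))    # TRUE

# sd() on a data frame is an error ...
inherits(try(sd(df), silent = TRUE), "try-error")   # TRUE
# ... so flatten it for one value, or work per column
sd(unlist(df))               # sqrt(32 / 7), about 2.138
sapply(df, sd)               # one value per column
```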

Solution 2

By default, mean, median, etc. work over an entire array or matrix.

E.g.:

# array:
m <- array(runif(100),dim=c(10,10))
mean(m) # returns *one* value.

# matrix:
mean(as.matrix(m)) # same as before

For data frames, coerce them to a matrix first (data frames are handled differently because they can have columns with strings in them, which you can't take the mean of):

# data frame
mdf <- as.data.frame(m)
# mean(mdf) used to return column means; R >= 3.0.0 returns NA with a warning (colMeans(mdf) gives column means)
mean( as.matrix(mdf) ) # one value.

Just make sure your data frame has only numeric columns before coercing it to a matrix, or exclude the non-numeric ones first.
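One way to exclude the non-numeric columns, sketched with a made-up data frame mdf that has a character column. Filter(is.numeric, ...) is just one base-R option; mdf[sapply(mdf, is.numeric)] does the same thing.

```r
# Hypothetical data frame mixing numeric and character columns
mdf <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), label = c("a", "b", "c"))

# Keep only the numeric columns (x and y) before flattening
num <- Filter(is.numeric, mdf)

mean(as.matrix(num))   # 3.5
median(unlist(num))    # 3.5
```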

Solution 3

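You can use the dplyr package (install it once with install.packages('dplyr')). Note that this gives one mean per column rather than a single overall value:

library(dplyr)

dataframe.mean <- dataframe %>%
  summarise_all(mean) # replace mean with median for per-column medians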
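For comparison, a base-R sketch (no dplyr needed) of both the per-column result and the single overall value the question asks for. The data frame named dataframe here is made up and all-numeric.

```r
# Hypothetical all-numeric data frame
dataframe <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))

# Per-column values: the same numbers summarise_all(mean) would report
colMeans(dataframe)          # a = 2.5, b = 25
sapply(dataframe, median)    # a = 2.5, b = 25

# One value over every cell of the data frame
mean(unlist(dataframe))      # 13.75
median(unlist(dataframe))    # 7
```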
Author: user2760

Updated on October 22, 2020

Comments

  • user2760 (over 3 years ago)

    I know this is a basic question, but for some strange reason I am unable to find an answer.

    How should I apply basic statistical functions like mean, median, etc. over an entire array, matrix or data frame to get a single answer rather than a vector over rows or columns?

  • smci (almost 12 years ago)

    But for data frames, mean and median do not work as is. As you point out, coercing the df to a matrix breaks when there are non-numeric columns (so you have to build a column index that touches only the numeric columns). Further, if the data frame is large, converting it with as.matrix(mdf) isn't efficient or scalable, because it creates a big temporary copy.
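For the mean specifically, one way to avoid the big temporary matrix is to add up per-column sums and counts and divide once. A minimal sketch, where overall_mean is a hypothetical helper name and mdf is made-up data:

```r
# Hypothetical helper: overall mean of all numeric cells, no matrix copy
overall_mean <- function(df, na.rm = FALSE) {
  num <- Filter(is.numeric, df)    # skip character/factor columns
  # per-column sums (NA propagates unless na.rm = TRUE)
  sums <- vapply(num, sum, numeric(1), na.rm = na.rm)
  # per-column counts of the values that went into each sum
  counts <- vapply(num, function(col) {
    if (na.rm) sum(!is.na(col)) else length(col)
  }, numeric(1))
  sum(sums) / sum(counts)
}

mdf <- data.frame(x = c(1, 2, NA), y = c(4, 5, 6), label = c("a", "b", "c"))
overall_mean(mdf, na.rm = TRUE)   # (1 + 2 + 4 + 5 + 6) / 5 = 3.6
```

This only works for statistics that decompose into per-column pieces, like the mean; the median still needs all values together, so unlist() remains the simple route there.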