How can I obtain the R-squared of an anova in R?
Solution 1
tl;dr: you can get the R-squared of the anova by looking at the summary output of the corresponding linear model
Let's go step by step:
1) Let's use the data from here
pain <- c(4, 5, 4, 3, 2, 4, 3, 4, 4, 6, 8, 4, 5, 4, 6, 5, 8, 6, 6, 7, 6, 6, 7, 5, 6, 5, 5)
drug <- c(rep("A", 9), rep("B", 9), rep("C", 9))
migraine <- data.frame(pain, drug)
2) Let's get the anova:
AOV <- aov(pain ~ drug, data=migraine)
summary(AOV)
##             Df Sum Sq Mean Sq F value   Pr(>F)
## drug         2  28.22  14.111   11.91 0.000256 ***
## Residuals   24  28.44   1.185
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
3) An anova is a linear model under the hood (aov() fits it by calling lm()), so let's fit the linear model and get the anova table from it:
LM <- lm(pain ~ drug, data=migraine)
anova(LM)
## Analysis of Variance Table
##
## Response: pain
##           Df Sum Sq Mean Sq F value    Pr(>F)
## drug       2 28.222 14.1111  11.906 0.0002559 ***
## Residuals 24 28.444  1.1852
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As expected, the results are exactly the same (only the rounding differs). This means that we can get the R-squared from the summary of the linear model:
summary(LM)
## Call:
## lm(formula = pain ~ drug, data = migraine)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.7778 -0.7778  0.1111  0.3333  2.2222
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   3.6667     0.3629  10.104 4.01e-10 ***
## drugB         2.1111     0.5132   4.114 0.000395 ***
## drugC         2.2222     0.5132   4.330 0.000228 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 1.089 on 24 degrees of freedom
## Multiple R-squared: 0.498, Adjusted R-squared: 0.4562
## F-statistic: 11.91 on 2 and 24 DF, p-value: 0.0002559
So the R-squared is 0.498
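If you need the value as a number rather than reading it off the printout, the summary object stores it. A minimal sketch, rebuilding the same data and model so that it runs on its own:

```r
# Same data and model as in the steps above
pain <- c(4, 5, 4, 3, 2, 4, 3, 4, 4, 6, 8, 4, 5, 4, 6, 5, 8, 6, 6, 7, 6, 6, 7, 5, 6, 5, 5)
drug <- c(rep("A", 9), rep("B", 9), rep("C", 9))
migraine <- data.frame(pain, drug)
LM <- lm(pain ~ drug, data = migraine)

# summary() on an lm fit returns a list; r.squared is one of its elements
r2 <- summary(LM)$r.squared
r2
## [1] 0.4980392
```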
But what if we don't believe this?
4) What is the R-squared? It's the regression sum of squares (the "drug" row of the anova table) divided by the total sum of squares (i.e., the regression sum of squares plus the residual sum of squares). So let's find those numbers in the anova and calculate the R-squared directly:
# We use the tidy function from the broom package to extract values
library(broom)
tidy_aov <- tidy(AOV)
tidy_aov
##        term df    sumsq    meansq statistic      p.value
## 1      drug  2 28.22222 14.111111  11.90625 0.0002558807
## 2 Residuals 24 28.44444  1.185185        NA           NA
# The values we need are in the sumsq column of this data frame
sum_squares_regression <- tidy_aov$sumsq[1]
sum_squares_residuals <- tidy_aov$sumsq[2]
R_squared <- sum_squares_regression /
  (sum_squares_regression + sum_squares_residuals)
R_squared
## 0.4980392
So we get the same result: R-squared is 0.4980392
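The same check can be done without the broom package. The sketch below is one possible base-R variant, not the only way to do it: it reads the sums of squares from the anova(LM) table and then cross-checks the result against the raw data. It assumes the residual row is the last row of the table, which is how anova() prints an lm fit.

```r
# Same data and model as in the steps above
pain <- c(4, 5, 4, 3, 2, 4, 3, 4, 4, 6, 8, 4, 5, 4, 6, 5, 8, 6, 6, 7, 6, 6, 7, 5, 6, 5, 5)
drug <- c(rep("A", 9), rep("B", 9), rep("C", 9))
migraine <- data.frame(pain, drug)
LM <- lm(pain ~ drug, data = migraine)

# anova() returns a data frame; the sums of squares are in the "Sum Sq" column
ss <- anova(LM)[["Sum Sq"]]

# Everything except the last (Residuals) row is explained variation
r2_from_anova <- sum(ss[-length(ss)]) / sum(ss)

# Cross-check from the raw data: 1 - residual SS / total SS
ss_total <- sum((migraine$pain - mean(migraine$pain))^2)
ss_resid <- sum(residuals(LM)^2)
r2_from_data <- 1 - ss_resid / ss_total

c(r2_from_anova, r2_from_data)
```

Both numbers should agree with the Multiple R-squared that summary(LM) reports.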
Solution 2
If you want to calculate the adjusted R-squared, you can apply the following formula (from https://www.statisticshowto.datasciencecentral.com/adjusted-r2/):
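s <- summary(LM)
r2 <- s$r.squared
n <- nrow(migraine)  # number of observations
k <- 2               # number of predictors: two dummy variables for the three drug levels
# adjusted R-squared
1 - ((1 - r2) * (n - 1) / (n - k - 1))
# the same as
s$adj.r.squared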
Adjustment means penalizing additional variables ('k' in the formula), much as the AIC does. If adding a new independent variable does not improve the goodness of fit enough to offset that penalty, you shouldn't include it.
So the R-squared never decreases as you add more and more variables, while the adjusted R-squared stops improving (and can even fall) once the extra regressors no longer add enough explanatory power.
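To see the difference in practice, here is a small sketch that adds a pure-noise predictor to the migraine model and compares both statistics. The noise column and the seed are made up for illustration and are not part of the original data.

```r
# Same data as in Solution 1
pain <- c(4, 5, 4, 3, 2, 4, 3, 4, 4, 6, 8, 4, 5, 4, 6, 5, 8, 6, 6, 7, 6, 6, 7, 5, 6, 5, 5)
drug <- c(rep("A", 9), rep("B", 9), rep("C", 9))
migraine <- data.frame(pain, drug)

# Hypothetical extra predictor: random noise unrelated to pain
set.seed(1)
migraine$noise <- rnorm(nrow(migraine))

fit_small <- lm(pain ~ drug, data = migraine)
fit_big   <- lm(pain ~ drug + noise, data = migraine)

# Collect both statistics side by side
comparison <- data.frame(
  model         = c("drug", "drug + noise"),
  r.squared     = c(summary(fit_small)$r.squared, summary(fit_big)$r.squared),
  adj.r.squared = c(summary(fit_small)$adj.r.squared, summary(fit_big)$adj.r.squared)
)
comparison
```

The r.squared of the larger model cannot be lower than that of the smaller one, whereas adj.r.squared only goes up if the new variable explains enough to outweigh the penalty.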
jakzr
Updated on June 04, 2022
Comments
jakzr almost 2 years
I'm looking for the method/function that returns the R-squared of an anova model in R.
Could not find anything so far.
Thanks