Obtaining Separate Summary Statistics by Categorical Variable with Stargazer Package
Solution 1
Solution
library(stargazer)
library(dplyr)
library(tidyr)
ToothGrowth %>%
group_by(supp) %>%
mutate(id = 1:n()) %>%
ungroup() %>%
gather(temp, val, len, dose) %>%
unite(temp1, supp, temp, sep = '_') %>%
spread(temp1, val) %>%
select(-id) %>%
as.data.frame() %>%
stargazer(type = 'text')
Result
=========================================
Statistic N Mean St. Dev. Min Max
-----------------------------------------
OJ_dose 30 1.167 0.634 0.500 2.000
OJ_len 30 20.663 6.606 8.200 30.900
VC_dose 30 1.167 0.634 0.500 2.000
VC_len 30 16.963 8.266 4.200 33.900
-----------------------------------------
Explanation
This addresses the problem the OP raised in a comment on the original answer: "What I really want is a single table with summary statistics separated by a categorical variable instead of creating separate tables." The easiest way I saw to do that with stargazer was to create a new data frame with one column per group-and-variable combination, using a gather(), unite(), spread() strategy. The only trick is to avoid duplicate identifiers: create a row identifier within each group before reshaping, then drop that variable before calling stargazer().
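The same reshaping can be sketched with the newer tidyr verbs. This is only an illustration, assuming tidyr 1.0.0 or later (where pivot_longer() and pivot_wider() are available); the names `wide` and `variable` are mine, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(stargazer)

wide <- ToothGrowth %>%
  group_by(supp) %>%
  mutate(id = row_number()) %>%   # row identifier within each supp group
  ungroup() %>%
  # stack len and dose into one value column
  pivot_longer(c(len, dose), names_to = "variable", values_to = "val") %>%
  # one column per supp/variable pair, e.g. OJ_len, VC_dose
  pivot_wider(id_cols = id, names_from = c(supp, variable), values_from = val) %>%
  select(-id) %>%                 # drop the helper identifier
  as.data.frame()                 # stargazer expects a plain data frame

stargazer(wide, type = "text")
```

The column order may differ from the gather()/spread() version, because pivot_wider() orders new columns by first appearance rather than alphabetically.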
Solution 2
Three possible solutions: one using reporttools and xtable, one using tidyverse tools along with stargazer, and a third using base R only.
First,
I suggest taking a look at reporttools. It moves away from stargazer, but I think it is worth considering:
# install.packages("reporttools") #Use this to install it, do this only once
require(reporttools)
vars <- ToothGrowth[,c('len','dose')]
group <- ToothGrowth[,c('supp')]
## display default statistics, only use a subset of observations, grouped analysis
tableContinuous(vars = vars, group = group, prec = 1, cap = "Table of 'len','dose' by 'supp' ", lab = "tab: descr stat")
% latex table generated in R 3.3.3 by xtable 1.8-2 package
\begingroup\footnotesize
\begin{longtable}{llrrrrrrrrrr}
\textbf{Variable} & \textbf{Levels} & $\mathbf{n}$ & \textbf{Min} & $\mathbf{q_1}$ & $\mathbf{\widetilde{x}}$ & $\mathbf{\bar{x}}$ & $\mathbf{q_3}$ & \textbf{Max} & $\mathbf{s}$ & \textbf{IQR} & \textbf{\#NA} \\
\hline
len & OJ & 30 & 8.2 & 15.5 & 22.7 & 20.7 & 25.7 & 30.9 & 6.6 & 10.2 & 0 \\
& VC & 30 & 4.2 & 11.2 & 16.5 & 17.0 & 23.1 & 33.9 & 8.3 & 11.9 & 0 \\
\hline
& all & 60 & 4.2 & 13.1 & 19.2 & 18.8 & 25.3 & 33.9 & 7.6 & 12.2 & 0 \\
\hline
dose & OJ & 30 & 0.5 & 0.5 & 1.0 & 1.2 & 2.0 & 2.0 & 0.6 & 1.5 & 0 \\
& VC & 30 & 0.5 & 0.5 & 1.0 & 1.2 & 2.0 & 2.0 & 0.6 & 1.5 & 0 \\
\hline
& all & 60 & 0.5 & 0.5 & 1.0 & 1.2 & 2.0 & 2.0 & 0.6 & 1.5 & 0 \\
\hline
\hline
\caption{Table of 'len','dose' by 'supp' }
\label{tab: descr stat}
\end{longtable}
\endgroup
Compiled with LaTeX, this produces a nicely formatted table.
Second,
using tidyverse tools along with stargazer (inspired by another SO answer):
# install.packages(c("tidyverse"), dependencies = TRUE)
library(dplyr); library(purrr); library(stargazer)
# split the data by supp and print one stargazer table per group
ToothGrowth %>% split(.$supp) %>% walk(~ stargazer(., type = "text"))
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 20.663 6.606 8.200 30.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 16.963 8.266 4.200 33.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#>
Third,
a solution using only base R:
by(ToothGrowth, ToothGrowth$supp, stargazer, type = 'text')
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 20.663 6.606 8.200 30.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#>
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 16.963 8.266 4.200 33.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#> ToothGrowth$supp: OJ
#> [1] ""
#> [2] "========================================="
#> [3] "Statistic N Mean St. Dev. Min Max "
#> [4] "-----------------------------------------"
#> [5] "len 30 20.663 6.606 8.200 30.900"
#> [6] "dose 30 1.167 0.634 0.500 2.000 "
#> [7] "-----------------------------------------"
#> ---------------------------------------------------------------
#> ToothGrowth$supp: VC
#> [1] ""
#> [2] "========================================="
#> [3] "Statistic N Mean St. Dev. Min Max "
#> [4] "-----------------------------------------"
#> [5] "len 30 16.963 8.266 4.200 33.900"
#> [6] "dose 30 1.167 0.634 0.500 2.000 "
#> [7] "-----------------------------------------"
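If a single table is the goal, another option is to compute the statistics by group yourself and let stargazer print the resulting data frame as-is with summary = FALSE. The following is a minimal sketch under that assumption; the helper `summ` and the objects `groups`, `rows` and `tab` are hypothetical names introduced here for illustration.

```r
library(stargazer)

# The same five statistics stargazer reports in its summary tables
summ <- function(x) {
  c(N = length(x), Mean = mean(x), SD = sd(x), Min = min(x), Max = max(x))
}

groups <- split(ToothGrowth, ToothGrowth$supp)

# Build one row per (group, variable) combination
rows <- lapply(names(groups), function(g) {
  stats <- t(sapply(groups[[g]][c("len", "dose")], summ))
  data.frame(supp = g, Variable = rownames(stats), stats,
             row.names = NULL, stringsAsFactors = FALSE)
})
tab <- do.call(rbind, rows)

# summary = FALSE prints the data frame itself instead of summarising it
stargazer(tab, type = "text", summary = FALSE, rownames = FALSE, digits = 3)
```

The trade-off is that the statistics are no longer computed by stargazer, so any change to what is reported has to be made in `summ`.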
Solution 3
invisible(lapply(split(ToothGrowth, ToothGrowth$supp), stargazer))
would do, but if you want a separate \subsection{} in between, you should probably use something like
invisible(lapply(split(ToothGrowth, ToothGrowth$supp), function(sg) {
  # sg is the subset of ToothGrowth for one level of supp
  cat("\\subsection{add your text here}\n")
  stargazer(sg)  # stargazer() prints the table itself
}))
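Further stargazer() arguments go inside the function passed to lapply(). As a rough sketch, the version below gives each table its own title and fewer digits; `groups` and `out` are names chosen here for illustration, and type = "text" is used only so the result is readable in the console.

```r
library(stargazer)

groups <- split(ToothGrowth, ToothGrowth$supp)

# One table per level of supp, each with its own title
out <- lapply(names(groups), function(g) {
  stargazer(groups[[g]],
            type = "text",
            title = paste("supp =", g),
            digits = 2)
})
```

stargazer() returns the printed lines invisibly as a character vector, so `out` can be kept for later use or dropped by wrapping the call in invisible().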
Michael
Updated on June 04, 2022
Comments
-
Michael almost 2 years
I would like to use stargazer to produce summary statistics for each category of a grouping variable. I could do it in separate tables, but I'd like it all in one – if that is not unreasonably challenging for this package.
For example
library(stargazer)
stargazer(ToothGrowth, type = "text")
#>
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 60 18.813 7.649 4.200 33.900
#> dose 60 1.167 0.629 0.500 2.000
#> -----------------------------------------
provides summary statistics for the continuous variables in ToothGrowth. I would like to split that summary by the categorical variable supp, also in ToothGrowth.
Two suggestions for the desired outcome:
stargazer(ToothGrowth ~ supp, type = "text")
#>
#> ==================================================
#> Statistic N Mean St. Dev. Min Max
#> --------------------------------------------------
#> OJ len 30 16.963 8.266 4.200 33.900
#> dose 30 1.167 0.634 0.500 2.000
#> VC len 30 20.663 6.606 8.200 30.900
#> dose 30 1.167 0.634 0.500 2.000
#> --------------------------------------------------
#>
stargazer(ToothGrowth ~ supp, type = "text")
#>
#> ==================================================
#> Statistic N Mean St. Dev. Min Max
#> --------------------------------------------------
#> len
#> _by VC 30 16.963 8.266 4.200 33.900
#> _by VC 30 1.167 0.634 0.500 2.000
#> _tot 60 18.813 7.649 4.200 33.900
#>
#> dose
#> _by OJ 30 20.663 6.606 8.200 30.900
#> _by OJ 30 1.167 0.634 0.500 2.000
#> _tot 60 1.167 0.629 0.500 2.000
#> --------------------------------------------------
-
Michael over 9 years
Damnit, I just Googled "stargazer categorical variable summary" and this was the first hit.
-
Eric Fail over 6 years
I appreciate your question and ran a bounty on it to get it more attention. I wondered if you found a good solution yourself and/or if any of the current replies answer your question?
-
-
Michael over 9 years
Hmmmm... Thanks! What if I want to change the various possible arguments within the stargazer() function?
-
Dieter Menne over 9 years
Just add it to the stargazer(sg) call.
-
Michael over 9 years
Thank you for your help. What I really want is a single table with summary statistics separated by a categorical variable instead of creating separate tables. Not sure if stargazer has that capability or not.
-
Eric Fail over 6 years
@DieterMenne, I like the simplicity of your solution. I wondered if you would be interested in adding to your answer based on Michael's comment above?