How to retrieve the most repeated value in a column present in a data frame
40,563
Solution 1
tail(names(sort(table(Forbes2000$category))), 1)
Solution 2
In case two or more categories may be tied for most frequent, use something like this:
x <- c("Insurance", "Insurance", "Capital Goods", "Food markets", "Food markets")
tt <- table(x)
names(tt[tt==max(tt)])
[1] "Food markets" "Insurance"
Solution 3
Another way with the data.table package, which is faster for large data sets:
set.seed(1)
x=sample(seq(1,100), 5000000, replace = TRUE)
method 1 (solution proposed above)
start.time <- Sys.time()
tt <- table(x)
names(tt[tt==max(tt)])
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
Time difference of 4.883488 secs
method 2 (DATA TABLE)
start.time <- Sys.time()
ds <- data.table( x )
setkey(ds, x)
sorted <- ds[,.N,by=list(x)]
most_repeated_value <- sorted[order(-N)]$x[1]
most_repeated_value
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
Time difference of 0.328033 secs
Solution 4
you can create a function:
get_mode <- function(x){
return(names(sort(table(x), decreasing = T, na.last = T)[1]))
}
and then do
get_mode(Forbes3000$category)
The reason I created a function is that I have to this kind of thing very often.
Solution 5
Using the function option from @Malvika makes it easy to apply across a table and get these values for every column
#create a mode function
get_mode_name <- function(x){
return(names(sort(table(x), decreasing = T, na.last = T)[1]))
}
get_mode_value <- function(x){
return(unname(sort(table(x), decreasing = T, na.last = T)[1]))
}
get_mode_pct<- function(x){
return(unname(sort(table(x), decreasing = T, na.last = T)[1])/length(x))
}
#Identify character columns
type_table <- sapply(table_name, class)
#create vector numeric and character types
num_table <- (unname(type_table) == "numeric")
char_table <- (unname(type_table) == "character")
#View the modes of character columns
mode_name <- apply(table_name[,char_table], 2, function(x) get_mode_name(x))
mode_value <- apply(table_name[,char_table], 2, function(x) get_mode_value(x))
mode_pct <- apply(table_name[,char_table], 2, function(x) get_mode_pct(x))
Author by
Teja
Updated on July 28, 2021Comments
-
Teja almost 3 years
I am trying to retrieve the most repeated value in a particular column present in a data frame.Here is my sample data and code below.A
data("Forbes2000", package = "HSAUR") head(Forbes2000) rank name country category sales profits assets marketvalue 1 1 Citigroup United States Banking 94.71 17.85 1264.03 255.30 2 2 General Electric United States Conglomerates 134.19 15.59 626.93 328.54 3 3 American Intl Group United States Insurance 76.66 6.46 647.66 194.87 4 4 ExxonMobil United States Oil & gas operations 222.88 20.96 166.99 277.02 5 5 BP United Kingdom Oil & gas operations 232.57 10.27 177.57 173.54 6 6 Bank of America United States Banking 49.01 10.81 736.45 117.55
As per my sample data I need to return the most repeated category which is Insurance.
subset(subset(Forbes2000,country=="Bermuda")
-
Arun almost 10 yearstucson, nice. I think
as.data.table(ds)[, .N, by=x][, x[N == max(N)]]
also does the job, which takes 0.06s on my laptop. As a FYI, no need tosetkey
for aggregations. -
Timothée HENRY almost 10 years@Arun Thank you. Your solution should be on top of this page.
-
Tyler Knight almost 3 yearsI like this solution, makes it easy to apply across a data frame and calculate this quickly for all values.