How to retrieve the most repeated value in a column present in a data frame

40,563

Solution 1

tail(names(sort(table(Forbes2000$category))), 1)

Solution 2

In case two or more categories may be tied for most frequent, use something like this:

x <- c("Insurance", "Insurance", "Capital Goods", "Food markets", "Food markets")
tt <- table(x)
names(tt[tt==max(tt)])
[1] "Food markets" "Insurance" 

Solution 3

Another way with the data.table package, which is faster for large data sets:

set.seed(1)
x=sample(seq(1,100), 5000000, replace = TRUE)

method 1 (solution proposed above)

start.time <- Sys.time()
tt <- table(x)
names(tt[tt==max(tt)])
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

Time difference of 4.883488 secs

method 2 (DATA TABLE)

start.time <- Sys.time()
ds <- data.table( x )
setkey(ds, x)
sorted <- ds[,.N,by=list(x)]

most_repeated_value <- sorted[order(-N)]$x[1]
most_repeated_value

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

Time difference of 0.328033 secs

Solution 4

you can create a function:

get_mode <- function(x){
  return(names(sort(table(x), decreasing = T, na.last = T)[1]))
}

and then do

get_mode(Forbes3000$category)

The reason I created a function is that I have to this kind of thing very often.

Solution 5

Using the function option from @Malvika makes it easy to apply across a table and get these values for every column

#create a mode function
get_mode_name <- function(x){
  return(names(sort(table(x), decreasing = T, na.last = T)[1]))
}

get_mode_value <- function(x){
  return(unname(sort(table(x), decreasing = T, na.last = T)[1]))
}

get_mode_pct<- function(x){
  return(unname(sort(table(x), decreasing = T, na.last = T)[1])/length(x))
}

#Identify character columns
type_table <- sapply(table_name, class)

#create vector numeric and character types
num_table <- (unname(type_table) == "numeric")
char_table <- (unname(type_table) == "character")

#View the modes of character columns
mode_name <- apply(table_name[,char_table], 2, function(x) get_mode_name(x))    
mode_value <- apply(table_name[,char_table], 2, function(x) get_mode_value(x))
mode_pct <- apply(table_name[,char_table], 2, function(x) get_mode_pct(x))
Share:
40,563
Teja
Author by

Teja

Updated on July 28, 2021

Comments

  • Teja
    Teja almost 3 years

    I am trying to retrieve the most repeated value in a particular column present in a data frame.Here is my sample data and code below.A

    data("Forbes2000", package = "HSAUR")
    head(Forbes2000)
    
    
      rank                name        country             category  sales profits  assets marketvalue
    1    1           Citigroup  United States              Banking  94.71   17.85 1264.03      255.30
    2    2    General Electric  United States        Conglomerates 134.19   15.59  626.93      328.54
    3    3 American Intl Group  United States            Insurance  76.66    6.46  647.66      194.87
    4    4          ExxonMobil  United States Oil & gas operations 222.88   20.96  166.99      277.02
    5    5                  BP United Kingdom Oil & gas operations 232.57   10.27  177.57      173.54
    6    6     Bank of America  United States              Banking  49.01   10.81  736.45      117.55
    

    As per my sample data I need to return the most repeated category which is Insurance.

    subset(subset(Forbes2000,country=="Bermuda")
    
  • Arun
    Arun almost 10 years
    tucson, nice. I think as.data.table(ds)[, .N, by=x][, x[N == max(N)]] also does the job, which takes 0.06s on my laptop. As a FYI, no need to setkey for aggregations.
  • Timothée HENRY
    Timothée HENRY almost 10 years
    @Arun Thank you. Your solution should be on top of this page.
  • Tyler Knight
    Tyler Knight almost 3 years
    I like this solution, makes it easy to apply across a data frame and calculate this quickly for all values.