Sort a dataframe column by the frequency of occurrence
Solution 1
One possible approach is to use data.table
to add freq
column, then sort your data accordingly:
library(data.table)
setDT(df)[,freq := .N, by = c("Region","Salary")]
# Sort
df[order(freq, decreasing = T),]
# As a oneliner (thx @Jaap)
setDT(df)[, freq := .N, by = .(Region,Salary)][order(-freq)]
Solution 2
Base R
df <- transform(df, freq= ave(seq(nrow(df)), Salary, FUN=length))
df[order(-df$freq), ]
Output:
Region ID Salary freq
6 2 A6 1002 3
7 2 A7 1002 3
8 2 A8 1002 3
2 1 A2 1001 2
5 1 A5 1001 2
9 3 A9 3001 2
10 3 A10 3001 2
1 1 A1 100 1
3 1 A3 2000 1
4 1 A4 2431 1
11 3 A11 4001 1
dplyr
library(dplyr)
df %>%
add_count(Salary) %>%
arrange(desc(n))
Output:
Region ID Salary n
1 2 A6 1002 3
2 2 A7 1002 3
3 2 A8 1002 3
4 1 A2 1001 2
5 1 A5 1001 2
6 3 A9 3001 2
7 3 A10 3001 2
8 1 A1 100 1
9 1 A3 2000 1
10 1 A4 2431 1
11 3 A11 4001 1
Solution 3
library(dplyr)
data %>% group_by(Region, Salary) %>% summarise(n=n()) %>% ungroup() %>% arrange(-n)
(edited: incloude commentary on the ungroup
from Jaap, good point)
Anubhav Dikshit
Updated on June 05, 2022Comments
-
Anubhav Dikshit almost 2 years
I have a dataframe in called df, there are three column lets say,
Region ID Salary 1 A1 100 1 A2 1001 1 A3 2000 1 A4 2431 1 A5 1001 .............. .............. 2 A6 1002 2 A7 1002 2 A8 1002 3 A9 3001 3 A10 3001 3 A11 4001
Now I want to sort column Salary by the occurrence of them by Region, that is using frequency table or something, get the probability of occurrence per region and sort them. Please assume that the dataset is large enough (1000 rows)
P.S: Can anyone suggest a good method to do some. Please use column name in your answers since the real table has some column in the middle
Thanks in advance
**EDIT 1**
I think I was not clear enough, thanks for all those who replied, I sincerely apologise for not being clear:
With the current dataset we need to create a frequency table say:
Region Salary(bin) Count 1 1K 6 1 5K 3 1 2K 2 1 15K 2 1 0.5K 2 1 24K 1 1 0K 0
using this we can classify add a new columns in our data frame df called bin(bucket from histogram)
Region ID Salary (bin) Count 1 A1 100 1K 6 1 A2 1001 2K 2 1 A3 2000 2K 2 1 A4 2431 5K 3
..........................So on...............
We can do the above using:
df$bin <- cut(df$salary, breaks=hist(df$salary)$breaks)
After sorting by Region and Count and Salary we get:
Region ID Salary (bin) Count 1 A1 100 1K 6 1 A4 2431 5K 3 1 A3 2000 2K 2 1 A2 1001 2K 2
As you can see, we need to create frequency table for each region and do sort. I did the above using Tableau but I want to automate this in R
Hope I was clear
-
Anubhav Dikshit over 8 yearsDoes not this just sort by the numerical value instead of the frequency of occurance?
-
mtoto over 8 yearsyou can still use the below approaches as far as I can see.
-
Anubhav Dikshit over 8 yearsYes, but I agree ( hence the upvotes), was hoping for a more elegant solution than mine
-
-
Jaap over 8 yearseven shorter:
setDT(df)[, .N, by = .(Region,Salary)][order(-N)]
-
Jaap over 8 yearsor:
df %>% group_by(Region, Salary) %>% summarise(n=n()) %>% ungroup() %>% arrange(-n)
-
Jaap over 8 yearsif you want to retain the whole dataframe, you will need to add in th
freq :=
again -
Jaap over 8 yearsif you want to retain the whole dataframe, then you need to use
mutate
instead ofsummarise