Add column which contains binned values of a numeric column


Solution 1

See ?cut: specify breaks and, optionally, labels.

# Intervals are (0,4], (4,10], (10,15], so the last bin starts at 11
x$bins <- cut(x$rank, breaks=c(0,4,10,15), labels=c("1-4","5-10","11-15"))
x
#   rank  name   info  bins
# 1    1 steve    red   1-4
# 2    3   joe   blue   1-4
# 3    6  john  green  5-10
# 4    3   liz yellow   1-4
# 5   15   jon   pink 11-15
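
As a small sketch of how cut() places values, calling it without labels shows the intervals it builds. By default each interval is open on the left and closed on the right, so an integer rank of 10 falls in the second bin and the third bin effectively starts at 11. The variable names here are illustrative only.

```r
# Same rank values as in the example data
rank <- c(1, 3, 6, 3, 15)

# With no labels, each level is named after its interval.
# The default right = TRUE gives intervals closed on the right: (lo, hi]
default_bins <- cut(rank, breaks = c(0, 4, 10, 15))
levels(default_bins)
# "(0,4]"   "(4,10]"  "(10,15]"

# A value outside every interval becomes NA rather than raising an error
outside <- cut(20, breaks = c(0, 4, 10, 15))
is.na(outside)
# TRUE
```

If ranks up to 20 can occur, the breaks need to extend to 20 as well, otherwise those rows get NA.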

Solution 2

dat <- "rank,name,info
1,steve,red
3,joe,blue
6,john,green
3,liz,yellow
15,jon,pink"

x <- read.table(textConnection(dat), header=TRUE, sep=",", stringsAsFactors=FALSE)
x$bins <- cut(x$rank, breaks=seq(0, 20, 5), labels=c("1-5", "6-10", "11-15", "16-20"))
x

#   rank  name   info  bins
# 1    1 steve    red   1-5
# 2    3   joe   blue   1-5
# 3    6  john  green  6-10
# 4    3   liz yellow   1-5
# 5   15   jon   pink 11-15
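
The question asks for the uneven bins "1-4", "5-10", "11-15", "16-20" in a column called binValue. A minimal sketch of that variant follows, assuming the ranks are whole numbers between 1 and 20. The labels are derived from the breaks so the two cannot drift apart, and as.character() is used because cut() returns a factor.

```r
x <- data.frame(
  rank = c(1, 3, 6, 3, 15),
  name = c("steve", "joe", "john", "liz", "jon"),
  info = c("red", "blue", "green", "yellow", "pink"),
  stringsAsFactors = FALSE
)

# Upper edge of each bin; 0 is the open lower edge of the first bin
breaks <- c(0, 4, 10, 15, 20)

# Build "1-4", "5-10", "11-15", "16-20" from the breaks (integer ranks assumed)
labels <- paste0(head(breaks, -1) + 1, "-", breaks[-1])

# as.character() stores plain strings instead of a factor
x$binValue <- as.character(cut(x$rank, breaks = breaks, labels = labels))
x$binValue
# "1-4"   "1-4"   "5-10"  "1-4"   "11-15"
```

Dropping as.character() keeps the column as a factor, which preserves the bin order for tables and plots.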

Solution 3

We can use smart_cut from the cutr package:

# devtools::install_github("moodymudskipper/cutr")
library(cutr)

Using @Andrie's sample data:

x$bins <- smart_cut(x$rank,
                    c(1,5,11,16), 
                    labels = ~paste0(.y[1],'-',.y[2]-1), 
                    simplify = FALSE)
# rank  name   info  bins
# 1    1 steve    red   1-4
# 2    3   joe   blue   1-4
# 3    6  john  green  5-10
# 4    3   liz yellow   1-4
# 5   15   jon   pink 11-15
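
For readers who would rather not install a package from GitHub, a similar result can be sketched in base R. This is an approximation of the idea, not cutr's implementation: right = FALSE makes each interval closed on the left, so the breaks can be written as the first value of each bin, and the labels are built the same way as in the formula above. The extra break at 21 is an assumption added here so that ranks up to 20 are covered.

```r
rank <- c(1, 3, 6, 3, 15)

# Each break is the first value of a bin; 21 closes the last bin (assumed)
breaks <- c(1, 5, 11, 16, 21)

# "start-(next start - 1)" for each bin: "1-4", "5-10", "11-15", "16-20"
labels <- paste0(head(breaks, -1), "-", breaks[-1] - 1)

# right = FALSE gives left-closed intervals: [1,5), [5,11), [11,16), [16,21)
bins <- cut(rank, breaks = breaks, labels = labels, right = FALSE)
as.character(bins)
# "1-4"   "1-4"   "5-10"  "1-4"   "11-15"
```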

more on cutr and smart_cut

Author by

wespiserA

Updated on January 06, 2022

Comments

  • wespiserA, over 2 years

    I have a data frame with a few columns. One of those columns is rank, an integer between 1 and 20. I want to create another column that contains a bin label such as "1-4", "5-10", "11-15", or "16-20".

    What is the most effective way to do this?

    The data frame that I have looks like this (CSV format):

    rank,name,info
    1,steve,red
    3,joe,blue
    6,john,green
    3,liz,yellow
    15,jon,pink
    

    I want to add another column to the data frame, so that it looks like this:

    rank,name,info,binValue
    1,steve,red,"1-4"
    3,joe,blue,"1-4"
    6,john,green,"5-10"
    3,liz,yellow,"1-4"
    15,jon,pink,"11-15"
    

    The way I am doing it now is not working. I would like to keep the data frame intact and just add another column whose value depends on which range df$rank falls in. Thank you.