Converting factors to numeric values in R
Solution 1
To convert the currency strings:
# data
df <- data.frame(sal = c("$100,001 - $150,000", "over $150,000", "$25,000"),
                 educ = c("High School Diploma", "Current Undergraduate", "PhD"),
                 stringsAsFactors = FALSE)
# Remove comma and dollar sign
temp <- gsub("[,$]","", df$sal)
# remove text
temp <- gsub("[[:alpha:]]","", temp)
# get average over range
df$ave.sal <- sapply(strsplit(temp , "-") , function(i) mean(as.numeric(i)))
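The same steps can be wrapped in a small function so they are easy to reuse on other columns. This is only a sketch: the name `salary_midpoint` is hypothetical and not part of the original answer, and it assumes every label contains either one amount or two amounts separated by a hyphen. The explicit `as.character()` call is there because the question mentions factors.

```r
# Hypothetical helper (not from the original answer): returns the mean of the
# amounts found in each salary label.
salary_midpoint <- function(x) {
  cleaned <- gsub("[,$]", "", as.character(x))    # drop commas and dollar signs
  cleaned <- gsub("[[:alpha:]]", "", cleaned)     # drop words such as "over"
  parts <- strsplit(cleaned, "-", fixed = TRUE)   # split ranges on the hyphen
  vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
}

sal <- factor(c("$100,001 - $150,000", "over $150,000", "$25,000", NA))
mid <- salary_midpoint(sal)
mid
```

Using `vapply()` with `numeric(1)` rather than `sapply()` is a design choice: it guarantees a numeric vector even when the input is empty.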
For the education levels, if you want them as numbers:
df$educ.f <- as.numeric(factor(df$educ,
                               levels = c("High School Diploma", "Current Undergraduate", "PhD")))
df
#                   sal                  educ  ave.sal educ.f
# 1 $100,001 - $150,000   High School Diploma 125000.5      1
# 2       over $150,000 Current Undergraduate 150000.0      2
# 3             $25,000                   PhD  25000.0      3
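If the goal is for PhD to rank above High School Diploma, two base R variations on the idea above may be worth considering. The sketch below is illustrative only; the variable names `educ_levels`, `educ_ord` and `educ_rank` are made up for this example. An ordered factor keeps the labels and allows comparisons, while `match()` returns the position of each label in the level vector directly.

```r
educ <- c("High School Diploma", "Current Undergraduate", "PhD", NA)

# Levels listed from lowest to highest (an assumed ordering for illustration).
educ_levels <- c("High School Diploma", "Current Undergraduate", "PhD")

# Ordered factor: keeps the labels and supports comparisons such as `<`.
educ_ord <- factor(educ, levels = educ_levels, ordered = TRUE)

# Integer ranks: position of each label in educ_levels; NA stays NA.
educ_rank <- match(educ, educ_levels)

educ_ord[1] < educ_ord[3]
educ_rank
```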
EDIT
Missing (NA) values should not cause problems; they simply carry through as NA.
# Data that includes missing values
df <- data.frame(sal = c("$100,001 - $150,000", "over $150,000", "$25,000", NA),
                 educ = c(NA, "High School Diploma", "Current Undergraduate", "PhD"),
                 stringsAsFactors = FALSE)
Rerun the commands above to get:
df
#                   sal                  educ  ave.sal educ.f
# 1 $100,001 - $150,000                  <NA> 125000.5     NA
# 2       over $150,000   High School Diploma 150000.0      1
# 3             $25,000 Current Undergraduate  25000.0      2
# 4                <NA>                   PhD       NA      3
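Because NA also appears when a label contains no number at all, it can help to separate values that were missing in the input from labels that failed to parse. The following is a rough sketch; the label "Prefer not to say" is an invented example and the name `unparsed` is hypothetical.

```r
# "Prefer not to say" is a made-up label used only to illustrate a parse failure.
sal <- c("$100,001 - $150,000", "Prefer not to say", NA)

# Same cleaning steps as above: strip commas, dollar signs and letters.
temp <- gsub("[[:alpha:]]", "", gsub("[,$]", "", sal))
ave <- sapply(strsplit(temp, "-"), function(i) mean(as.numeric(i)))

# TRUE where the input had a label but no number could be extracted.
unparsed <- is.na(ave) & !is.na(sal)
sal[unparsed]
```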
Solution 2
You could use the recode function in the car package.
For example:
library(car)
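# Assumes the salary labels are stored in df$salary; labels must match exactly.
# as.factor = FALSE requests a non-factor result even if the column is a factor.
df$salary <- recode(df$salary,
  "'$100,001 - $150,000'=125000;'over $150,000'=150000;'$25,000'=25000", as.factor = FALSE)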
For more information on how to use this function, see its help file (?car::recode).
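If installing car is not an option, a similar explicit mapping can be sketched in base R with a named vector used as a lookup table. This is a swapped-in technique, not the recode function itself, and the names `lookup` and `sal_num` are hypothetical. Labels that are missing from the table come back as NA.

```r
# Named numeric vector used as a lookup table (values chosen by hand).
lookup <- c("$100,001 - $150,000" = 125000,
            "over $150,000"       = 150000,
            "$25,000"             = 25000)

sal <- factor(c("$100,001 - $150,000", "over $150,000", "$25,000", NA))

# Index by label; as.character() avoids indexing by the factor's integer codes.
sal_num <- unname(lookup[as.character(sal)])
sal_num
```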
orome
Updated on July 13, 2022
Comments
orome almost 2 years
I have factors in R that are salary ranges of the form $100,001 - $150,000, over $150,000, $25,000, etc. and would like to convert these to numeric values (e.g. converting the factor $100,001 - $150,000 to the integer 125000).
Similarly I have educational categories such as High School Diploma, Current Undergraduate, PhD, etc. that I would like to assign numbers to (e.g., giving PhD a higher value than High School Diploma).
How do I do this, given the dataframe containing these values?