Converting factors to numeric values in R

Solution 1

To convert the salary strings to numeric values, strip the formatting and average each range:

# Example data
df <- data.frame(
  sal  = c("$100,001 - $150,000", "over $150,000", "$25,000"),
  educ = c("High School Diploma", "Current Undergraduate", "PhD"),
  stringsAsFactors = FALSE
)

# Remove commas and dollar signs
temp <- gsub("[,$]", "", df$sal)

# Remove letters (e.g. "over")
temp <- gsub("[[:alpha:]]", "", temp)

# Average the two ends of each range; single values are unchanged
df$ave.sal <- sapply(strsplit(temp, "-"), function(i) mean(as.numeric(i)))
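
The cleaning steps above can be wrapped in a small helper so they are easy to reuse on other columns. This is only a sketch: the name salary_midpoint is hypothetical and not part of the original answer. Note that an open-ended label such as "over $150,000" is mapped to its bound (150000), exactly as in the code above.

```r
# Hypothetical helper (not from the original answer): wraps the cleaning steps
salary_midpoint <- function(x) {
  x <- as.character(x)                          # factors become their labels
  cleaned <- gsub("[,$]", "", x)                # drop commas and dollar signs
  cleaned <- gsub("[[:alpha:]]", "", cleaned)   # drop words such as "over"
  parts <- strsplit(cleaned, "-")               # split ranges into their two ends
  vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
}

sal <- factor(c("$100,001 - $150,000", "over $150,000", "$25,000"))
salary_midpoint(sal)
# 125000.5 150000.0  25000.0
```

Using vapply instead of sapply guarantees a numeric vector even when the input is empty.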


For the education levels, if you want numeric codes, list the levels in the intended order:

df$educ.f <- as.numeric(factor(df$educ , levels=c("High School Diploma" ,
          "Current Undergraduate", "PhD")))


df
#                  sal                  educ  ave.sal educ.f
# 1 $100,001 - $150,000   High School Diploma 125000.5      1
# 2       over $150,000 Current Undergraduate 150000.0      2
# 3             $25,000                   PhD  25000.0      3
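
The levels argument in the education step matters. The sketch below shows what happens without it, and an ordered factor as an alternative that keeps the labels while still allowing comparisons. The names educ_levels and educ.o are illustrative only.

```r
educ <- c("High School Diploma", "Current Undergraduate", "PhD")

# Without explicit levels, factor() sorts the labels alphabetically,
# so the codes do not follow the intended ranking
as.numeric(factor(educ))
# 2 1 3

# An ordered factor keeps the labels and supports comparisons
educ_levels <- c("High School Diploma", "Current Undergraduate", "PhD")
educ.o <- factor(educ, levels = educ_levels, ordered = TRUE)

educ.o > "High School Diploma"
# FALSE  TRUE  TRUE

as.integer(educ.o)
# 1 2 3
```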



EDIT

Missing (NA) values do not cause problems: they propagate through each step and come out as NA.

# Data that includes missing values
df <- data.frame(
  sal  = c("$100,001 - $150,000", "over $150,000", "$25,000", NA),
  educ = c(NA, "High School Diploma", "Current Undergraduate", "PhD"),
  stringsAsFactors = FALSE
)

Rerunning the commands above gives:

df
#                   sal                  educ  ave.sal educ.f
# 1 $100,001 - $150,000                  <NA> 125000.5     NA
# 2       over $150,000   High School Diploma 150000.0      1
# 3             $25,000 Current Undergraduate  25000.0      2
# 4                <NA>                   PhD       NA      3
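
To see why missing values are harmless, it can help to follow a single NA through the pipeline. A minimal sketch, using throwaway names (x, step1, step2, ave) that are not in the original answer:

```r
x <- c("$25,000", NA)

step1 <- gsub("[,$]", "", x)       # gsub leaves NA as NA
step2 <- strsplit(step1, "-")      # the NA element stays NA
ave <- sapply(step2, function(i) mean(as.numeric(i)))
ave
# 25000    NA

# Summaries of the new column need na.rm = TRUE to skip the missing rows
mean(ave, na.rm = TRUE)
# 25000

# Factor codes behave the same way: NA in, NA out
as.numeric(factor(c(NA, "PhD"), levels = "PhD"))
# NA  1
```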

Solution 2

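Alternatively, use the recode function in the car package to map each salary label to a number explicitly.

For example:

library(car)
# List every label: values that match no rule are left unchanged
df$sal <- recode(df$sal,
    "'$100,001 - $150,000'=125000; 'over $150,000'=150000; '$25,000'=25000")

For more information on how to use this function, see its help file (?recode).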
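
If adding a package dependency is undesirable, a named vector in base R can serve as the lookup table. This is a swapped-in technique rather than part of the original answer; the name sal_lookup and the chosen values are assumptions for illustration.

```r
sal <- c("$100,001 - $150,000", "over $150,000", "$25,000", NA)

# Hypothetical lookup table: names are the labels, values are the numbers
sal_lookup <- c(
  "$100,001 - $150,000" = 125000,
  "over $150,000"       = 150000,
  "$25,000"             = 25000
)

# Index by label; NA and unlisted labels come back as NA
sal_num <- unname(sal_lookup[sal])
sal_num
# 125000 150000  25000     NA
```

Unlike recode, which leaves unmatched values as they are, this approach turns any label missing from the table into NA, which makes gaps in the mapping easy to spot.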

Author by

orome

Updated on July 13, 2022

Comments

  • orome
    orome almost 2 years

    I have factors in R that are salary ranges of the form $100,001 - $150,000, over $150,000, $25,000, etc. and would like to convert these to numeric values (e.g. converting the factor $100,001 - $150,000 to the integer 125000).

    Similarly I have educational categories such as High School Diploma, Current Undergraduate, PhD, etc. that I would like to assign numbers to (e.g., giving PhD a higher value than High School Diploma).

    How do I do this, given the dataframe containing these values?