The as.numeric function changes the values in my dataframe


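See R FAQ 7.10 ("How do I convert factors to numeric?"). When you call as.numeric() on a factor, you get the underlying integer level codes, not the numbers that the level labels represent. Those codes are the values str() prints after each level list (for example 301 301 301 for averagesp). The FAQ gives recipes for turning a factor into the numbers represented by its strings.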
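A minimal sketch of the two recipes in R FAQ 7.10, using a small made-up factor `f` (the values are illustrative and not taken from the question's data):

```r
# A made-up factor whose labels look like numbers
f <- factor(c("0.2", "10", "0.2", "5.5"))

# as.numeric() on a factor returns the internal level codes
# (here 1, 2, 1, 3), not the numbers written in the labels
codes <- as.numeric(f)

# Recipe 1: go through the character labels first
vals_chr <- as.numeric(as.character(f))

# Recipe 2: convert each distinct level once, then index by the factor;
# indexing by a factor uses its integer codes, so each element picks
# the numeric value of its own level
vals_lev <- as.numeric(levels(f))[f]

print(codes)
print(vals_chr)
print(vals_lev)
```

Both recipes give 0.2, 10, 0.2, 5.5. The FAQ recommends the second form as slightly more efficient, because it converts each distinct level only once instead of every element.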

Author: Thirst for Knowledge

Updated on April 02, 2020

Comments

  • Thirst for Knowledge

    I have a column containing speed measurements that I need to convert to numeric so that I can use the mean and sum functions. However, when I convert it, the values change substantially.

    Why is this?

    This is what my data look like at first:

    [Screenshot of the data before conversion; image not preserved]

    And here is the structure of the data frame:

    'data.frame':   1899571 obs. of  20 variables:
     $ pcd        : Factor w/ 1736958 levels "AB101AA","AB101AB",..: 1 2 3 4 5 6 6 7 7 8 
     $ pcdstatus  : Factor w/ 5 levels "Insufficient Data",..: 4 4 4 4 4 2 3 2 3 3 ...
     $ mbps2      : Factor w/ 3 levels "N","N/A","Y": 2 2 2 2 2 2 2 2 2 2 ...
     $ averagesp  : Factor w/ 301 levels ">=30","0","0.2",..: 301 301 301 301 301 301 301 
     $ mediansp   : Factor w/ 302 levels ">=30","0","0.1",..: 302 302 302 302 302 302 302 
     $ maxsp      : Factor w/ 301 levels ">=30","0","0.2",..: 301 301 301 301 301 301 301 
     $ nga        : Factor w/ 2 levels "N","Y": 1 2 1 1 1 1 1 2 2 2 ...
     $ connections: Factor w/ 119 levels "<3","0","1","10",..: 2 2 2 2 2 1 2 1 2 2 ...
     $ pcd2       : Factor w/ 1736958 levels "AB10 1AA","AB10 1AB",..: 1 2 3 4 5 6 6 7 7 8 
     $ pcds       : Factor w/ 1736958 levels "AB10 1AA","AB10 1AB",..: 1 2 3 4 5 6 6 7 7 8 
     $ oslaua     : Factor w/ 407 levels "","95A","95B",..: 374 374 374 374 374 374 374 
     $ x          : int  394251 394232 394181 394251 394371 394181 394181 394331 394331 
     $ y          : int  806376 806470 806429 806376 806359 806429 806429 806530 806530 
     $ ctry       : Factor w/ 4 levels "E92000001","N92000002",..: 3 3 3 3 3 3 3 3 3 3 ...
     $ hro2       : Factor w/ 13 levels "","E12000001",..: 12 12 12 12 12 12 12 12 12 12 
     $ soa1       : Factor w/ 34381 levels "","E01000001",..: 32485 32485 32485 32485 
     $ dzone1     : Factor w/ 6507 levels "","E99999999",..: 128 128 128 128 112 128 128 
     $ soa2       : Factor w/ 7197 levels "","E02000001",..: 6784 6784 6784 6784 6784 6784 
     $ urindew    : int  9 9 9 9 9 9 9 9 9 9 ...
     $ soa1ni     : Factor w/ 892 levels "","95AA01S1",..: 892 892 892 892 892 892 892 892 
    

    This is the code for converting my variables to numeric variables.

     #convert individual columns to numeric variables  
     total$averagesp <- as.numeric(total$averagesp) 
     total$mediansp <- as.numeric(total$mediansp) 
     total$maxsp <- as.numeric(total$maxsp) 
     total$mbps2 <- as.numeric(total$mbps2)
     total$nga <- as.numeric(total$nga)
     total$connections <- as.numeric(total$connections)
    

    But afterwards I get this strange output, in which all my values have been inflated:

    [Screenshot of the output after conversion; image not preserved]

    Any help would be much appreciated - thank you!

  • kadrian
    Thanks!! as.numeric(as.character(f)) worked for me. I had to add the as.character.
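A sketch of applying the same as.numeric(as.character(f)) recipe to several columns at once, on a hypothetical miniature of the question's `total` data frame (the column names come from the question; the values are invented, and the helper `to_num` is a hypothetical name). The level lists printed by str() include labels such as ">=30" and "<3", which are not plain numbers. as.numeric() turns those into NA with an "NAs introduced by coercion" warning, so you will need to decide separately how such rows should be treated.

```r
# Hypothetical miniature of the question's `total` data frame:
# column names from the question, values invented for illustration
total <- data.frame(
  averagesp   = factor(c("12.5", ">=30", "0.2", "8")),
  connections = factor(c("0", "<3", "10", "1"))
)

# Hypothetical helper: convert a factor through its labels
to_num <- function(f) as.numeric(as.character(f))

# Labels such as ">=30" and "<3" cannot be parsed, so they become NA
# and as.numeric() emits a "NAs introduced by coercion" warning
num_cols <- c("averagesp", "connections")
total[num_cols] <- lapply(total[num_cols], to_num)

str(total)

# mean() and sum() return NA unless the NA rows are dropped
mean(total$averagesp, na.rm = TRUE)
sum(total$connections, na.rm = TRUE)
```

With these invented values the mean of averagesp is 6.9 and the sum of connections is 11; without na.rm = TRUE both would be NA.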