Error in running randomForest : object not found

Solution 1

In short, it was a rookie mistake: I was passing a matrix rather than a data.frame, which caused this error. Why it complained about that particular column (which was not the first) rather than another, I still don't understand. Thanks for all the help. Cheers, Anthony
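As a rough illustration of the matrix versus data.frame difference, the sketch below uses a tiny made-up matrix (the values and the two-column layout are assumptions, not the asker's real data). It shows that `as.data.frame()` keeps column names as they are, while `data.frame()` rewrites non-syntactic names by default. That difference is one plausible reason a column such as `PCNA-AS1` gets singled out, though the original post does not confirm it.

```r
# Hypothetical stand-in for the real data: a numeric matrix with samples in rows.
m <- matrix(c(3.73, 4.26, 3.40, 0.93), nrow = 2,
            dimnames = list(c("Sample1", "Sample2"), c("GOLGA8A", "PCNA-AS1")))

is.matrix(m)        # TRUE
is.data.frame(m)    # FALSE

# as.data.frame() keeps the column names exactly as they are ...
kept <- as.data.frame(m)
names(kept)

# ... while data.frame() uses check.names = TRUE by default,
# which rewrites non-syntactic names via make.names().
fixed <- data.frame(m)
names(fixed)

# A matrix cannot hold numeric features and a factor response together,
# so the response is added after the conversion.
fixed$resp <- factor(c("1", "1"))
str(fixed)
```

Building the data frame explicitly before calling `randomForest()` keeps the column types and names under your control instead of relying on an implicit conversion.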

Solution 2

I would suspect this comes from having an illegal variable name in your data frame. Let's consider a data frame that just has a response variable resp and a variable (illegally) named PCNA-AS1:

(dat <- structure(list(`PCNA-AS1` = c(1, 2, 3), resp = structure(c(2L, 2L, 1L), .Label = c("0", "1"), class = "factor")), .Names = c("PCNA-AS1", "resp"), row.names = c(NA, -3L), class = "data.frame"))
#   PCNA-AS1 resp
# 1        1    1
# 2        2    1
# 3        3    0

Now when we train a random forest we get the indicated error:

library(randomForest)
mod <- randomForest(resp~., data=dat)
# Error in eval(expr, envir, enclos) : object 'PCNA-AS1' not found

A natural solution to this problem would be converting your variable names to all be legal:

names(dat) <- make.names(names(dat))
dat
#   PCNA.AS1 resp
# 1        1    1
# 2        2    1
# 3        3    0
mod <- randomForest(resp~., data=dat)

Now the model trains with no error.
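Before renaming anything, it can help to see which columns are affected. The sketch below is only an illustration: `invalid_names()` is a hypothetical helper written for this example, not a function from randomForest or base R. It also passes `unique = TRUE` to `make.names()` so that two different illegal names cannot collapse into the same legal one.

```r
# Hypothetical helper (not part of randomForest): list the column names
# that make.names() would change, i.e. the non-syntactic ones.
invalid_names <- function(df) {
  nm <- names(df)
  nm[nm != make.names(nm)]
}

# Same toy data as above; check.names = FALSE keeps the illegal name on purpose.
dat <- data.frame(`PCNA-AS1` = c(1, 2, 3),
                  resp = factor(c("1", "1", "0")),
                  check.names = FALSE)

invalid_names(dat)   # the columns that would be renamed

# unique = TRUE guards against two columns ending up with the same name.
names(dat) <- make.names(names(dat), unique = TRUE)
invalid_names(dat)   # nothing left to rename
```

Keeping a record of the original names alongside the cleaned ones makes it easier to map variable importance results back to the original feature labels.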

Author: AHawks

Updated on July 29, 2022

Comments

  • AHawks
    AHawks almost 2 years

    I am trying to fit a random forest classifier to my dataset. I am very new to R, and I imagine this is a simple formatting issue.

    I read in a text file and transform my dataset so it is of this format: (taking out confidential info)

    >head(df.train,2)
    
       GOLGA8A     ITPR3   GPR174  SNORA63    GIMAP8     LEF1    PDE4B LOC100507043    TGFB1I1    SPINT1
    Sample1  3.726046 3.4013711 3.794364 4.265287 -1.514573 7.725775 2.162616    -1.514573 -1.5145732 -1.514573
    Sample2 4.262779 0.9261892 4.744096 7.276971 -1.514573 4.694769 4.707387     2.031476 -0.8325444  2.615991
    ...
    ...
    CD8B     FECH    PYCR1 MGC12916     KCNA3 resp
    Sample1  -1.514573 2.099336 3.427928 1.542951 -1.514573    1
    Sample2 -1.145806 1.204241 2.846832 1.523808  1.616791    1
    

    In essence, the columns are my features and the rows are my samples; the last column, resp, is my response vector, a column of factors.

    Then I use:

    set.seed(1)  # Set the seed for reproducibility
    
    RF1 <- randomForest(resp ~ ., data = df.train, ntree = 1000, importance = TRUE, mtry = 3)
    

    Simply trying to train the RF on my column resp using the other columns as features.

    But I obtain the error:

    Error in eval(expr, envir, enclos) : object 'PCNA-AS1' not found
    

    However, looking into my training set, I can clearly find that column, e.g. with:

    sort(unique(colnames(df.train)))
    

    So I don't really understand the error or where to go from here. My apologies if I haven't posed the question in the correct way, thanks for any and all help!

  • AHawks
    AHawks over 8 years
    Thanks for your comment, josliber. I tried converting to legal names, but that wasn't the problem. The actual error was that I gave randomForest a matrix rather than a data frame; I had assumed this didn't matter and that randomForest could easily convert between the two. I was mistaken, and the issue is now solved.
  • josliber
    josliber over 8 years
    @AHawks OK, then all the more reason for you to edit your question to make it reproducible! (aka including the code and data needed to replicate the issue). Try cutting down the columns in your data frame to the smallest number where you can reproduce the issue, and then post that dataset (if you haven't already figured out what's going on first).
  • Soren Havelund Welling
    Soren Havelund Welling over 8 years
    When creating/casting a data.frame with data.frame(), check.names=TRUE by default, so inputting a data.frame could have fixed the problem, since illegal column names would have been edited. In general, randomForest gives far fewer problems with a data.frame than with a matrix.
  • AHawks
    AHawks over 8 years
    Yes, you are definitely correct; that would have been better, and I will do that for future questions. I am still getting used to presenting problems here on Stack Overflow, so thanks for your advice!
  • Cath
    Cath almost 6 years
    This is not an answer but rather a comment to the real answer, which should be marked as accepted.