Error in running randomForest : object not found
Solution 1
In short, it was a rookie mistake: I was passing a matrix rather than a data.frame, which caused this error. Why it complained about that particular column (which was not the first) rather than another, I still don't understand. Thanks for all the help. Cheers, Anthony
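As a minimal sketch of that fix, assuming a small numeric matrix with made-up values (x, resp, and the numbers are hypothetical stand-ins for the real data), the matrix can be combined with the response through data.frame(). Because data.frame() defaults to check.names = TRUE, the illegal column name is sanitized along the way:

```r
# Hypothetical numeric feature matrix; names and values are illustrative only.
x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
            dimnames = list(paste0("Sample", 1:3), c("PCNA-AS1", "LEF1")))

# Hypothetical response, kept as a factor so randomForest would do classification.
resp <- factor(c(1, 1, 0))

# data.frame() uses check.names = TRUE by default, so "PCNA-AS1" becomes "PCNA.AS1".
df.train <- data.frame(x, resp = resp)

print(names(df.train))
print(is.data.frame(df.train))
```

The resulting df.train could then be passed as the data argument of the formula call used in the question.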
Solution 2
I suspect this comes from having an illegal variable name in your data frame. Let's consider a data frame that has just a response variable resp and a variable (illegally) named PCNA-AS1:
(dat <- structure(list(`PCNA-AS1` = c(1, 2, 3), resp = structure(c(2L, 2L, 1L), .Label = c("0", "1"), class = "factor")), .Names = c("PCNA-AS1", "resp"), row.names = c(NA, -3L), class = "data.frame"))
# PCNA-AS1 resp
# 1 1 1
# 2 2 1
# 3 3 0
Now when we train a random forest we get the indicated error:
library(randomForest)
mod <- randomForest(resp~., data=dat)
# Error in eval(expr, envir, enclos) : object 'PCNA-AS1' not found
A natural solution to this problem would be converting your variable names to all be legal:
names(dat) <- make.names(names(dat))
dat
# PCNA.AS1 resp
# 1 1 1
# 2 2 1
# 3 3 0
mod <- randomForest(resp~., data=dat)
Now the model trains with no error.
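Before renaming, it may help to see which names are actually illegal and whether sanitizing them would create duplicates. The following is a sketch using only base R, assuming a hypothetical set of column names in which PCNA-AS1 and PCNA.AS1 both occur:

```r
# Hypothetical column names; "PCNA-AS1" and "PCNA.AS1" collide once sanitized.
original <- c("PCNA-AS1", "PCNA.AS1", "resp")

# Names that make.names() would change are the non-syntactic ones.
illegal <- original[original != make.names(original)]

# unique = TRUE keeps the sanitized names distinct from one another.
sanitized <- make.names(original, unique = TRUE)

print(illegal)
print(sanitized)
```

Using unique = TRUE is a design choice here: without it, two different genes could end up sharing one column name after sanitizing.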
AHawks
Updated on July 29, 2022
Comments
-
AHawks almost 2 years
I am trying to fit a random forest classifier to my dataset. I am very new to R, and I imagine this is a simple formatting issue.
I read in a text file and transform my dataset into this format (with confidential information removed):
> head(df.train, 2)
         GOLGA8A     ITPR3   GPR174  SNORA63    GIMAP8     LEF1    PDE4B LOC100507043    TGFB1I1    SPINT1
Sample1 3.726046 3.4013711 3.794364 4.265287 -1.514573 7.725775 2.162616    -1.514573 -1.5145732 -1.514573
Sample2 4.262779 0.9261892 4.744096 7.276971 -1.514573 4.694769 4.707387     2.031476 -0.8325444  2.615991
...
...
             CD8B     FECH    PYCR1 MGC12916     KCNA3 resp
Sample1 -1.514573 2.099336 3.427928 1.542951 -1.514573    1
Sample2 -1.145806 1.204241 2.846832 1.523808  1.616791    1
In essence, the columns are my features and the rows are my samples. The last column, resp, is my response vector, which is a factor.
Then I use:
set.seed(1) # Set the seed for reproducibility
RF1 = randomForest(resp~., data=df.train, ntree=1000, importance=TRUE, mtry=3)
I am simply trying to train the RF on my column resp, using the other columns as features. But I obtain the error:
Error in eval(expr, envir, enclos) : object 'PCNA-AS1' not found
However, looking into my training set, I can clearly find that column, e.g. with:
sort(unique(colnames(df.train)))
So I don't really understand the error or where to go from here. My apologies if I haven't posed the question in the correct way, thanks for any and all help!
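One thing worth checking in a situation like this is the type of the object, since colnames() works on both matrices and data frames and so does not reveal which one is being passed. A minimal sketch, assuming a small hypothetical matrix m standing in for the training data:

```r
# Hypothetical stand-in for the training data, stored as a matrix.
m <- matrix(c(1, 2, 3, 4), nrow = 2,
            dimnames = list(c("Sample1", "Sample2"), c("PCNA-AS1", "LEF1")))

# The column is visible through colnames() even though m is not a data frame.
has_column <- "PCNA-AS1" %in% colnames(m)

# These checks distinguish the two types.
print(has_column)
print(is.matrix(m))
print(is.data.frame(m))
```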
-
AHawks over 8 years
Thanks for your comment, josliber. I tried converting to legal names, but that wasn't the problem. The actual error was that I gave randomForest a matrix (rather than a data frame); I had assumed this didn't matter and that randomForest could easily convert between the two. I was mistaken, and I have now solved the issue.
-
josliber over 8 years
@AHawks OK, then all the more reason for you to edit your question to make it reproducible (i.e., include the code and data needed to replicate the issue). Try cutting down the columns in your data frame to the smallest number where you can reproduce the issue, and then post that dataset (if you haven't already figured out what's going on first).
-
Soren Havelund Welling over 8 years
When creating or casting with data.frame(), check.names=TRUE by default. So passing a data.frame could have fixed the problem, as illegal column names would have been edited. In general, randomForest gives far fewer problems with a data.frame than with a matrix.
-
AHawks over 8 years
Yes, you are definitely correct, that would have been better, and I will do that for future questions. I am just getting used to presenting problems here on Stack Overflow, so thanks for your advice!
-
Cath almost 6 years
This is not an answer but rather a comment on the real answer, which should be marked as accepted.