Issues with tuneGrid parameter in random forest


It looks like there is a bracket issue in your mtryGrid: the parenthesis after .mtry = 100 closes data.frame() early, so .sampsize=80 falls outside the call. Alternatively, you can use expand.grid to list the values of mtry you want to try. By default, the only parameter you can tune for a random forest (method = "rf") is mtry. You can still pass other parameters to train, but they keep a fixed value and so are not tuned by train. That also means you can ask train to use a stratified sample. Below is how I would do it, assuming that trainY is a boolean variable by which you want to stratify your samples, and that you want samples of size 80 for each category:

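library(caret)

# trainX, trainY and ctrl (a trainControl object) come from the question
mtryGrid <- expand.grid(mtry = 100) # you can put different values for mtry
rfTune <- train(x = trainX,
                y = trainY,
                method = "rf",
                trControl = ctrl,
                metric = "Kappa",
                ntree = 1000,
                tuneGrid = mtryGrid,
                strata = factor(trainY), # stratify the samples by class
                sampsize = c(80, 80),    # 80 rows drawn per class
                importance = TRUE)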
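
To make the two fixed pieces above more concrete, here is a minimal sketch in base R only. It builds a grid with several candidate values of mtry and derives an equal per-class sampsize from the class counts instead of hard-coding 80. The toy outcome toyY and the names perClass and sampSizes are hypothetical and are not part of the original post.

```r
# Candidate values of mtry; the grid column must be named exactly "mtry"
mtryGrid <- expand.grid(mtry = c(2, 4, 6))

# Hypothetical imbalanced two-class outcome (illustration only)
toyY <- factor(c(rep("neg", 300), rep("pos", 40)))

# One equal sample size per class, capped at the size of the rarest class
perClass <- min(table(toyY))
sampSizes <- rep(perClass, nlevels(toyY))

print(names(mtryGrid))
print(sampSizes)
```

Under these assumptions, mtryGrid could be passed as tuneGrid and sampSizes as sampsize in the call to train above. Capping at the rarest class avoids asking for more rows than a class contains.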
Author by

mortonjt

Updated on June 04, 2022

Comments

  • mortonjt
    mortonjt almost 2 years

    I've been dealing with some extremely imbalanced data and I would like to use stratified sampling to create more balanced random forests.

    Right now, I'm using the caret package, mainly for tuning the random forests. So I tried to set up a tuneGrid to pass the mtry and sampsize parameters into caret's train method as follows.

    mtryGrid <- data.frame(.mtry = 100),.sampsize=80)
    rfTune<- train(x = trainX,
                   y = trainY,
                   method = "rf",
                   trControl = ctrl,
                   metric = "Kappa",
                   ntree = 1000,
                   tuneGrid = mtryGrid,
                   importance = TRUE)
    

    When I run this example, I get the following error:

    The tuning parameter grid should have columns mtry
    

    I've come across discussions like this suggesting that passing in these parameters should be possible.

    On the other hand, this page suggests that the only parameter that can be passed in is mtry.

    Can I even pass sampsize into the random forests via caret?

  • mortonjt
    mortonjt over 9 years
    For some reason, I thought sampsize couldn't be passed into train(). Oh well. Thanks!
  • toto_tico
    toto_tico over 7 years
    @Garnieje, what is a good resource to know which parameters you can tune for each method (e.g., mtry for rf)? I was thinking that I could add ntree and ran into the same issue...
  • toto_tico
    toto_tico over 7 years
    Nevermind, I found it
  • Seanosapien
    Seanosapien over 6 years
    @toto_tico And if you don't want to read through the documentation: caret::modelLookup(model = "rf")
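
As a small illustration of the lookup mentioned in the last comment, the sketch below lists the tunable parameters caret reports for method = "rf". It assumes the caret package is installed; the variable name rfParams is hypothetical, and the guard simply skips the lookup when caret is not available.

```r
# Only run the lookup when caret is installed
if (requireNamespace("caret", quietly = TRUE)) {
  # One row per tunable parameter of the "rf" method
  rfParams <- caret::modelLookup(model = "rf")
  print(rfParams$parameter)
}
```

Anything that does not appear in the parameter column, such as ntree or sampsize, would have to be passed to train as a fixed argument rather than as a tuneGrid column.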