Understanding the minbucket parameter in a CART model using R
From the documentation for the rpart package:
minbucket
the minimum number of observations in any terminal node. If only one of minbucket or minsplit is specified, the code either sets minsplit to minbucket*3 or minbucket to minsplit/3, as appropriate.
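The rule in that last sentence can be checked directly, because rpart.control() returns the settings rpart will actually use. This is a minimal sketch that assumes only that the rpart package is installed; the variable names are illustrative.

```r
library(rpart)

# Only minbucket given: rpart.control() derives minsplit as minbucket * 3
ctrl_bucket <- rpart.control(minbucket = 4)
cat("minbucket = 4  -> minsplit =", ctrl_bucket$minsplit, "\n")

# Only minsplit given: minbucket defaults to round(minsplit / 3)
ctrl_split <- rpart.control(minsplit = 12)
cat("minsplit = 12 -> minbucket =", ctrl_split$minbucket, "\n")

# Neither given: minsplit = 20 and minbucket = round(20 / 3)
ctrl_default <- rpart.control()
cat("defaults      -> minsplit =", ctrl_default$minsplit,
    ", minbucket =", ctrl_default$minbucket, "\n")
```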
Setting minbucket to 1 places no real constraint on the leaves, since each leaf node will (by definition) have at least one observation in it. If you set it to a higher value, say 3, then every leaf node must contain at least 3 observations.
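As a sanity check of that guarantee, the sketch below grows a tree with minbucket = 3 and reads the size of every terminal node from the fitted object's frame. It uses the kyphosis data that ships with rpart (chosen only for illustration, it is not part of the question), and cp = 0 is an assumption made so the complexity parameter does not stop growth early.

```r
library(rpart)

# kyphosis ships with rpart: 81 children, response Kyphosis (absent/present)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", minbucket = 3, cp = 0)

# In fit$frame, terminal nodes are the rows whose split variable is "<leaf>";
# column n holds the number of observations that reached each node
leaf_sizes <- fit$frame$n[fit$frame$var == "<leaf>"]
print(leaf_sizes)
print(min(leaf_sizes))  # should never be below the minbucket value of 3
```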
The smaller the value of minbucket, the more closely your CART model can fit the training data. By setting minbucket to too small a value, such as 1, you run the risk of overfitting your model.
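One way to see this trade-off is to fit the same model with several minbucket values and count the leaves: more, smaller leaves follow the training data more closely, which is where the overfitting risk comes from. This sketch again assumes the kyphosis data from rpart with cp = 0, and count_leaves is a hypothetical helper written for this example.

```r
library(rpart)

# Hypothetical helper: grow a tree with cp = 0 so that only minbucket
# (and the minsplit derived from it) limits growth, then count its leaves
count_leaves <- function(bucket) {
  fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               method = "class", minbucket = bucket, cp = 0)
  sum(fit$frame$var == "<leaf>")
}

buckets <- c(1, 5, 10, 20)
leaves <- sapply(buckets, count_leaves)
print(data.frame(minbucket = buckets, leaves = leaves))
```

Note that this shows tree size on the training data only; judging actual overfitting would need held-out data or rpart's cross-validated error.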
GBOT
Updated on June 07, 2022
Comments
-
GBOT about 2 years
Assume the training data is "fruit", which I am going to use to make predictions with a CART model in R:
library(rpart)
library(rpart.plot)  # provides prp()
fruit = data.frame(color=c("red", "red", "red", "yellow", "red", "yellow", "orange", "green", "pink", "red", "red"), isApple=c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE))
mod = rpart(isApple ~ color, data=fruit, method="class", minbucket=1)
prp(mod)
Could anyone explain exactly what role minbucket plays in the CART tree plotted for this example if we use minbucket = 2, 3, 4, or 5? I have 2 variables, color and isApple. The color variable has the values green, yellow, pink, orange, and red; the isApple variable has the value TRUE or FALSE. In the last example, red has three TRUE and two FALSE values mapped to it. The value red appears five times. If I set minbucket = 1, 2, or 3, the tree splits. If I set minbucket = 4 or 5, no split occurs, even though red appears five times.
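This behaviour is consistent with the documentation rule quoted above: when only minbucket is given, rpart sets minsplit to minbucket*3, and a node with fewer than minsplit observations is not split. The fruit data has 11 rows, so minbucket = 4 implies minsplit = 12 and minbucket = 5 implies minsplit = 15, both larger than the root node. The sketch below refits the model for minbucket = 1 to 5. It assumes the rpart package, adds stringsAsFactors = TRUE (not in the question) so color is a factor in any R version, and passes the setting through rpart.control() instead of directly to rpart(), which should be equivalent.

```r
library(rpart)

# The fruit data from the question; stringsAsFactors = TRUE is an addition
fruit <- data.frame(
  color = c("red", "red", "red", "yellow", "red", "yellow",
            "orange", "green", "pink", "red", "red"),
  isApple = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE,
              FALSE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = TRUE
)

splits <- integer(5)
for (bucket in 1:5) {
  # Only minbucket is given, so rpart.control() derives minsplit = 3 * minbucket
  ctrl <- rpart.control(minbucket = bucket)
  mod <- rpart(isApple ~ color, data = fruit, method = "class", control = ctrl)
  # Non-leaf rows of mod$frame are the splits the tree actually made
  splits[bucket] <- sum(mod$frame$var != "<leaf>")
  cat(sprintf("minbucket = %d, minsplit = %d, rows = %d, splits = %d\n",
              bucket, ctrl$minsplit, nrow(fruit), splits[bucket]))
}
```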
-
GBOT about 9 years
fruit=data.frame( color=c("red","red","red","yellow","red","yellow","orange","green","pink","red","red"), isApple=c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE))
Say this is my data frame, and we are predicting whether the outcome is an apple or not. We have 5 red apples and 1 tomato here, so whatever is red need not be an apple. But if I set minbucket = 5 or 4, there is no split at all. For minbucket 1 to 3 there is a split; beyond 3 there is no split. But I have more than 3 observations in my leaf node. Please upvote my question, thanks. @tim-biegeleisen
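In this data the red leaf would hold 6 rows and the non-red leaf 5 rows, so minbucket = 5 on its own is not what blocks the split; the derived minsplit is. One way to confirm this, sketched below under the same assumptions as before (rpart installed, stringsAsFactors = TRUE added), is to give minsplit explicitly so it is no longer derived from minbucket. The value 10 is an arbitrary choice that is not larger than the 11 rows.

```r
library(rpart)

fruit <- data.frame(
  color = c("red", "red", "red", "yellow", "red", "yellow",
            "orange", "green", "pink", "red", "red"),
  isApple = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE,
              FALSE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = TRUE
)

# Both settings given, so minsplit is not derived as 3 * minbucket (= 15)
mod5 <- rpart(isApple ~ color, data = fruit, method = "class",
              minbucket = 5, minsplit = 10)

# Rows of mod5$frame marked "<leaf>" are terminal nodes; n is their size
leaf_sizes <- mod5$frame$n[mod5$frame$var == "<leaf>"]
print(sort(leaf_sizes))
```

With minbucket = 6 the split would disappear even with minsplit set explicitly, since 11 rows cannot be divided into two groups of at least 6.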
-
GBOT about 9 years
stackoverflow.com/users/3710546/pascal. I have edited the original question. Is it understandable now?
-
Tim Biegeleisen about 9 years
Could you ask a new question?
-
GBOT about 9 years
stackoverflow.com/users/1863229/tim-biegeleisen. See, I have 2 variables, color and isApple. The color variable has the values green, yellow, pink, orange, and red; the isApple variable has the value TRUE or FALSE. In the last example, red has three TRUE and two FALSE values mapped to it. The value red appears five times. If I set minbucket = 1, 2, or 3, the tree splits. If I set minbucket = 4 or 5, no split occurs, even though red appears five times. Sorry, I could not attach a screenshot; I need 10 reputation to attach one. :( :(