Classification Trees using R
Building, plotting, and evaluating classification trees

How to do it...

This recipe shows you how you can use the rpart package to build classification trees and the rpart.plot package to generate nice-looking tree diagrams:
1. Load the rpart, rpart.plot, and caret packages:
> library(rpart)
> library(rpart.plot)
> library(caret)
2. Read the data:
> bn <- read.csv("banknote-authentication.csv")
3. Create data partitions. We need two partitions: training and validation. Rather than copying the data into the partitions, we will just keep the indices of the cases that represent the training cases and subset as and when needed:
> set.seed(1000)
> train.idx <- createDataPartition(bn$class, p = 0.7, list = FALSE)
4. Build the tree:
> mod <- rpart(class ~ ., data = bn[train.idx, ], method = "class", control = rpart.control(minsplit = 20, cp = 0.01))
5. View the text output (your result could differ if you did not set the random seed as in step 3):
> mod
6. Generate a diagram of the tree (your tree might differ if you did not set the random seed as in step 3):
> prp(mod, type = 2, extra = 104, nn = TRUE, fallen.leaves = TRUE, faclen = 4, varlen = 8, shadow.col = "gray")
7. Prune the tree:
> # First see the cptable
> # !!Note!!: Your table can be different because of the
> # random aspect in cross-validation
> mod$cptable
> # Choose the CP value as the highest value whose
> # xerror is not greater than minimum xerror + xstd
> # With the above data that happens to be
> # the fifth one, 0.01182033
> # Your values could be different because of random
> # sampling
> mod.pruned <- prune(mod, mod$cptable[5, "CP"])
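Rather than reading the cptable by eye and hard-coding row 5, the same 1-SE rule can be applied programmatically. A minimal sketch follows; since the banknote CSV may not be at hand, it builds a stand-in model on the built-in iris data (an assumption for illustration only) — with the recipe's data you would apply the same three lines to the mod built in step 4:

```r
library(rpart)

# Stand-in model on iris; substitute the mod from step 4 for real use.
set.seed(1000)
mod <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(minsplit = 20, cp = 0.001))

# 1-SE rule: take the largest CP whose cross-validated xerror does not
# exceed min(xerror) + xstd at the minimizing row. Rows of cptable are
# ordered from largest to smallest CP, so the first qualifying row wins.
cpt <- mod$cptable
min.row <- which.min(cpt[, "xerror"])
threshold <- cpt[min.row, "xerror"] + cpt[min.row, "xstd"]
best.cp <- cpt[min(which(cpt[, "xerror"] <= threshold)), "CP"]
mod.pruned <- prune(mod, cp = best.cp)
```

This removes the manual step and keeps the pruning reproducible even when cross-validation reshuffles the table.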
8. View the pruned tree (your tree will look different):
> prp(mod.pruned, type = 2, extra = 104, nn = TRUE, fallen.leaves = TRUE, faclen = 4, varlen = 8, shadow.col = "gray")
9. Use the pruned model to predict for the validation partition (note the minus sign before train.idx to consider the cases in the validation partition):
> pred.pruned <- predict(mod.pruned, bn[-train.idx, ], type = "class")
10. Generate the error/classification-confusion matrix:
> table(bn[-train.idx, ]$class, pred.pruned, dnn = c("Actual", "Predicted"))
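The confusion matrix from step 10 can be summarized into an overall accuracy: correct predictions sit on the diagonal, so accuracy is the diagonal sum over the total count. A small sketch with made-up labels (an assumption; with the recipe's data you would pass bn[-train.idx, ]$class and pred.pruned to table instead):

```r
# Stand-in actual vs. predicted class labels for illustration only.
actual    <- factor(c(0, 0, 1, 1, 1, 0, 1, 0))
predicted <- factor(c(0, 1, 1, 1, 0, 0, 1, 0))
cm <- table(actual, predicted, dnn = c("Actual", "Predicted"))

# Overall accuracy: correctly classified cases (diagonal) over all cases.
accuracy <- sum(diag(cm)) / sum(cm)
```

The off-diagonal cells give the two error types separately, which matters when the cost of a false positive differs from that of a false negative.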