[Q&A] How do I run k-fold cross-validation with randomForest?

#1 (original post)

晓茜 posted on 2013-11-12 16:53:28


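Suppose I use randomForest with the following statements:
library(randomForest)
x <- read.table("1.txt")
set.seed(150)
x.rf <- randomForest(V22 ~ ., data=x, importance=TRUE, proximity=TRUE)
print(x.rf)
I would like to add 5-fold cross-validation. Which statements do I need to add, and where?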


#2

晓茜 posted on 2013-11-13 10:42:11

Does anyone know how to do this? It's urgent.

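#3

CRouGD posted on 2013-11-14 16:44:27

K-fold cross-validation splits the data into K parts, trains the model on K-1 of them (the training set), and tests it on the remaining part (the test set). Because there are K parts, each part takes one turn as the test set, which gives K train/test rounds.

I suggest finding a book on this; it will cover the details more thoroughly.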
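
The procedure described above can be sketched in a few lines of R. This is only a minimal illustration, not the only way to do it: it uses the built-in iris data as a stand-in for the original poster's file (Species plays the role of V22), and the names k, fold_id, pred, cv_table and cv_error are made up for the example.

```r
library(randomForest)

set.seed(150)
k <- 5
dat <- iris   # assumption: stand-in for the data read from "1.txt"

# Randomly assign each row to one of k folds of (near) equal size
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

# Placeholder for the held-out prediction of every row
pred <- factor(rep(NA, nrow(dat)), levels = levels(dat$Species))

for (i in seq_len(k)) {
  test_idx <- which(fold_id == i)
  # Train on the other k-1 folds, predict the held-out fold
  fit <- randomForest(Species ~ ., data = dat[-test_idx, ])
  pred[test_idx] <- predict(fit, newdata = dat[test_idx, ])
}

# Rows are the true classes, columns the cross-validated predictions
cv_table <- table(actual = dat$Species, predicted = pred)
cv_error <- mean(pred != dat$Species)
print(cv_table)
cat(sprintf("%d-fold CV error rate: %.3f\n", k, cv_error))
```

Every row is predicted exactly once, by a model that never saw it during training, so cv_table counts correct and incorrect predictions under 5-fold cross-validation.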

#4

funpipi posted on 2013-11-15 12:57:45

You can try this home-made function, rf.cross.validation. x is the data matrix, y is the class factor, and nfolds is the number of cross-validation folds.

library(randomForest)

# Get balanced folds where each fold has close to overall class ratio
balanced.folds <- function(y, nfolds=10){
  y <- as.factor(y)
  folds <- rep(0, length(y))
  classes <- levels(y)
  # size of each class
  Nk <- table(y)
  # -1 or nfolds = length(y) means leave-one-out
  if (nfolds == -1 || nfolds == length(y)){
    return(invisible(1:length(y)))
  }
  # Can't have more folds than there are items in the largest class
  nfolds <- min(nfolds, max(Nk))
  # Assign folds evenly within each class, then shuffle within each class
  for (k in 1:length(classes)){
    ixs <- which(y == classes[k])
    folds_k <- rep(1:nfolds, ceiling(length(ixs) / nfolds))
    folds_k <- folds_k[1:length(ixs)]
    folds_k <- sample(folds_k)
    folds[ixs] <- folds_k
  }
  invisible(folds)
}

rf.cross.validation <- function(x, y, nfolds=10, verbose=TRUE, ...){
  if (nfolds == -1) nfolds <- length(y)
  folds <- balanced.folds(y, nfolds=nfolds)
  result <- list()
  result$y <- as.factor(y)
  result$predicted <- result$y
  result$probabilities <- matrix(0, nrow=length(result$y), ncol=length(levels(result$y)))
  rownames(result$probabilities) <- rownames(x)
  colnames(result$probabilities) <- levels(result$y)
  result$importances <- matrix(0, nrow=ncol(x), ncol=nfolds)
  result$errs <- numeric(length(unique(folds)))

  # K-fold cross-validation
  for (fold in sort(unique(folds))){
    if (verbose) cat(sprintf('Fold %d...\n', fold))
    foldix <- which(folds == fold)
    # Train on every fold except the current one
    model <- randomForest(x[-foldix, , drop=FALSE], factor(result$y[-foldix]), importance=TRUE, do.trace=verbose, ...)
    # drop=FALSE keeps a single held-out row as a one-row matrix or data frame
    newx <- x[foldix, , drop=FALSE]
    result$predicted[foldix] <- predict(model, newx)
    probs <- predict(model, newx, type='prob')
    result$probabilities[foldix, colnames(probs)] <- probs
    result$errs[fold] <- mean(result$predicted[foldix] != result$y[foldix])
    result$importances[, fold] <- model$importance[, 'MeanDecreaseAccuracy']
  }

  result$nfolds <- nfolds
  result$params <- list(...)
  # Rows: true class; columns: predicted class
  result$confusion.matrix <- t(sapply(levels(result$y), function(level) table(result$predicted[result$y == level])))
  return(result)
}

#5

jgchen1966 posted on 2013-11-16 21:25:34

I'm not sure why the original poster hasn't read the randomForest manual carefully. The randomForest package itself ships with a function for cross-validation:
Usage
rfcv(trainx, trainy, cv.fold=5, scale="log", step=0.5,
     mtry=function(p) max(1, floor(sqrt(p))), recursive=FALSE, ...)
Beyond that, many of R's general-purpose machine-learning packages offer cross-validation as well, for example caret, CMA, rminer, TDMR, mlr, ...
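
For what it's worth, here is a small sketch of calling rfcv with the signature quoted above, using iris in place of the original poster's data. As far as I can tell, the returned list has the components n.var, error.cv and predicted, and the first element of predicted belongs to the largest number of predictors; please check ?rfcv before relying on that.

```r
library(randomForest)

set.seed(150)
# rfcv() takes the predictors and the response separately, not a formula
cv <- rfcv(trainx = iris[, 1:4], trainy = iris$Species, cv.fold = 5)

print(cv$n.var)     # numbers of predictors that were tried
print(cv$error.cv)  # cross-validated error rate for each of them

# Cross-validated predictions of the model using the most predictors
full <- cv$predicted[[1]]
print(table(actual = iris$Species, predicted = full))
```

Note that rfcv is aimed at feature selection: it reports how the cross-validated error changes as predictors are dropped. Its plots are built from error.cv, while the predicted component gives the per-row predictions.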

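#6

晓茜 posted on 2013-11-17 17:05:36

jgchen1966 posted on 2013-11-16 21:25
I'm not sure why the original poster hasn't read the randomForest manual carefully. The randomForest package itself ships with a function for cross-validation: ...

I'm new to this. The literature I read said the RF package in R works well for classification, so I wanted to try it, but there is a lot I don't understand, so please give me a bit more guidance. I did read the manual, but some things are still unclear. What I ultimately want are the Sn (sensitivity), Sp (specificity), Acc (prediction accuracy) and MCC (Matthews correlation coefficient) values under k-fold cross-validation or a jackknife test; even just the predicted classes and the counts of correct and incorrect predictions would do. But when I run the program as written in the manual, all I get is two plots, so I'm confused.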
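
Once the cross-validated predictions are available, the four measures asked about above follow from the counts of true/false positives and negatives. The sketch below assumes a two-class problem; binary_metrics is a hypothetical helper written for this example (it does not come from randomForest), and the label vectors are made-up toy data.

```r
# Hypothetical helper: Sn, Sp, Acc and MCC for a two-class problem
binary_metrics <- function(actual, predicted, positive) {
  # as.numeric() avoids integer overflow in the MCC denominator
  tp <- as.numeric(sum(actual == positive & predicted == positive))
  tn <- as.numeric(sum(actual != positive & predicted != positive))
  fp <- as.numeric(sum(actual != positive & predicted == positive))
  fn <- as.numeric(sum(actual == positive & predicted != positive))
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  c(Sn  = tp / (tp + fn),                      # sensitivity
    Sp  = tn / (tn + fp),                      # specificity
    Acc = (tp + tn) / (tp + tn + fp + fn),     # accuracy
    MCC = if (denom == 0) NA_real_ else (tp * tn - fp * fn) / denom)
}

# Toy labels, for illustration only
actual    <- c("pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg")
predicted <- c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "pos")
m <- binary_metrics(actual, predicted, positive = "pos")
print(round(m, 3))
```

In practice, actual would be the true class column and predicted the vector of held-out predictions collected over the folds.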

#7

晓茜 posted on 2013-11-17 17:11:40

jgchen1966 posted on 2013-11-16 21:25
I'm not sure why the original poster hasn't read the randomForest manual carefully. The randomForest package itself ships with a function for cross-validation: ...

Following the example in the manual, I entered this code:
set.seed(71)
iris.rf <- randomForest(Species ~ ., data=iris, importance=TRUE, proximity=TRUE)
print(iris.rf)
The result is:
Call:
 randomForest(formula = Species ~ ., data = iris, importance = TRUE, proximity = TRUE)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
Does this include cross-validation? This is exactly the form of output I ultimately want, but I don't understand what kind of cross-validation produced these numbers.

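#8

jgchen1966 posted on 2013-11-17 17:24:20

晓茜 posted on 2013-11-17 17:11
Following the example in the manual, I entered this code:
set.seed(71)
iris.rf

That is an OOB (out-of-bag) estimate. Computer experiments are a fairly complex subject, so I suggest asking your teacher for help. I work as a technical consultant for a private company and can't say more, sorry!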
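
To make the OOB point a little more concrete: the confusion matrix that print() shows is built from out-of-bag predictions, where each row is predicted only by the trees whose bootstrap sample did not contain it. That is not k-fold cross-validation, although it serves a similar purpose. A minimal sketch on iris (the names oob_pred, oob_table and oob_error are made up here):

```r
library(randomForest)

set.seed(71)
iris.rf <- randomForest(Species ~ ., data = iris)

# $predicted holds the out-of-bag prediction for every training row
oob_pred  <- iris.rf$predicted
oob_table <- table(actual = iris$Species, predicted = oob_pred)
oob_error <- mean(oob_pred != iris$Species)

print(oob_table)                               # same counts as iris.rf$confusion
print(oob_error)                               # OOB error rate of the full forest
print(iris.rf$err.rate[iris.rf$ntree, "OOB"])  # the value that print() reports
```

So the numbers in the printed output already count correct and incorrect predictions, just under the OOB scheme and not under a 5-fold split.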

#9

麻烦and纠结 posted on 2013-12-16 11:44:30

晓茜 posted on 2013-11-17 17:11
Following the example in the manual, I entered this code:
set.seed(71)
iris.rf

No cross-validation is performed there. There is a function for it: rrfcv {RRF} R Documentation

Random Forest Cross-Validation for feature selection
Description
This function shows the cross-validated prediction performance of models with sequentially reduced number of predictors (ranked by variable importance) via a nested cross-validation procedure.
