Metadata:
'data.frame': 500 obs. of 8 variables:
$ 载重值 : num 16.2 27.7 26.9 16.5 25.9 ...
$ 基于车轮的车速平均 : num 75.66 16.18 72.61 2.46 78.4 ...
$ 设备号 : num 1e+09 1e+09 1e+09 1e+09 1e+09 ...
$ 发动机扭矩平均 : num 542.4 171.2 600.5 78.6 660.6 ...
$ 变速箱传动比平均 : num 0.901 8.936 1.004 15.044 0.913 ...
$ 发动机燃料消耗率.00.平均: num 13.73 4.05 15.56 2.32 17.7 ...
$ 车辆载重平均 : num 16167 30262 27057 16453 25954 ...
$ 平均燃油消耗量平均 : num 3.87 3.92 5.04 3.87 3.92 ...
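library(randomForest) # provides na.roughfix() and randomForest()
wl <- read.csv('C:/Users/Desktop/载重分析数据/data.csv')
wl <- wl[1:500,]
wl_fill <- na.roughfix(wl) # replace missing values with the column median (numeric) or mode (factor)
set.seed(1)
ind <- sample(2,nrow(wl_fill),replace=TRUE,prob=c(0.7,0.3)) # split rows into training (about 70%) and test (about 30%) sets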
train<-wl_fill[ind==1,]
test<- wl_fill[ind==2,]
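The `sample(2, ..., prob=c(0.7,0.3))` call assigns each row independently, so the split is only approximately 70/30. If exact set sizes matter, one alternative is to draw row indices without replacement. This is a minimal sketch on a made-up data frame `demo` (a hypothetical stand-in for `wl_fill`); the names `train_idx`, `train_demo` and `test_demo` are illustrative only.

```r
# Hypothetical stand-in for wl_fill: 500 rows, two numeric columns
set.seed(1)
demo <- data.frame(x = rnorm(500), y = rnorm(500))

# Draw exactly 70% of the row indices, without replacement
n <- nrow(demo)
train_idx <- sample(seq_len(n), size = round(0.7 * n))

train_demo <- demo[train_idx, ]   # the sampled 70% of rows
test_demo  <- demo[-train_idx, ]  # all remaining rows

c(train = nrow(train_demo), test = nrow(test_demo))
```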
library(randomForest)
rf <- randomForest(载重值~.,data=train,mtry=2,ntree=500,proximity=TRUE,importance=TRUE)
print(rf)
Call:
randomForest(formula = 载重值 ~ ., data = train, mtry = 2, ntree = 500, prorimity = T, importance = T)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 2
Mean of squared residuals: 6.732419
% Var explained: 69.46
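For a regression forest, both summary figures printed by `print(rf)` are out-of-bag (OOB) estimates. "Mean of squared residuals" is the OOB MSE, and "% Var explained" is 1 - MSE / Var(y). As a rough reading aid, the lines below take the square root of the printed MSE to get an error in the same units as 载重值, and back out the response variance implied by the two figures. The variable names are illustrative only.

```r
oob_mse <- 6.732419        # "Mean of squared residuals" from print(rf)
var_explained <- 0.6946    # "% Var explained" from print(rf), as a fraction

oob_rmse <- sqrt(oob_mse)                     # typical OOB error, in units of 载重值
implied_var <- oob_mse / (1 - var_explained)  # variance of 载重值 implied by the two figures

round(c(oob_rmse = oob_rmse, implied_sd = sqrt(implied_var)), 2)
```

This gives an OOB RMSE of roughly 2.6 against a response whose implied standard deviation is about 4.7. The forest reduces the error noticeably compared with always predicting the mean, but a substantial share of the variation remains unexplained.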
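# Prediction
train_pred <- predict(rf,train)
test_pred <- predict(rf,test[,-1]) # column 1 is the response 载重值
test_pred
# 载重值 is continuous, so compare actual and predicted values side by side rather than cross-tabulating them
head(data.frame(actual=test$载重值,predicted=test_pred))
# Test-set MSE against the true 载重值
mse <- mean((test_pred-test$载重值)^2)
mse # the 143334.7 originally obtained was measured against test[,4] (发动机扭矩平均), not 载重值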
The prediction part at the end feels a bit messy, and I can't draw any conclusions from the results. Can anyone help fill it in?
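One possible way to tidy up the evaluation, assuming the goal is to judge how well the forest predicts 载重值 on the held-out rows, is to compare the predictions against the true 载重值 (not another column) and report RMSE, MAE and R². In the sketch below, `regression_metrics` is a hypothetical helper, not part of randomForest, and the two vectors are made-up numbers used only to show the call.

```r
# Hypothetical helper: basic regression metrics from actual and predicted values
regression_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- predicted - actual
  c(
    rmse = sqrt(mean(err^2)),                               # root mean squared error
    mae  = mean(abs(err)),                                  # mean absolute error
    r2   = 1 - sum(err^2) / sum((actual - mean(actual))^2)  # share of variance explained
  )
}

# Made-up values for illustration only (not taken from the post's data)
actual_demo    <- c(16, 28, 27, 17, 26)
predicted_demo <- c(18, 26, 27, 16, 25)
regression_metrics(actual_demo, predicted_demo)
```

With the objects from the post, the call would be `regression_metrics(test$载重值, predict(rf, test))`. A plot of predicted against actual values, for example `plot(test$载重值, predict(rf, test)); abline(0, 1)`, is a common visual companion: points close to the diagonal indicate good predictions.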