楼主: Lisrelchen
2667 24

Data Science with R Decision Trees [推广有奖]

  • 0关注
  • 62粉丝

VIP

院士

67%

还不是VIP/贵宾

-

TA的文库  其他...

Bayesian NewOccidental

Spatial Data Analysis

东西方数据挖掘

威望
0
论坛币
49957 个
通用积分
79.5487
学术水平
253 点
热心指数
300 点
信用等级
208 点
经验
41518 点
帖子
3256
精华
14
在线时间
766 小时
注册时间
2006-5-4
最后登录
2022-11-6

相似文件 换一批

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币

Data Science with R Decision Trees

Decision trees are widely used in data mining and well supported in R (R Core Team, 2014). Decision tree learning deploys a divide and conquer approach, known as recursive partitioning. It is usually implemented as a greedy search using information gain or the Gini index to select the best input variable on which to partition our dataset at each step.
This Module introduces rattle (Williams, 2014) and rpart (Therneau and Atkinson, 2014) for building decision trees. We begin with a step-by-step example of building a decision tree using Rattle, and then illustrate the process using R begining with Section 14. We cover both classification trees and regression trees.
The required packages for this module include:
  • library(rattle) # GUI for building trees and fancy tree plot
  • library(rpart) # Popular decision tree algorithm
  • library(rpart.plot) # Enhanced tree plots
  • library(party) # Alternative decision tree algorithm
  • library(partykit) # Convert rpart object to BinaryTree
  • library(RWeka) # Weka decision tree J48.
  • library(C50) # Original C5.0 implementation.

As we work through this chapter, new R commands will be introduced. Be sure to review the command’s documentation and understand what the command does. You can ask for help using the ? command as in:
?read.csv
We can obtain documentation on a particular package using the help= option of library():
library(help=rattle)
This chapter is intended to be hands on. To learn effectively, you are encouraged to have R running (e.g., RStudio) and to run all the commands as they appear here. Check that you get the same output, and you understand the output. Try some variations. Explore.

本帖隐藏的内容

Data Science with R Decision Trees.rar (1.14 MB) 本附件包括:
  • Data Science with R Decision Trees.pdf








二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

关键词:Data Science Decision Science Trees rees learning usually search 2014

已有 2 人评分经验 论坛币 学术水平 热心指数 信用等级 收起 理由
crystal8832 + 50 + 3 + 3 + 3 奖励积极上传好的资料
oliyiyi + 100 精彩帖子

总评分: 经验 + 100  论坛币 + 50  学术水平 + 3  热心指数 + 3  信用等级 + 3   查看全部评分

本帖被以下文库推荐

沙发
Lisrelchen 发表于 2015-6-13 08:08:27 |只看作者 |坛友微信交流群
  1. ## ----module, echo=FALSE, results="asis"----------------------------------
  2. Module <- "DTreesO"
  3. cat(paste0("\\newcommand{\\Module}{", Module, "}"))


  4. ## ----setup, child="mycourse.Rnw"-----------------------------------------

  5. ## ----setup_options, include=FALSE----------------------------------------
  6. library(knitr)
  7. library(xtable)

  8. opts_chunk$set(cache=FALSE)

  9. opts_chunk$set(out.width='0.8\\textwidth')
  10. opts_chunk$set(fig.align='center')

  11. opts_chunk$set(src.top=NULL)
  12. opts_chunk$set(src.bot=NULL)
  13. opts_chunk$set(out.lines=4)
  14. opts_chunk$set(out.truncate=80)

  15. opts_chunk$set(fig.path=sprintf("figures/%s/", Module))
  16. opts_chunk$set(cache.path=sprintf("cache/%s/", Module))
  17. opts_chunk$set(bib.file=paste0(Module, ".bib"))

  18. # Leave code as I have formatted it.

  19. opts_chunk$set(tidy=FALSE)

  20. # Hooks

  21. # Allow auto crop of base graphics plots when crop=TRUE.

  22. knit_hooks$set(crop=hook_pdfcrop)

  23. # Truncate long lines and long output

  24. hook_output <- knit_hooks$get("output")
  25. hook_source <- knit_hooks$get("source")
  26. knit_hooks$set(output=function(x, options)
  27. {
  28.   if (options$results != "asis")
  29.   {
  30.     # Split string into separate lines.
  31.     x <- unlist(stringr::str_split(x, "\n"))
  32.     # Trim to the number of lines specified.
  33.     if (!is.null(n <- options$out.lines))
  34.     {
  35.       if (length(x) > n)
  36.       {
  37.         # Truncate the output.
  38.         x <- c(head(x, n), "....\n")
  39.       }
  40.     }
  41.     # Truncate each line to length specified.
  42.     if (!is.null(m <- options$out.truncate))
  43.     {
  44.       len <- nchar(x)
  45.       x[len>m] <- paste0(substr(x[len>m], 0, m-3), "...")
  46.     }
  47.     # Paste lines back together.
  48.     x <- paste(x, collapse="\n")
  49.     # Replace ' = ' with '=' - my preference. Hopefully won't
  50.     # affect things inappropriately.
  51.     x <- gsub(" = ", "=", x)
  52.   }
  53.   hook_output(x, options)
  54. },
  55. source=function(x, options)
  56. {
  57.   # Split string into separate lines.
  58.   x <- unlist(stringr::str_split(x, "\n"))
  59.   # Trim to the number of lines specified.
  60.   if (!is.null(n <- options$src.top))
  61.   {
  62.     if (length(x) > n)
  63.     {
  64.       # Truncate the output.
  65.       if (is.null(m <-options$src.bot)) m <- 0
  66.       x <- c(head(x, n+1), "\n....\n", tail(x, m+2))
  67.    }
  68.   }
  69.   # Paste lines back together.
  70.   x <- paste(x, collapse="\n")
  71.   hook_source(x, options)
  72. })

  73. # Optionally allow R Code chunks to be environments so we can refer to them.

  74. knit_hooks$set(rcode=function(before, options, envir)
  75. {
  76.   if (before)
  77.     sprintf('\\begin{rcode}\\label{%s}\\hfill{}', options$label)
  78.   else
  79.     '\\end{rcode}'
  80. })



  81. ## ----load_packages, message=FALSE----------------------------------------
  82. library(rattle)         # GUI for building trees and fancy tree plot
  83. library(rpart)          # Popular decision tree algorithm
  84. library(rpart.plot)     # Enhanced tree plots
  85. library(party)          # Alternative decision tree algorithm
  86. library(partykit)       # Convert rpart object to BinaryTree
  87. library(RWeka)          # Weka decision tree J48.
  88. library(C50)            # Original C5.0 implementation.


  89. ## ----echo=FALSE----------------------------------------------------------
  90. library(RColorBrewer)


  91. ## ----documentation, child="documentation.Rnw", eval=TRUE-----------------


  92. ## ----help_library, eval=FALSE, tidy=FALSE--------------------------------
  93. ## ?read.csv


  94. ## ----help_package, eval=FALSE--------------------------------------------
  95. ## library(help=rattle)


  96. ## ----record_start_time, echo=FALSE---------------------------------------
  97. start.time <- proc.time()


  98. ## ----generate_bib, echo=FALSE, message=FALSE, warning=FALSE--------------
  99. # Write all packages in the current session to a bib file
  100. if (is.null(opts_chunk$get("bib.file"))) opts_chunk$set(bib.file="Course.bib")
  101. write_bib(sub("^.*/", "", grep("^/", searchpaths(), value=TRUE)),
  102.           file=opts_chunk$get("bib.file"))
  103. system(paste("cat extra.bib >>", opts_chunk$get("bib.file")))
  104. # Fix up specific issues.
  105. # R-earth
  106. system(paste("perl -pi -e 's|. Derived from .*$|},|'",
  107.              opts_chunk$get("bib.file")))
  108. # R-randomForest
  109. system(paste("perl -pi -e 's|Fortran original by Leo Breiman",
  110.              "and Adele Cutler and R port by|Leo Breiman and",
  111.              "Adele Cutler and|'", opts_chunk$get("bib.file")))
  112. # R-C50
  113. system(paste("perl -pi -e 's|. C code for C5.0 by R. Quinlan|",
  114.              " and J. Ross Quinlan|'", opts_chunk$get("bib.file")))
  115. # R-caret
  116. system(paste("perl -pi -e 's|. Contributions from|",
  117.              " and|'", opts_chunk$get("bib.file")))
  118. # Me
  119. system(paste("perl -pi -e 's|Graham Williams|",
  120.              "Graham J Williams|'", opts_chunk$get("bib.file")))




  121. ## ----start_rattle, eval=FALSE--------------------------------------------
  122. ## library(rattle)
  123. ## rattle()


  124. ## ----rattle_commands, eval=FALSE-----------------------------------------
  125. ## set.seed(42)
  126. ## library(rattle)
  127. ## library(rpart)
  128. ## ds     <- weather
  129. ## target <- "RainTomorrow"
  130. ## nobs   <- nrow(ds)
  131. ## form   <- formula(paste(target, "~ ."))
  132. ## train  <- sample(nobs, 0.70 * nobs)
  133. ## vars   <- -c(1,2,23)
  134. ## model  <- rpart(form, ds[train, vars], parms=list(split="information"))


  135. ## ----prepare_data_weather------------------------------------------------
  136. set.seed(1426)
  137. library(rattle)
  138. data(weather)
  139. dsname       <- "weather"
  140. ds           <- get(dsname)
  141. id           <- c("Date", "Location")
  142. target       <- "RainTomorrow"
  143. risk         <- "RISK_MM"
  144. ignore       <- c(id, if (exists("risk")) risk)
  145. (vars        <- setdiff(names(ds), ignore))
  146. inputs       <- setdiff(vars, target)
  147. (nobs        <- nrow(ds))
  148. (numerics    <- intersect(inputs, names(ds)[which(sapply(ds[vars], is.numeric))]))
  149. (categorics  <- intersect(inputs, names(ds)[which(sapply(ds[vars], is.factor))]))
  150. (form        <- formula(paste(target, "~ .")))
  151. length(train <- sample(nobs, 0.7*nobs))
  152. length(test  <- setdiff(seq_len(nobs), train))
  153. actual       <- ds[test, target]
  154. risks        <- ds[test, risk]


  155. ## ----check_dataset, out.lines=5------------------------------------------
  156. dim(ds)
  157. names(ds)
  158. head(ds)
  159. tail(ds)
  160. str(ds)
  161. summary(ds)


  162. ## ----dt_model------------------------------------------------------------
  163. model <- rpart(formula=form, data=ds[train, vars])


  164. ## ----dt_model_no_argument_names------------------------------------------
  165. str(rpart)
  166. model <- rpart(form, ds[train, vars])


  167. ## ----display_model, out.lines=NULL---------------------------------------
  168. model


  169. ## ----dt_model_summary, out.lines=40--------------------------------------
  170. summary(model)


  171. ## ----dt_model_printcp, out.lines=NULL------------------------------------
  172. printcp(model)


  173. ## ----dt_model_plotcp,----------------------------------------------------
  174. plotcp(model)


  175. ## ----tmodel_weatherAUS_for_cp--------------------------------------------
  176. tmodel <- rpart(form, weatherAUS[vars])
  177. plotcp(tmodel)


  178. ## ----tmodel_weatherAUS_cp_0----------------------------------------------
  179. tmodel <- rpart(form, weatherAUS[vars], control=rpart.control(cp=0))
  180. plotcp(tmodel)


  181. ## ----out.lines=NULL------------------------------------------------------
  182. tmodel$cptable[c(1:5,22:29, 80:83),]


  183. ## ------------------------------------------------------------------------
  184. model$variable.importance


  185. ## ----message=FALSE-------------------------------------------------------
  186. predicted <- predict(model, ds[test, vars], type="prob")[,2]
  187. riskchart(predicted, actual, risks)


  188. ## ------------------------------------------------------------------------
  189. predicted <- predict(model, ds[test, vars], type="class")
  190. sum(actual != predicted)/length(predicted) # Overall error rate
  191. round(100*table(actual, predicted, dnn=c("Actual", "Predicted"))/length(predicted))


  192. ## ------------------------------------------------------------------------
  193. asRules.rpart <- function(model)
  194. {
  195.   if (!inherits(model, "rpart")) stop("Not a legitimate rpart tree")
  196.   #
  197.   # Get some information.
  198.   #
  199.   frm     <- model$frame
  200.   names   <- row.names(frm)
  201.   ylevels <- attr(model, "ylevels")
  202.   ds.size <- model$frame[1,]$n
  203.   #
  204.   # Print each leaf node as a rule.
  205.   #
  206.   for (i in 1:nrow(frm))
  207.   {
  208.     if (frm[i,1] == "<leaf>")
  209.     {
  210.       # The following [,5] is hardwired - needs work!
  211.       cat("\n")
  212.       cat(sprintf(" Rule number: %s ", names[i]))
  213.       cat(sprintf("[yval=%s cover=%d (%.0f%%) prob=%0.2f]\n",
  214.                   ylevels[frm[i,]$yval], frm[i,]$n,
  215.                   round(100*frm[i,]$n/ds.size), frm[i,]$yval2[,5]))
  216.       pth <- path.rpart(model, nodes=as.numeric(names[i]), print.it=FALSE)
  217.       cat(sprintf("   %s\n", unlist(pth)[-1]), sep="")
  218.     }
  219.   }
  220. }
复制代码

使用道具

藤椅
Lisrelchen 发表于 2015-6-13 08:09:29 |只看作者 |坛友微信交流群
  1. ## ----out.lines=14--------------------------------------------------------
  2. asRules(model)


  3. ## ----basic_plot, echo=FALSE----------------------------------------------
  4. plot(model)
  5. text(model)


  6. ## ----basic_plot, eval=FALSE----------------------------------------------
  7. ## plot(model)
  8. ## text(model)


  9. ## ----basic_plot_uniform, echo=FALSE--------------------------------------
  10. plot(model, uniform=TRUE)
  11. text(model)


  12. ## ----basic_plot_uniform, eval=FALSE--------------------------------------
  13. ## plot(model, uniform=TRUE)
  14. ## text(model)


  15. ## ----basic_plot_extra, echo=FALSE----------------------------------------
  16. plot(model, uniform=TRUE)
  17. text(model, use.n=TRUE, all=TRUE, cex=.8)


  18. ## ----basic_plot_extra, eval=FALSE----------------------------------------
  19. ## plot(model, uniform=TRUE)
  20. ## text(model, use.n=TRUE, all=TRUE, cex=.8)


  21. ## ----fancy_plot, message=FALSE, echo=FALSE-------------------------------
  22. fancyRpartPlot(model)


  23. ## ----fancy_plot, message=FALSE, eval=FALSE-------------------------------
  24. ## fancyRpartPlot(model)


  25. ## ----prp_default, echo=FALSE---------------------------------------------
  26. prp(model)


  27. ## ----prp_default, eval=FALSE---------------------------------------------
  28. ## prp(model)


  29. ## ----prp_fav, echo=FALSE-------------------------------------------------
  30. prp(model, type=2, extra=104, nn=TRUE, fallen.leaves=TRUE,
  31.     faclen=0, varlen=0, shadow.col="grey", branch.lty=3)


  32. ## ----prp_fav, eval=FALSE-------------------------------------------------
  33. ## prp(model, type=2, extra=104, nn=TRUE, fallen.leaves=TRUE,
  34. ##     faclen=0, varlen=0, shadow.col="grey", branch.lty=3)


  35. ## ----prp_colour, echo=FALSE----------------------------------------------
  36. col <- c("#FD8D3C", "#FD8D3C", "#FD8D3C", "#BCBDDC",
  37.          "#FDD0A2", "#FD8D3C", "#BCBDDC")
  38. prp(model, type=2, extra=104, nn=TRUE, fallen.leaves=TRUE,
  39.     faclen=0, varlen=0, shadow.col="grey", branch.lty=3, box.col=col)


  40. ## ----prp_colour, eval=FALSE----------------------------------------------
  41. ## col <- c("#FD8D3C", "#FD8D3C", "#FD8D3C", "#BCBDDC",
  42. ##          "#FDD0A2", "#FD8D3C", "#BCBDDC")
  43. ## prp(model, type=2, extra=104, nn=TRUE, fallen.leaves=TRUE,
  44. ##     faclen=0, varlen=0, shadow.col="grey", branch.lty=3, box.col=col)


  45. ## ----prp_label_nodes, echo=FALSE-----------------------------------------
  46. prp(model, type=1)


  47. ## ----prp_label_nodes, eval=FALSE-----------------------------------------
  48. ## prp(model, type=1)


  49. ## ----prp_label_below, echo=FALSE-----------------------------------------
  50. prp(model, type=2)


  51. ## ----prp_label_below, eval=FALSE-----------------------------------------
  52. ## prp(model, type=2)


  53. ## ----prp_split_labels, echo=FALSE----------------------------------------
  54. prp(model, type=3)


  55. ## ----prp_split_labels, eval=FALSE----------------------------------------
  56. ## prp(model, type=3)


  57. ## ----prp_interior_lables, echo=FALSE-------------------------------------
  58. prp(model, type=4)


  59. ## ----prp_interior_lables, eval=FALSE-------------------------------------
  60. ## prp(model, type=4)


  61. ## ----prp_num_obs, echo=FALSE---------------------------------------------
  62. prp(model, type=2, extra=1)


  63. ## ----prp_num_obs, eval=FALSE---------------------------------------------
  64. ## prp(model, type=2, extra=1)


  65. ## ----prp_per_obs, echo=FALSE---------------------------------------------
  66. prp(model, type=2, extra=101)


  67. ## ----prp_per_obs, eval=FALSE---------------------------------------------
  68. ## prp(model, type=2, extra=101)


  69. ## ----prp_class_rate, echo=FALSE------------------------------------------
  70. prp(model, type=2, extra=2)


  71. ## ----prp_class_rate, eval=FALSE------------------------------------------
  72. ## prp(model, type=2, extra=2)


  73. ## ----prp_add_per_obs, echo=FALSE-----------------------------------------
  74. prp(model, type=2, extra=102)


  75. ## ----prp_add_per_obs, eval=FALSE-----------------------------------------
  76. ## prp(model, type=2, extra=102)


  77. ## ----prp_miss_rate, echo=FALSE-------------------------------------------
  78. prp(model, type=2, extra=3)


  79. ## ----prp_miss_rate, eval=FALSE-------------------------------------------
  80. ## prp(model, type=2, extra=3)


  81. ## ----prp_prob_class, echo=FALSE------------------------------------------
  82. prp(model, type=2, extra=4)


  83. ## ----prp_prob_class, eval=FALSE------------------------------------------
  84. ## prp(model, type=2, extra=4)


  85. ## ----prp_prob_class_per_obs, echo=FALSE----------------------------------
  86. prp(model, type=2, extra=104)


  87. ## ----prp_prob_class_per_obs, eval=FALSE----------------------------------
  88. ## prp(model, type=2, extra=104)


  89. ## ----prp_only_prob, echo=FALSE-------------------------------------------
  90. prp(model, type=2, extra=5)


  91. ## ----prp_only_prob, eval=FALSE-------------------------------------------
  92. ## prp(model, type=2, extra=5)


  93. ## ----prp_second_class, echo=FALSE----------------------------------------
  94. prp(model, type=2, extra=6)


  95. ## ----prp_second_class, eval=FALSE----------------------------------------
  96. ## prp(model, type=2, extra=6)


  97. ## ----prp_second_class_per_obs, echo=FALSE--------------------------------
  98. prp(model, type=2, extra=106)


  99. ## ----prp_second_class_per_obs, eval=FALSE--------------------------------
  100. ## prp(model, type=2, extra=106)


  101. ## ----prp_second_class_only_prob, echo=FALSE------------------------------
  102. prp(model, type=2, extra=7)


  103. ## ----prp_second_class_only_prob, eval=FALSE------------------------------
  104. ## prp(model, type=2, extra=7)


  105. ## ----prp_extra_8, echo=FALSE---------------------------------------------
  106. prp(model, type=2, extra=8)


  107. ## ----prp_extra_8, eval=FALSE---------------------------------------------
  108. ## prp(model, type=2, extra=8)


  109. ## ----prp_extra_9, echo=FALSE---------------------------------------------
  110. prp(model, type=2, extra=9)


  111. ## ----prp_extra_9, eval=FALSE---------------------------------------------
  112. ## prp(model, type=2, extra=9)


  113. ## ----prp_extra_100, echo=FALSE-------------------------------------------
  114. prp(model, type=2, extra=100)


  115. ## ----prp_extra_100, eval=FALSE-------------------------------------------
  116. ## prp(model, type=2, extra=100)


  117. ## ----prp_extra_106, echo=FALSE-------------------------------------------
  118. prp(model, type=2, extra=106, nn=TRUE)


  119. ## ----prp_extra_106, eval=FALSE-------------------------------------------
  120. ## prp(model, type=2, extra=106, nn=TRUE)


  121. ## ----prp_extra_106_ni, echo=FALSE----------------------------------------
  122. prp(model, type=2, extra=106, nn=TRUE, ni=TRUE)


  123. ## ----prp_extra_106_ni, eval=FALSE----------------------------------------
  124. ## prp(model, type=2, extra=106, nn=TRUE, ni=TRUE)


  125. ## ----prp_extra_106_fallen, echo=FALSE------------------------------------
  126. prp(model, type=2, extra=106, nn=TRUE, fallen.leaves=TRUE)


  127. ## ----prp_extra_106_fallen, eval=FALSE------------------------------------
  128. ## prp(model, type=2, extra=106, nn=TRUE, fallen.leaves=TRUE)


  129. ## ----prp_extra_106_fallen_branch, echo=FALSE-----------------------------
  130. prp(model, type=2, extra=106, nn=TRUE, fallen.leaves=TRUE,
  131.     branch=0.5)


  132. ## ----prp_extra_106_fallen_branch, eval=FALSE-----------------------------
  133. ## prp(model, type=2, extra=106, nn=TRUE, fallen.leaves=TRUE,
  134. ##     branch=0.5)


  135. ## ----prp_extra_106_faclen, echo=FALSE------------------------------------
  136. prp(model, type=2, extra=106, nn=TRUE, fallen.leaves=TRUE,
  137.     faclen=0)


  138. ## ----prp_extra_106_faclen, eval=FALSE------------------------------------
  139. ## prp(model, type=2, extra=106, nn=TRUE, fallen.leaves=TRUE,
  140. ##     faclen=0)


  141. ## ----prp_axtra_106_shadow, echo=FALSE------------------------------------
  142. prp(model, type=2, extra=106, nn=TRUE, fallen.leaves=TRUE,
  143.     shadow.col="grey")


  144. ## ----prp_axtra_106_shadow, eval=FALSE------------------------------------
  145. ## prp(model, type=2, extra=106, nn=TRUE, fallen.leaves=TRUE,
  146. ##     shadow.col="grey")


  147. ## ----prp_extra_106_branch, echo=FALSE------------------------------------
  148. prp(model, type=2, extra=106, nn=TRUE, fallen.leaves=TRUE,
  149.     branch.lty=3)


  150. ## ----prp_extra_106_branch, eval=FALSE------------------------------------
  151. ## prp(model, type=2, extra=106, nn=TRUE, fallen.leaves=TRUE,
  152. ##     branch.lty=3)


  153. ## ----eval=FALSE----------------------------------------------------------
  154. ## plot(c(0,1), c(0,0), type="l", axes=FALSE, xlab=NA, ylab=NA, lty=2)
  155. ## plot(c(0,1), c(0,0), type="l", axes=FALSE, xlab=NA, ylab=NA, lty="dashed")
  156. ## plot(c(0,1), c(0,0), type="l", axes=FALSE, xlab=NA, ylab=NA, lty="44")


  157. ## ----eval=FALSE----------------------------------------------------------
  158. ## install.packages("partykit", repos="http://R-Forge.R-project.org")
  159. ## library(partykit)


  160. ## ----fig.width=14, out.width="\\textwidth"-------------------------------
  161. class(model)
  162. plot(as.party(model))


  163. ## ----out.lines=15--------------------------------------------------------
  164. print(as.party(model))


  165. ## ----message=FALSE-------------------------------------------------------
  166. library(partykit)
  167. model <- ctree(formula=form, data=ds[train, vars])


  168. ## ----out.lines=NULL------------------------------------------------------
  169. model


  170. ## ----message=FALSE-------------------------------------------------------
  171. predicted <- predict(model, ds[test, vars], type="prob")[,2]
  172. riskchart(predicted, actual, risks)


  173. ## ------------------------------------------------------------------------
  174. predicted <- predict(model, ds[test, vars], type="response")
  175. sum(actual != predicted)/length(predicted) # Overall error rate
  176. round(100*table(actual, predicted, dnn=c("Actual", "Predicted"))/length(predicted))


  177. ## ------------------------------------------------------------------------
  178. plot(model)


  179. ## ------------------------------------------------------------------------
  180. library(RWeka)
  181. model <- J48(formula=form, data=ds[train, vars])


  182. ## ----out.lines=NULL------------------------------------------------------
  183. model


  184. ## ----message=FALSE-------------------------------------------------------
  185. predicted <- predict(model, ds[test, vars], type="prob")[,2]
  186. riskchart(predicted, actual, risks)


  187. ## ------------------------------------------------------------------------
  188. predicted <- predict(model, ds[test, vars], type="class")
  189. sum(actual != predicted)/length(predicted) # Overall error rate
  190. round(100*table(actual, predicted, dnn=c("Actual", "Predicted"))/length(predicted))


  191. ## ------------------------------------------------------------------------
  192. plot(as.party(model))


  193. ## ----out.lines=12--------------------------------------------------------
  194. print(as.party(model))


  195. ## ----c50-----------------------------------------------------------------
  196. library(C50)
  197. model <- C5.0(form, ds[train, vars])


  198. ## ----c50_print, out.lines=NULL-------------------------------------------
  199. model


  200. ## ----out.lines=NULL------------------------------------------------------
  201. C5imp(model)


  202. ## ----c50_summary, out.lines=40-------------------------------------------
  203. summary(model)


  204. ## ----message=FALSE-------------------------------------------------------
  205. predicted <- predict(model, ds[test, vars], type="prob")[,2]
  206. riskchart(predicted, actual, risks)


  207. ## ------------------------------------------------------------------------
  208. predicted <- predict(model, ds[test, vars], type="class")
  209. sum(actual != predicted)/length(predicted) # Overall error rate
  210. round(100*table(actual, predicted, dnn=c("Actual", "Predicted"))/length(predicted))


  211. ## ----c50_rules-----------------------------------------------------------
  212. library(C50)
  213. model <- C5.0(form, ds[train, vars], rules=TRUE)


  214. ## ----c50_rules_print, out.lines=NULL-------------------------------------
  215. model


  216. ## ----out.lines=NULL------------------------------------------------------
  217. C5imp(model)


  218. ## ----c50_rules_summary, out.lines=40-------------------------------------
  219. summary(model)


  220. ## ----message=FALSE-------------------------------------------------------
  221. predicted <- predict(model, ds[test, vars], type="prob")[,2]
  222. riskchart(predicted, actual, risks)
复制代码

使用道具

板凳
bingyang1008 发表于 2015-6-13 08:09:39 |只看作者 |坛友微信交流群
感谢分享

使用道具

报纸
Lisrelchen 发表于 2015-6-13 08:10:19 |只看作者 |坛友微信交流群
  1. ## ------------------------------------------------------------------------
  2. predicted <- predict(model, ds[test, vars], type="class")
  3. sum(ds[test, target] != predicted)/length(predicted) # Overall error rate
  4. round(100*table(ds[test, target], predicted, dnn=c("Actual", "Predicted"))/length(predicted))


  5. ## ----build_regression_model, out.lines=NULL------------------------------
  6. target <- "RISK_MM"
  7. vars <- c(inputs, target)
  8. form <- formula(paste(target, "~ ."))
  9. (model <- rpart(formula=form, data=ds[train, vars]))


  10. ## ----basic_plot_regression-----------------------------------------------
  11. plot(model)
  12. text(model)


  13. ## ----basic_plot_uniform_regression---------------------------------------
  14. plot(model, uniform=TRUE)
  15. text(model)


  16. ## ----basic_plot_extra_regressoin-----------------------------------------
  17. plot(model, uniform=TRUE)
  18. text(model, use.n=TRUE, all=TRUE, cex=.8)


  19. ## ----fancy_plot_regression-----------------------------------------------
  20. fancyRpartPlot(model)


  21. ## ------------------------------------------------------------------------
  22. prp(model)


  23. ## ------------------------------------------------------------------------
  24. prp(model, type=2, extra=101, nn=TRUE, fallen.leaves=TRUE,
  25.     faclen=0, varlen=0, shadow.col="grey", branch.lty=3)


  26. ## ----fig.width=14, out.width="\\textwidth"-------------------------------
  27. class(model)
  28. plot(as.party(model))


  29. ## ------------------------------------------------------------------------
  30. model <- ctree(formula=form, data=ds[train, vars])


  31. ## ----out.lines=45--------------------------------------------------------
  32. model


  33. ## ------------------------------------------------------------------------
  34. plot(model)


  35. ## ----common_outtro, child="finale.Rnw", eval=TRUE------------------------


  36. ## ----syinfo, child="sysinfo.Rnw", eval=TRUE------------------------------

  37. ## ----echo=FALSE, message=FALSE-------------------------------------------
  38. require(Hmisc)
  39. pkg <- "knitr"
  40. pkg.version <- installed.packages()[pkg, 'Version']
  41. pkg.date <- installed.packages(fields="Date")[pkg, 'Date']
  42. pkg.info <- paste(pkg, pkg.version, pkg.date)

  43. rev <- system("bzr revno", intern=TRUE)
  44. cpu <- system(paste("cat /proc/cpuinfo | grep 'model name' |",
  45.                     "head -n 1 | cut -d':' -f2"), intern=TRUE)
  46. ram <- system("cat /proc/meminfo | grep MemTotal: | awk '{print $2}'",
  47.               intern=TRUE)
  48. ram <- paste0(round(as.integer(ram)/1e6, 1), "GB")
  49. user <- Sys.getenv("LOGNAME")
  50. node <- Sys.info()[["nodename"]]
  51. user.node <- paste0(user, "@", node)
  52. gcc.version <- system("g++ -v 2>&1 | grep 'gcc version' | cut -d' ' -f1-3",
  53.                       intern=TRUE)
  54. os <- system("lsb_release -d | cut -d: -f2 | sed 's/^[ \t]*//'", intern=TRUE)
复制代码

使用道具

地板
duoduoduo 在职认证  发表于 2015-6-13 08:23:00 |只看作者 |坛友微信交流群
什么东西那
书吗

使用道具

7
fengyg 企业认证  发表于 2015-6-13 08:30:23 |只看作者 |坛友微信交流群
kankan

使用道具

8
lhf8059 发表于 2015-6-13 08:49:37 |只看作者 |坛友微信交流群
看看!

使用道具

9
oliyiyi 发表于 2015-6-13 08:49:52 |只看作者 |坛友微信交流群
谢谢分享

使用道具

10
li_mao 发表于 2015-6-13 08:59:59 |只看作者 |坛友微信交流群
看看

使用道具

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注jltj
拉您入交流群

京ICP备16021002-2号 京B2-20170662号 京公网安备 11010802022788号 论坛法律顾问:王进律师 知识产权保护声明   免责及隐私声明

GMT+8, 2024-4-19 12:53