楼主: Lisrelchen
2491 12

Data Science with R Association Rules [推广有奖]

  • 0关注
  • 62粉丝

VIP

已卖:4194份资源

院士

67%

还不是VIP/贵宾

-

TA的文库  其他...

Bayesian NewOccidental

Spatial Data Analysis

东西方数据挖掘

威望
0
论坛币
50288 个
通用积分
83.6306
学术水平
253 点
热心指数
300 点
信用等级
208 点
经验
41518 点
帖子
3256
精华
14
在线时间
766 小时
注册时间
2006-5-4
最后登录
2022-11-6

楼主
Lisrelchen 发表于 2015-6-13 07:55:42 |AI写论文

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币

Data Science with R Association Rules


Association analysis defined Data Mining at its roots in 1989 and during the 1990s. It remains
one of the preeminent techniques for modelling big data and so remains a core tool for the data
scientist’s toolbox.

As an unsupervised learning technique it has delivered considerable benefit in areas ranging from the traditional shopping basket analysis to the analysis of who bought what other books or who watched what other videos, and in areas including health care, telecommunications, and so on. Often for any data mining project we might usually begin with association analysis to identify issues with our data and then to build multiple local models. The analysis aims to identify patterns that are linked by some commonality (such as by a common person).


In this chapter we review association analysis and will discover new insights into our data through
the building of association rule models.
The required packages for this module include:

  • library(arules) # Association rules.
  • library(dplyr) # Data munging: tbl_df(), %>%.



As we work through this chapter, new R commands will be introduced. Be sure to review the command’s documentation and understand what the command does. You can ask for help using the ? command as in:
?read.csv
We can obtain documentation on a particular package using the help= option of library():
library(help=rattle)

This chapter is intended to be hands on. To learn effectively, you are encouraged to have R running (e.g., RStudio) and to run all the commands as they appear here. Check that you get the same output, and you understand the output. Try some variations. Explore


本帖隐藏的内容

Data Science with R Association Rules.rar (231.53 KB)



二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

关键词:Data Science Association Science Rules ATION techniques including learning shopping benefit

已有 1 人评分论坛币 学术水平 热心指数 信用等级 收起 理由
crystal8832 + 50 + 3 + 3 + 3 奖励积极上传好的资料

总评分: 论坛币 + 50  学术水平 + 3  热心指数 + 3  信用等级 + 3   查看全部评分

本帖被以下文库推荐

沙发
Lisrelchen 发表于 2015-6-13 07:57:09
  1. ## ----module, echo=FALSE, results="asis"----------------------------------
  2. Module <- "ARulesO"
  3. cat(paste0("\\newcommand{\\Module}{", Module, "}"))


  4. ## ----setup, child="mycourse.Rnw"-----------------------------------------

  5. ## ----setup_options, include=FALSE----------------------------------------
  6. library(knitr)
  7. library(xtable)

  8. opts_chunk$set(cache=FALSE)

  9. opts_chunk$set(out.width='0.8\\textwidth')
  10. opts_chunk$set(fig.align='center')

  11. opts_chunk$set(src.top=NULL)
  12. opts_chunk$set(src.bot=NULL)
  13. opts_chunk$set(out.lines=4)
  14. opts_chunk$set(out.truncate=80)

  15. opts_chunk$set(fig.path=sprintf("figures/%s/", Module))
  16. opts_chunk$set(cache.path=sprintf("cache/%s/", Module))
  17. opts_chunk$set(bib.file=paste0(Module, ".bib"))

  18. # Leave code as I have formatted it.

  19. opts_chunk$set(tidy=FALSE)

  20. # Hooks

  21. # Allow auto crop of base graphics plots when crop=TRUE.

  22. knit_hooks$set(crop=hook_pdfcrop)

  23. # Truncate long lines and long output

  24. hook_output <- knit_hooks$get("output")
  25. hook_source <- knit_hooks$get("source")
  26. knit_hooks$set(output=function(x, options)
  27. {
  28.   if (options$results != "asis")
  29.   {
  30.     # Split string into separate lines.
  31.     x <- unlist(stringr::str_split(x, "\n"))
  32.     # Trim to the number of lines specified.
  33.     if (!is.null(n <- options$out.lines))
  34.     {
  35.       if (length(x) > n)
  36.       {
  37.         # Truncate the output.
  38.         x <- c(head(x, n), "....\n")
  39.       }
  40.     }
  41.     # Truncate each line to length specified.
  42.     if (!is.null(m <- options$out.truncate))
  43.     {
  44.       len <- nchar(x)
  45.       x[len>m] <- paste0(substr(x[len>m], 0, m-3), "...")
  46.     }
  47.     # Paste lines back together.
  48.     x <- paste(x, collapse="\n")
  49.     # Replace ' = ' with '=' - my preference. Hopefully won't
  50.     # affect things inappropriately.
  51.     x <- gsub(" = ", "=", x)
  52.   }
  53.   hook_output(x, options)
  54. },
  55. source=function(x, options)
  56. {
  57.   # Split string into separate lines.
  58.   x <- unlist(stringr::str_split(x, "\n"))
  59.   # Trim to the number of lines specified.
  60.   if (!is.null(n <- options$src.top))
  61.   {
  62.     if (length(x) > n)
  63.     {
  64.       # Truncate the output.
  65.       if (is.null(m <-options$src.bot)) m <- 0
  66.       x <- c(head(x, n+1), "\n....\n", tail(x, m+2))
  67.    }
  68.   }
  69.   # Paste lines back together.
  70.   x <- paste(x, collapse="\n")
  71.   hook_source(x, options)
  72. })

  73. # Optionally allow R Code chunks to be environments so we can refer to them.

  74. knit_hooks$set(rcode=function(before, options, envir)
  75. {
  76.   if (before)
  77.     sprintf('\\begin{rcode}\\label{%s}\\hfill{}', options$label)
  78.   else
  79.     '\\end{rcode}'
  80. })



  81. ## ----load_pacakges, message=FALSE----------------------------------------
  82. library(arules)                # Association rules.
  83. library(dplyr)          # Data munging: tbl_df(), %>%.


  84. ## ----additional_dependent_pacakges, echo=FALSE, message=FALSE------------

  85. # These are dependencies that would otherwise be loaded as required.

  86. library(magrittr)


  87. ## ----documentation, child="documentation.Rnw", eval=TRUE-----------------


  88. ## ----help_library, eval=FALSE, tidy=FALSE--------------------------------
  89. ## ?read.csv


  90. ## ----help_package, eval=FALSE--------------------------------------------
  91. ## library(help=rattle)


  92. ## ----record_start_time, echo=FALSE---------------------------------------
  93. start.time <- proc.time()


  94. ## ----generate_bib, echo=FALSE, message=FALSE, warning=FALSE--------------
  95. # Write all packages in the current session to a bib file
  96. if (is.null(opts_chunk$get("bib.file"))) opts_chunk$set(bib.file="Course.bib")
  97. write_bib(sub("^.*/", "", grep("^/", searchpaths(), value=TRUE)),
  98.           file=opts_chunk$get("bib.file"))
  99. system(paste("cat extra.bib >>", opts_chunk$get("bib.file")))
  100. # Fix up specific issues.
  101. # R-earth
  102. system(paste("perl -pi -e 's|. Derived from .*$|},|'",
  103.              opts_chunk$get("bib.file")))
  104. # R-randomForest
  105. system(paste("perl -pi -e 's|Fortran original by Leo Breiman",
  106.              "and Adele Cutler and R port by|Leo Breiman and",
  107.              "Adele Cutler and|'", opts_chunk$get("bib.file")))
  108. # R-C50
  109. system(paste("perl -pi -e 's|. C code for C5.0 by R. Quinlan|",
  110.              " and J. Ross Quinlan|'", opts_chunk$get("bib.file")))
  111. # R-caret
  112. system(paste("perl -pi -e 's|. Contributions from|",
  113.              " and|'", opts_chunk$get("bib.file")))
  114. # Me
  115. system(paste("perl -pi -e 's|Graham Williams|",
  116.              "Graham J Williams|'", opts_chunk$get("bib.file")))




  117. ## ----eval=FALSE----------------------------------------------------------
  118. ## fname       <- "http://www.biz.uiowa.edu/faculty/jledolter/DataMining/lastfm.csv"
  119. ## lastfm      <- read.csv(fname, stringsAsFactors=FALSE)


  120. ## ----echo=FALSE, eval=FALSE----------------------------------------------
  121. ## save(lastfm, file="data/lastfm.RData")


  122. ## ----lastfm_load_dataset, echo=FALSE-------------------------------------
  123. load("data/lastfm.RData")


  124. ## ----lastfm_summary, out.lines=NULL--------------------------------------
  125. dsname      <- "lastfm"
  126. ds          <- get(dsname) %>% tbl_df()
  127. ds


  128. ## ----lastfm_prepare_dataset, out.lines=NULL------------------------------
  129. ds <- ds %>% select(user, artist) %>% unique()
  130. ds


  131. ## ----lastfm_as_transactions----------------------------------------------
  132. library(arules)

  133. trans <- as(split(ds$artist, ds$user), "transactions")


  134. ## ----lastfm_inspect_trans, out.lines=8-----------------------------------
  135. inspect(trans[1:5])


  136. ## ----lastfm_plot_frequency, fig.height=5---------------------------------
  137. itemFrequencyPlot(trans, support=0.075)


  138. ## ----out.lines=NULL------------------------------------------------------
  139. model <- apriori(trans, parameter=list(support=0.01, confidence=0.5))


  140. ## ----out.lines=10--------------------------------------------------------
  141. inspect(model)


  142. ## ----out.lines=10--------------------------------------------------------
  143. inspect(subset(model, subset=lift>8))
  144. inspect(sort(subset(model, subset=lift>8), by="confidence"))


  145. ## ----constants_baskets, echo=FALSE---------------------------------------
  146. set.seed(42)
  147. nb <- 10    # Number of baskets.
  148. ni <- 5     # Number of items.
  149. nc <- 40    # Number of combinations.


  150. ## ----constants_baskets, eval=FALSE---------------------------------------
  151. ## set.seed(42)
  152. ## nb <- 10    # Number of baskets.
  153. ## ni <- 5     # Number of items.
  154. ## nc <- 40    # Number of combinations.


  155. ## ----random_basket_dataset-----------------------------------------------
  156. ds <- data.frame(id=sort(sprintf("b%02d", sample(1:nb, nc, replace=TRUE))),
  157.                  item=sprintf("i%1d", sample(1:ni, nc, replace=TRUE)))
  158. ds <- unique(ds)
  159. rownames(ds) <- NULL


  160. ## ----summary_basket_sizes, out.lines=6-----------------------------------
  161. ds %>% group_by(id) %>% tally()


  162. ## ----list_basket_contents, out.lines=NULL--------------------------------
  163. ds %>% group_by(id) %>% summarise(items=paste(sort(item), collapse=", "))


  164. ## ----list_baskets_with-i1, out.lines=NULL--------------------------------
  165. ds %>% group_by(id) %>% summarise(i1="i1" %in% item) %>% filter(i1)


  166. ## ----one_itemset_freq, out.lines=NULL------------------------------------
  167. ds %>% group_by(item) %>% tally()


  168. ## ----one_itemset_support, out.lines=NULL---------------------------------
  169. ds %>% group_by(item) %>% tally() %>% mutate(s=n/nb)


  170. ## ----arules_create_dst---------------------------------------------------
  171. library(arules)
  172. dst <- as(split(ds$item, ds$id), "transactions")
  173. dst


  174. ## ----dst_item_frequency--------------------------------------------------
  175. itemFrequency(dst)


  176. ## ------------------------------------------------------------------------
  177. itemFrequency(dst, type="absolute")


  178. ## ----dst_plot_freq, fig.height=3.5---------------------------------------
  179. itemFrequencyPlot(dst)


  180. ## ----echo=FALSE----------------------------------------------------------
  181. is2.freq <- group_by(ds,id) %>%
  182.   summarise(is.1.2="i1" %in% item & "i2" %in% item) %>%
  183.   tally(is.1.2)

  184. is3.freq <- group_by(ds,id) %>%
  185.   summarise(is.1.2.3="i1" %in% item &
  186.             "i2" %in% item &
  187.             "i3" %in% item) %>%
  188.   tally(is.1.2.3)


  189. ## ----out.lines=NULL------------------------------------------------------
  190. merge(ds, ds, by="id") %>%
  191.   subset(as.character(item.x) < as.character(item.y)) %>%
  192.   mutate(itemset=paste(item.x, item.y)) %>%
  193.   group_by(itemset) %>%
  194.   tally()


  195. ## ----out.lines=NULL------------------------------------------------------
  196. merge(ds, ds, by="id") %>%
  197.   merge(ds, by="id") %>%
  198.   subset(as.character(item.x) < as.character(item.y) &
  199.          as.character(item.y) < as.character(item)) %>%
  200.   mutate(itemset=paste(item.x, item.y, item)) %>%
  201.   group_by(itemset) %>%
  202.   tally()


  203. ## ----common_outtro, child="finale.Rnw", eval=TRUE------------------------


  204. ## ----syinfo, child="sysinfo.Rnw", eval=TRUE------------------------------

  205. ## ----echo=FALSE, message=FALSE-------------------------------------------
  206. require(Hmisc)
  207. pkg <- "knitr"
  208. pkg.version <- installed.packages()[pkg, 'Version']
  209. pkg.date <- installed.packages(fields="Date")[pkg, 'Date']
  210. pkg.info <- paste(pkg, pkg.version, pkg.date)

  211. rev <- system("bzr revno", intern=TRUE)
  212. cpu <- system(paste("cat /proc/cpuinfo | grep 'model name' |",
  213.                     "head -n 1 | cut -d':' -f2"), intern=TRUE)
  214. ram <- system("cat /proc/meminfo | grep MemTotal: | awk '{print $2}'",
  215.               intern=TRUE)
  216. ram <- paste0(round(as.integer(ram)/1e6, 1), "GB")
  217. user <- Sys.getenv("LOGNAME")
  218. node <- Sys.info()[["nodename"]]
  219. user.node <- paste0(user, "@", node)
  220. gcc.version <- system("g++ -v 2>&1 | grep 'gcc version' | cut -d' ' -f1-3",
  221.                       intern=TRUE)
  222. os <- system("lsb_release -d | cut -d: -f2 | sed 's/^[ \t]*//'", intern=TRUE)
复制代码

藤椅
Elena3 发表于 2015-6-13 08:00:47

板凳
hhbb979 在职认证  发表于 2015-6-13 08:09:02
好样的!

报纸
fengyg 企业认证  发表于 2015-6-13 08:31:14
kankan

地板
lhf8059 发表于 2015-6-13 08:52:17
看看!

7
YONGHU33 发表于 2015-6-13 11:07:50
看看,谢谢!很实用

8
auirzxp 学生认证  发表于 2015-6-13 11:08:30
提示: 作者被禁止或删除 内容自动屏蔽

9
ohmymamami 发表于 2015-6-13 18:23:03
厉害厉害

10
nieqiang110 学生认证  发表于 2015-6-13 19:38:25
ccccccccccccccccccccccccccccccccccc

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注jltj
拉您入交流群
GMT+8, 2025-12-29 00:18