Data Science with R Association Rules

#1 (original poster)

Lisrelchen, posted 2015-6-13 07:55:42


Association analysis defined data mining at its roots in 1989 and throughout the 1990s. It remains one of the preeminent techniques for modelling big data and a core tool in the data scientist’s toolbox.

As an unsupervised learning technique it has delivered considerable benefit in areas ranging from traditional shopping basket analysis to the analysis of who bought what other books or who watched what other videos, and in domains including health care and telecommunications. In a data mining project we often begin with association analysis to identify issues with our data and then to build multiple local models. The analysis aims to identify patterns that are linked by some commonality (such as a common person).
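
The listing later in this thread passes support and confidence thresholds to apriori() and filters rules on lift. As a minimal sketch of what those three measures mean, the following base R code computes them by hand for a single rule. The baskets and the helper support() are hypothetical, invented for illustration; they are not data or functions from the chapter.

```r
# Hypothetical toy baskets: each element is one transaction.
baskets <- list(
  b1 = c("bread", "milk"),
  b2 = c("bread", "butter", "milk"),
  b3 = c("butter", "milk"),
  b4 = c("bread", "butter"),
  b5 = c("bread", "butter", "milk")
)

# Support: fraction of baskets that contain every item in `items`.
# (Illustrative helper, not part of arules.)
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "bread"
rhs <- "milk"

s_rule <- support(c(lhs, rhs), baskets)   # Support of {bread, milk}.
conf   <- s_rule / support(lhs, baskets)  # Confidence of bread => milk.
lift   <- conf / support(rhs, baskets)    # Lift of bread => milk.

print(c(support = s_rule, confidence = conf, lift = lift))
```

A lift below 1, as here, suggests the two items co-occur slightly less often than independence would predict; the listing's filter lift>8 keeps only rules far above that baseline.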

In this chapter we review association analysis and discover new insights into our data by building association rule models.

The required packages for this module include:

library(arules) # Association rules.
library(dplyr) # Data munging: tbl_df(), %>%.

As we work through this chapter, new R commands will be introduced. Be sure to review the command’s documentation and understand what the command does. You can ask for help using the ? command as in:
?read.csv
We can obtain documentation on a particular package using the help= option of library():
library(help=rattle)
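
Besides the interactive help pages, a function can also be inspected programmatically. The sketch below uses only base R; the variable name arg_names is an assumption made for illustration, and the commented lines show what are, as far as I know, the long forms of the ? and library(help=) shortcuts.

```r
# Inspect a function's arguments without opening its help page.
arg_names <- names(formals(read.csv))
print(arg_names)

# Print the full signature, including default values.
print(args(read.csv))

# Interactive alternatives (commented out because they open help viewers):
# help("read.csv")             # Same as ?read.csv.
# help(package = "arules")     # Same as library(help = arules).
# help.search("association")   # Same as ??association.
```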

This chapter is intended to be hands-on. To learn effectively, you are encouraged to have R running (e.g., in RStudio) and to run all the commands as they appear here. Check that you get the same output and that you understand it. Try some variations. Explore

Attachment (hidden content of this post):

Data Science with R Association Rules.rar (231.53 KB)

#2

Lisrelchen, posted 2015-6-13 07:57:09

## ----module, echo=FALSE, results="asis"----------------------------------
Module <- "ARulesO"
cat(paste0("\\newcommand{\\Module}{", Module, "}"))
## ----setup, child="mycourse.Rnw"-----------------------------------------
## ----setup_options, include=FALSE----------------------------------------
library(knitr)
library(xtable)
opts_chunk$set(cache=FALSE)
opts_chunk$set(out.width='0.8\\textwidth')
opts_chunk$set(fig.align='center')
opts_chunk$set(src.top=NULL)
opts_chunk$set(src.bot=NULL)
opts_chunk$set(out.lines=4)
opts_chunk$set(out.truncate=80)
opts_chunk$set(fig.path=sprintf("figures/%s/", Module))
opts_chunk$set(cache.path=sprintf("cache/%s/", Module))
opts_chunk$set(bib.file=paste0(Module, ".bib"))
# Leave code as I have formatted it.
opts_chunk$set(tidy=FALSE)
# Hooks
# Allow auto crop of base graphics plots when crop=TRUE.
knit_hooks$set(crop=hook_pdfcrop)
# Truncate long lines and long output
hook_output <- knit_hooks$get("output")
hook_source <- knit_hooks$get("source")
knit_hooks$set(output=function(x, options)
{
  if (options$results != "asis")
  {
    # Split string into separate lines.
    x <- unlist(stringr::str_split(x, "\n"))
    # Trim to the number of lines specified.
    if (!is.null(n <- options$out.lines))
    {
      if (length(x) > n)
      {
        # Truncate the output.
        x <- c(head(x, n), "....\n")
      }
    }
    # Truncate each line to length specified.
    if (!is.null(m <- options$out.truncate))
    {
      len <- nchar(x)
      x[len>m] <- paste0(substr(x[len>m], 0, m-3), "...")
    }
    # Paste lines back together.
    x <- paste(x, collapse="\n")
    # Replace ' = ' with '=' - my preference. Hopefully won't
    # affect things inappropriately.
    x <- gsub(" = ", "=", x)
  }
  hook_output(x, options)
},
source=function(x, options)
{
  # Split string into separate lines.
  x <- unlist(stringr::str_split(x, "\n"))
  # Trim to the number of lines specified.
  if (!is.null(n <- options$src.top))
  {
    if (length(x) > n)
    {
      # Truncate the output.
      if (is.null(m <- options$src.bot)) m <- 0
      x <- c(head(x, n+1), "\n....\n", tail(x, m+2))
    }
  }
  # Paste lines back together.
  x <- paste(x, collapse="\n")
  hook_source(x, options)
})
# Optionally allow R Code chunks to be environments so we can refer to them.
knit_hooks$set(rcode=function(before, options, envir)
{
  if (before)
    sprintf('\\begin{rcode}\\label{%s}\\hfill{}', options$label)
  else
    '\\end{rcode}'
})
## ----load_pacakges, message=FALSE----------------------------------------
library(arules) # Association rules.
library(dplyr) # Data munging: tbl_df(), %>%.
## ----additional_dependent_pacakges, echo=FALSE, message=FALSE------------
# These are dependencies that would otherwise be loaded as required.
library(magrittr)
## ----documentation, child="documentation.Rnw", eval=TRUE-----------------
## ----help_library, eval=FALSE, tidy=FALSE--------------------------------
## ?read.csv
## ----help_package, eval=FALSE--------------------------------------------
## library(help=rattle)
## ----record_start_time, echo=FALSE---------------------------------------
start.time <- proc.time()
## ----generate_bib, echo=FALSE, message=FALSE, warning=FALSE--------------
# Write all packages in the current session to a bib file
if (is.null(opts_chunk$get("bib.file"))) opts_chunk$set(bib.file="Course.bib")
write_bib(sub("^.*/", "", grep("^/", searchpaths(), value=TRUE)),
          file=opts_chunk$get("bib.file"))
system(paste("cat extra.bib >>", opts_chunk$get("bib.file")))
# Fix up specific issues.
# R-earth
system(paste("perl -pi -e 's|. Derived from .*$|},|'",
             opts_chunk$get("bib.file")))
# R-randomForest
system(paste("perl -pi -e 's|Fortran original by Leo Breiman",
             "and Adele Cutler and R port by|Leo Breiman and",
             "Adele Cutler and|'", opts_chunk$get("bib.file")))
# R-C50
system(paste("perl -pi -e 's|. C code for C5.0 by R. Quinlan|",
             " and J. Ross Quinlan|'", opts_chunk$get("bib.file")))
# R-caret
system(paste("perl -pi -e 's|. Contributions from|",
             " and|'", opts_chunk$get("bib.file")))
# Me
system(paste("perl -pi -e 's|Graham Williams|",
             "Graham J Williams|'", opts_chunk$get("bib.file")))
## ----eval=FALSE----------------------------------------------------------
## fname <- "http://www.biz.uiowa.edu/faculty/jledolter/DataMining/lastfm.csv"
## lastfm <- read.csv(fname, stringsAsFactors=FALSE)
## ----echo=FALSE, eval=FALSE----------------------------------------------
## save(lastfm, file="data/lastfm.RData")
## ----lastfm_load_dataset, echo=FALSE-------------------------------------
load("data/lastfm.RData")
## ----lastfm_summary, out.lines=NULL--------------------------------------
dsname <- "lastfm"
ds <- get(dsname) %>% as_tibble() # tbl_df() is deprecated in current dplyr.
ds
## ----lastfm_prepare_dataset, out.lines=NULL------------------------------
ds <- ds %>% select(user, artist) %>% unique()
ds
## ----lastfm_as_transactions----------------------------------------------
library(arules)
trans <- as(split(ds$artist, ds$user), "transactions")
## ----lastfm_inspect_trans, out.lines=8-----------------------------------
inspect(trans[1:5])
## ----lastfm_plot_frequency, fig.height=5---------------------------------
itemFrequencyPlot(trans, support=0.075)
## ----out.lines=NULL------------------------------------------------------
model <- apriori(trans, parameter=list(support=0.01, confidence=0.5))
## ----out.lines=10--------------------------------------------------------
inspect(model)
## ----out.lines=10--------------------------------------------------------
inspect(subset(model, subset=lift>8))
inspect(sort(subset(model, subset=lift>8), by="confidence"))
## ----constants_baskets, echo=FALSE---------------------------------------
set.seed(42)
nb <- 10 # Number of baskets.
ni <- 5 # Number of items.
nc <- 40 # Number of combinations.
## ----constants_baskets, eval=FALSE---------------------------------------
## set.seed(42)
## nb <- 10 # Number of baskets.
## ni <- 5 # Number of items.
## nc <- 40 # Number of combinations.
## ----random_basket_dataset-----------------------------------------------
ds <- data.frame(id=sort(sprintf("b%02d", sample(1:nb, nc, replace=TRUE))),
                 item=sprintf("i%1d", sample(1:ni, nc, replace=TRUE)))
ds <- unique(ds)
rownames(ds) <- NULL
## ----summary_basket_sizes, out.lines=6-----------------------------------
ds %>% group_by(id) %>% tally()
## ----list_basket_contents, out.lines=NULL--------------------------------
ds %>% group_by(id) %>% summarise(items=paste(sort(item), collapse=", "))
## ----list_baskets_with-i1, out.lines=NULL--------------------------------
ds %>% group_by(id) %>% summarise(i1="i1" %in% item) %>% filter(i1)
## ----one_itemset_freq, out.lines=NULL------------------------------------
ds %>% group_by(item) %>% tally()
## ----one_itemset_support, out.lines=NULL---------------------------------
ds %>% group_by(item) %>% tally() %>% mutate(s=n/nb)
## ----arules_create_dst---------------------------------------------------
library(arules)
dst <- as(split(ds$item, ds$id), "transactions")
dst
## ----dst_item_frequency--------------------------------------------------
itemFrequency(dst)
## ------------------------------------------------------------------------
itemFrequency(dst, type="absolute")
## ----dst_plot_freq, fig.height=3.5---------------------------------------
itemFrequencyPlot(dst)
## ----echo=FALSE----------------------------------------------------------
is2.freq <- group_by(ds,id) %>%
  summarise(is.1.2="i1" %in% item & "i2" %in% item) %>%
  tally(is.1.2)
is3.freq <- group_by(ds,id) %>%
  summarise(is.1.2.3="i1" %in% item &
                     "i2" %in% item &
                     "i3" %in% item) %>%
  tally(is.1.2.3)
## ----out.lines=NULL------------------------------------------------------
merge(ds, ds, by="id") %>%
  subset(as.character(item.x) < as.character(item.y)) %>%
  mutate(itemset=paste(item.x, item.y)) %>%
  group_by(itemset) %>%
  tally()
## ----out.lines=NULL------------------------------------------------------
merge(ds, ds, by="id") %>%
  merge(ds, by="id") %>%
  subset(as.character(item.x) < as.character(item.y) &
         as.character(item.y) < as.character(item)) %>%
  mutate(itemset=paste(item.x, item.y, item)) %>%
  group_by(itemset) %>%
  tally()
## ----common_outtro, child="finale.Rnw", eval=TRUE------------------------
## ----syinfo, child="sysinfo.Rnw", eval=TRUE------------------------------
## ----echo=FALSE, message=FALSE-------------------------------------------
require(Hmisc)
pkg <- "knitr"
pkg.version <- installed.packages()[pkg, 'Version']
pkg.date <- installed.packages(fields="Date")[pkg, 'Date']
pkg.info <- paste(pkg, pkg.version, pkg.date)
rev <- system("bzr revno", intern=TRUE)
cpu <- system(paste("cat /proc/cpuinfo | grep 'model name' |",
                    "head -n 1 | cut -d':' -f2"), intern=TRUE)
ram <- system("cat /proc/meminfo | grep MemTotal: | awk '{print $2}'",
              intern=TRUE)
ram <- paste0(round(as.integer(ram)/1e6, 1), "GB")
user <- Sys.getenv("LOGNAME")
node <- Sys.info()[["nodename"]]
user.node <- paste0(user, "@", node)
gcc.version <- system("g++ -v 2>&1 | grep 'gcc version' | cut -d' ' -f1-3",
                      intern=TRUE)
os <- system("lsb_release -d | cut -d: -f2 | sed 's/^[ \t]*//'", intern=TRUE)
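
The listing above counts 2-itemsets by merging ds with itself on id and keeping rows where item.x sorts before item.y. The same counts can be sketched in base R with combn(), which may be easier to follow. The small ds below is a hypothetical stand-in, not the random baskets the listing generates, and the names pairs, pair_freq and pair_support are assumptions made for this example.

```r
# Hypothetical baskets standing in for the listing's random `ds`.
ds <- data.frame(
  id   = c("b01", "b01", "b01", "b02", "b02", "b03", "b03", "b03"),
  item = c("i1",  "i2",  "i3",  "i1",  "i2",  "i2",  "i3",  "i4"),
  stringsAsFactors = FALSE
)

# One character vector of items per basket.
baskets <- split(ds$item, ds$id)

# Enumerate every sorted pair of items within each basket.
# Baskets with fewer than two items contribute no pairs.
pairs <- unlist(lapply(baskets, function(items) {
  items <- sort(unique(items))
  if (length(items) < 2) return(character(0))
  combn(items, 2, FUN = paste, collapse = " ")
}))

pair_freq    <- table(pairs)                # Absolute frequency per 2-itemset.
pair_support <- pair_freq / length(baskets) # Relative frequency (support).

print(pair_freq)
print(round(pair_support, 2))
```

Changing the 2 in combn() to 3 would enumerate 3-itemsets in the same way, mirroring the double merge in the listing.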


#3

Elena3, posted 2015-6-13 08:00:47

#4

hhbb979, posted 2015-6-13 08:09:02

Well done!

#5

fengyg, posted 2015-6-13 08:31:14

Taking a look

#6

lhf8059, posted 2015-6-13 08:52:17

Taking a look!

#7

YONGHU33, posted 2015-6-13 11:07:50

Taking a look, thanks! Very practical.

#8

auirzxp, posted 2015-6-13 11:08:30

Notice: the author has been banned or the content deleted; it is hidden automatically.

#9

ohmymamami, posted 2015-6-13 18:23:03

Impressive, impressive

#10

nieqiang110, posted 2015-6-13 19:38:25

ccccccccccccccccccccccccccccccccccc
