[R] A small example and program for Chinese text mining in R

Post #41

神释 posted on 2014-7-23 23:21:54

Text mining in R

Post #42

夏木666888

Posted on 2014-8-26 15:53:07

Way too expensive~

Post #43

神释 posted on 2014-10-7 20:44:35

OP, can't you make it a bit cheaper? This is far too expensive.

I can't afford it.

Post #44

1120311766 posted on 2014-12-13 15:25:45

Bro, you already have over ten thousand forum coins. Why are you still charging so much?

Post #45

叶寒 posted on 2015-1-4 11:28:19

My forum coins were deducted, but I can't find any attachment. OP, could you explain?

Post #46

a524631266

Posted on 2015-2-10 00:22:03

This really is too expensive.

Post #47

1124518271 posted on 2015-4-13 12:36:59

It won't run...

> CMetaData(ovid)
Error: could not find function "CMetaData"
> summary(ovid)
                           Length Class          Mode
荷兰队长上演惊天远射.txt    2    PlainTextDocument list
技术化转型路上德国人受重创.txt 2    PlainTextDocument list
普约尔贡献头球绝杀.txt       2    PlainTextDocument list
四大天王沉沦各有难念的经.txt 2    PlainTextDocument list
再战德班德西命运迥异.txt    2    PlainTextDocument list
> DMetaData(ovid)
Error: could not find function "DMetaData"
>
>
> ### Remove extra whitespace ###
> reuters <- tm_map(ovid, stripWhitespace)
> reuters[[1]][[5]]
Error in reuters[[1]][[5]] : subscript out of bounds
>
> ### Chinese word segmentation ###
> zj <- c("true")
> re <- 0
> for (i in 1:dt$Length) {
+    re[[i]]<-  zwfc(PlainTextDocument(reuters)[[i]],zj)
+    }
Error in 1:dt$Length : argument of length 0
> ### Build a new corpus ###
> reuters <- Corpus(VectorSource(re))
> meta(reuters[[2]])
Error in x$content[[i]] : subscript out of bounds
>
> ### Metadata management ###
> DublinCore(reuters[[2]], "title") <- "技术化转型路上德国人受重创"
Error in x$content[[i]] : subscript out of bounds
> meta(reuters[[2]])
Error in x$content[[i]] : subscript out of bounds
>
> ### Create the document-term matrix
>
> dtm <- DocumentTermMatrix(reuters)
> inspect(dtm[1:5, 8:13])
Error in `[.simple_triplet_matrix`(dtm, 1:5, 8:13) :
  subscript out of bounds
>
> ## Operations on the document-term matrix ##
> ## 1. Find terms that occur at least 5 times ##
> findFreqTerms(dtm, 5)
NULL
>
> ## 2. Find terms with a correlation of at least 0.8 with "西班牙" ###
> findAssocs(dtm, "西班牙", 0.8)
$西班牙
numeric(0)

>
> ### After removing infrequent terms (below 40%) ###
> inspect(removeSparseTerms(dtm, 0.4))
<<DocumentTermMatrix (documents: 1, terms: 0)>>
Non-/sparse entries: 0/0
Sparsity          : 100%
Maximal term length: 0
Weighting       : term frequency (tf)

Terms
Docs
1
>
> ### Dictionary ### Usually used to denote the terms relevant to text mining
>
> (d <- Dictionary(c("世界杯", "半决赛", "西班牙")))
Error: could not find function "Dictionary"
>
> inspect(DocumentTermMatrix(reuters, list(dictionary = d)))
Error in stopifnot(is.list(control)) : object 'd' not found
>
>
> ## Cluster documents by term frequency ##
>
> reHClust <- hclust(dist(dtm), method = "ward")
The "ward" method has been renamed to "ward.D"; note new "ward.D2"
Error in hclust(dist(dtm), method = "ward") : must have n >= 2 objects to cluster
> plot(reHClust,main ="文件聚类分析")
Error in plot(reHClust, main = "文件聚类分析") : object 'reHClust' not found
> ## Classify terms ###
> kmeans(dtm, 3)
Error in sample.int(m, k) : cannot take a sample larger than the population when 'replace = FALSE'

>
> ### Principal component analysis ###
>
> ozMat <- TermDocumentMatrix(makeChunks(reuters, 50),
+    list(weighting = weightBin))
Error in TermDocumentMatrix(makeChunks(reuters, 50), list(weighting = weightBin)) :
  could not find function "makeChunks"
>
> k <- princomp(as.matrix(ozMat), features = 2)
Error in as.matrix(ozMat) : object 'ozMat' not found
> windows()
> screeplot(k,npcs=6,type='lines')
Error in screeplot(k, npcs = 6, type = "lines") : object 'k' not found
> windows()
>  biplot(k)
Error in biplot(k) : object 'k' not found
>
> ### Cluster analysis of terms ###
> ozHClust <- hclust(dist(ozMat), method = "ward")
The "ward" method has been renamed to "ward.D"; note new "ward.D2"
Error in as.matrix(x) : object 'ozMat' not found
> windows()
> plot(ozHClust,main="词条聚类分析")
Error in plot(ozHClust, main = "词条聚类分析") : object 'ozHClust' not found
>
>
>