It won't run...
> CMetaData(ovid)
Error: could not find function "CMetaData"
> summary(ovid)
Length Class Mode
荷兰队长上演惊天远射.txt 2 PlainTextDocument list
技术化转型路上德国人受重创.txt 2 PlainTextDocument list
普约尔贡献头球绝杀.txt 2 PlainTextDocument list
四大天王沉沦各有难念的经.txt 2 PlainTextDocument list
再战德班德西命运迥异.txt 2 PlainTextDocument list
> DMetaData(ovid)
Error: could not find function "DMetaData"
>
>
> ### Strip extra whitespace ####
> reuters <- tm_map(ovid, stripWhitespace)
> reuters[[1]][[5]]
Error in reuters[[1]][[5]] : subscript out of bounds
>
> ### Chinese word segmentation ###
> zj <- c("true")
> re <- 0
> for (i in 1:dt$Length) {
+ re[[i]]<- zwfc(PlainTextDocument(reuters)[[i]],zj)
+ }
Error in 1:dt$Length : argument of length 0
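This message is what R reports when `dt$Length` evaluates to `NULL`, because `1:NULL` has nothing to count over. Below is a minimal sketch of a loop that takes its bounds from the data itself. It assumes a hypothetical `segment()` function standing in for `zwfc()` (whose definition is not shown in the post) and uses made-up sample text.

```r
# Hypothetical stand-in for zwfc(): it only normalises whitespace.
# A real Chinese word segmenter would replace this function.
segment <- function(text) {
  paste(strsplit(text, "\\s+")[[1]], collapse = " ")
}

# Made-up sample documents; in the post these would come from the corpus.
texts <- c("世界杯  半决赛", "西班牙 德国", "荷兰   乌拉圭")

# Preallocate the result and iterate with seq_along(), which yields
# integer(0) for empty input instead of failing the way 1:NULL does.
re <- character(length(texts))
for (i in seq_along(texts)) {
  re[i] <- segment(texts[i])
}
print(re)
```

If the loop bound has to come from the corpus, `length(reuters)` is the number of documents, so `seq_along(reuters)` serves the same purpose.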
> ### Build a new corpus ###
> reuters <- Corpus(VectorSource(re))
> meta(reuters[[2]])
Error in x$content[[i]] : subscript out of bounds
>
> ### Metadata management ###
> DublinCore(reuters[[2]], "title") <- "技术化转型路上德国人受重创"
Error in x$content[[i]] : subscript out of bounds
> meta(reuters[[2]])
Error in x$content[[i]] : subscript out of bounds
>
> ### Create the document-term matrix
>
> dtm <- DocumentTermMatrix(reuters)
> inspect(dtm[1:5, 8:13])
Error in `[.simple_triplet_matrix`(dtm, 1:5, 8:13) :
  subscript out of bounds
>
> ## Working with the document-term matrix ##
> ## 1. Find the terms that occur at least 5 times ##
> findFreqTerms(dtm, 5)
NULL
>
> ## 2. Find the terms whose correlation with "西班牙" is at least 0.8 ###
> findAssocs(dtm, "西班牙", 0.8)
$西班牙
numeric(0)
>
> ### After removing sparse terms (maximal allowed sparsity 0.4) ####
> inspect(removeSparseTerms(dtm, 0.4))
<<DocumentTermMatrix (documents: 1, terms: 0)>>
Non-/sparse entries: 0/0
Sparsity : 100%
Maximal term length: 0
Weighting : term frequency (tf)
Terms
Docs
1
>
> ### Dictionary ### Usually used to denote the terms relevant to text mining
>
> (d <- Dictionary(c("世界杯", "半决赛", "西班牙")))
Error: could not find function "Dictionary"
>
> inspect(DocumentTermMatrix(reuters, list(dictionary = d)))
Error in stopifnot(is.list(control)) : object 'd' not found
>
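The `Dictionary()` constructor is not available in the tm version that produced this log. As far as I know, tm 0.6 and later accept a plain character vector in the `dictionary` control option instead. Below is a minimal sketch under that assumption. It uses three made-up, already segmented documents rather than the corpus from the post, and how Chinese text is tokenized may depend on the locale.

```r
library(tm)

# Made-up documents with terms already separated by spaces.
docs <- c("世界杯 半决赛 西班牙 德国",
          "西班牙 头球 绝杀 德国",
          "荷兰 远射 世界杯 半决赛")
corp <- VCorpus(VectorSource(docs))

# A plain character vector takes the place of Dictionary(...).
d <- c("世界杯", "半决赛", "西班牙")

# Restrict the matrix to the dictionary terms.
dtm <- DocumentTermMatrix(corp, control = list(dictionary = d))
inspect(dtm)
```

By default `DocumentTermMatrix()` discards terms shorter than three characters, so two-character words only survive when `wordLengths = c(1, Inf)` is added to the control list.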
>
> ## Cluster the documents by term frequency ##
>
> reHClust <- hclust(dist(dtm), method = "ward")
The "ward" method has been renamed to "ward.D"; note new "ward.D2"
Error in hclust(dist(dtm), method = "ward") : must have n >= 2 objects to cluster
> plot(reHClust,main ="Document clustering")
Error in plot(reHClust, main = "Document clustering") : object 'reHClust' not found
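The clustering call fails because `dtm` holds a single document, and `hclust()` needs at least two objects. The line about "ward" is only a rename notice. Below is a minimal sketch on a made-up count matrix (not the data from the post), using the current method name "ward.D".

```r
# Made-up term counts: 5 documents (rows) by 4 terms (columns).
counts <- matrix(
  c(3, 0, 1, 0,
    2, 1, 0, 0,
    0, 4, 0, 2,
    0, 3, 1, 2,
    1, 0, 5, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("doc", 1:5), c("t1", "t2", "t3", "t4"))
)

# hclust() rejects fewer than two objects, so check before clustering.
stopifnot(nrow(counts) >= 2)

# "ward.D" is the new name of the method formerly called "ward".
hc <- hclust(dist(counts), method = "ward.D")

# Cut the tree into two groups and show the assignment per document.
print(cutree(hc, k = 2))
```

Calling `plot(hc)` on the result draws the dendrogram. With a real document-term matrix, `dist(as.matrix(dtm))` would take the place of `dist(counts)`.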
> ## Classify the terms ###
> kmeans(dtm, 3)
Error in sample.int(m, k) : cannot take a sample larger than the population when 'replace = FALSE'
>
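`kmeans()` fails for a related reason: it draws its starting centers from the rows without replacement, so it needs at least as many rows as clusters, and here there is one row and three clusters. Note also that the rows of a document-term matrix are documents, so this call groups documents rather than terms. Below is a minimal sketch on made-up counts (not the data from the post).

```r
set.seed(42)  # k-means starts from random rows; fix the seed for repeatability

# Made-up term counts: 6 documents (rows) by 3 terms (columns).
counts <- matrix(
  c(5, 0, 0,
    4, 1, 0,
    0, 5, 1,
    0, 4, 0,
    1, 0, 5,
    0, 1, 4),
  nrow = 6, byrow = TRUE
)

k <- 3
# The number of rows must be at least the number of requested clusters.
stopifnot(nrow(counts) >= k)

km <- kmeans(counts, centers = k)
print(km$cluster)
```

To group terms instead, the matrix can be transposed with `t(counts)` so that terms become the rows.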
> ### Principal component analysis ###
>
> ozMat <- TermDocumentMatrix(makeChunks(reuters, 50),
+ list(weighting = weightBin))
Error in TermDocumentMatrix(makeChunks(reuters, 50), list(weighting = weightBin)) :
  could not find function "makeChunks"
>
> k <- princomp(as.matrix(ozMat), features = 2)
Error in as.matrix(ozMat) : object 'ozMat' not found
> windows()
> screeplot(k,npcs=6,type='lines')
Error in screeplot(k, npcs = 6, type = "lines") : object 'k' not found
> windows()
> biplot(k)
Error in biplot(k) : object 'k' not found
>
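`makeChunks()` is not available in this tm version, so `ozMat` is never created and every later step fails with "object not found". The PCA steps themselves only need a numeric matrix. Below is a minimal sketch on a made-up binary term-by-chunk matrix (the chunk columns are hypothetical). It uses `prcomp()` from the stats package as a swapped-in alternative to `princomp()`, which requires more rows than columns.

```r
# Made-up binary incidence matrix: 6 terms (rows) by 4 chunks (columns).
oz <- matrix(
  c(1, 0, 0, 1,
    1, 1, 0, 0,
    0, 1, 1, 0,
    0, 0, 1, 1,
    1, 0, 1, 0,
    0, 1, 0, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("term", 1:6), paste0("chunk", 1:4))
)

# Principal component analysis on the centered columns.
pca <- prcomp(oz)

# Standard deviations and variance proportions per component.
print(summary(pca))
```

`screeplot(pca, type = "lines")` and `biplot(pca)` both accept the result. `dev.new()` opens a plotting window on any platform, whereas `windows()` exists only on Windows.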
> ### Cluster analysis of the terms ####
> ozHClust <- hclust(dist(ozMat), method = "ward")
The "ward" method has been renamed to "ward.D"; note new "ward.D2"
Error in as.matrix(x) : object 'ozMat' not found
> windows()
> plot(ozHClust,main="Term clustering")
Error in plot(ozHClust, main = "Term clustering") : object 'ozHClust' not found
>
>
>