人大经济论坛: Attachment Download

Thread:
File name: 1.xlsx
Download link: https://bbs.pinggu.org/a-2243158.html
Attachment size: 157.82 KB
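While cleaning data for sentiment analysis, R reports "incorrect number of dimensions" in several places. I am a complete beginner and have never studied R before. Could anyone explain what is causing this?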

> train <- read.csv("C:\\Users\\Administrator\\Desktop\\新建文件夹\\1.csv", quote = "", sep = "\"", header = F, col.names = 'msg', stringsAsFactors = F)
> neg <- read.csv("C:\\Users\\Administrator\\Desktop\\新建文件夹\\neg.csv", header = F, sep = ",", stringsAsFactors = F)
> weight <- rep(-1, length(neg[,1]))
> neg <- cbind(neg, weight)
> pos <- read.csv("C:\\Users\\Administrator\\Desktop\\新建文件夹\\pos.csv", header = F, sep = ",", stringsAsFactors = F)
> weight <- rep(1, length(pos[,1]))
> pos <- cbind(pos, weight)
> posneg <- rbind(pos, neg)
> names(posneg) <- c("term", "weight")
> posneg <- posneg[!duplicated(posneg$term), ]
> dict <- posneg[, "term"]
> library(Rwordseg)
> sentence <- as.vector(train$msg)
> sentence <- gsub("[[:digit:]]*", "", sentence)
> sentence <- gsub("[a-zA-Z]", "", sentence)
> sentence <- gsub("\\.", "", sentence)
> train <- train[!is.na(sentence), ]
> sentence <- sentence[!is.na(sentence)]

> train <- train[!nchar(sentence) < 2, ] # "incorrect number of dimensions" is reported here; I really cannot find what is wrong
> sentence <- sentence[!nchar(sentence) < 2]
> system.time(x <- segmentCN(strwords = sentence))
> temp <- lapply(x, length)
> temp <- unlist(temp)
> id <- rep(train[, "id"], temp) # "incorrect number of dimensions" is reported here as well
> label <- rep(train[, "label"], temp) # and here too
> term <- unlist(x)
> testterm <- as.data.frame(cbind(id, term, label), stringsAsFactors = F)
> stopword <- read.csv("C:\\Users\\Administrator\\Desktop\\新建文件夹\\stopword.csv", header = F, sep = ",", stringsAsFactors = F)
> stopword <- stopword[!stopword$term %in% posneg$term,]
> testterm <- testterm[!testterm$term %in% stopword,]
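
A likely cause of the first error, assuming `train` really holds only the single column `msg` (as `col.names = 'msg'` suggests): selecting rows of a one-column data frame with `train[rows, ]` silently drops the result to a plain vector, so the next `train[rows, ]` has no second dimension to index. The sketch below uses made-up toy data in place of 1.csv (the values are assumptions, not the real file) and shows that adding `drop = FALSE` keeps the data frame intact.

```r
# Toy stand-in for the one-column `train`; the real 1.csv is not available here.
train <- data.frame(msg = c("good product", NA, "x", "bad service"),
                    stringsAsFactors = FALSE)
sentence <- as.vector(train$msg)

# Row-subsetting a one-column data frame returns a plain character vector.
dropped <- train[!is.na(sentence), ]
is.data.frame(dropped)

# A second two-index subset on that vector then raises the dimension error.
failed <- inherits(try(dropped[TRUE, ], silent = TRUE), "try-error")

# drop = FALSE keeps the data frame, so repeated row filters keep working.
kept <- train[!is.na(sentence), , drop = FALSE]
sentence <- sentence[!is.na(sentence)]
kept <- kept[nchar(sentence) >= 2, , drop = FALSE]
sentence <- sentence[nchar(sentence) >= 2]
```

After this, `kept` and `sentence` stay aligned row for row, which the later `rep()` calls rely on.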

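The two `rep()` lines probably fail for the same reason, and for a second one: `train[, "id"]` and `train[, "label"]` need `train` to be a data frame that actually contains `id` and `label` columns, while the `read.csv` call above names only `msg`. The sketch below assumes a hypothetical toy `train` that does have those columns, and uses `strsplit` on spaces purely as a stand-in for `segmentCN()` so that it runs without Rwordseg.

```r
# Hypothetical toy data; the id and label columns are assumptions for illustration.
train <- data.frame(id = 1:2,
                    label = c("pos", "neg"),
                    msg = c("very good product", "bad service"),
                    stringsAsFactors = FALSE)

# Stand-in tokenizer for this toy example only (the post uses segmentCN()).
x <- strsplit(train$msg, " ", fixed = TRUE)

# Number of tokens per sentence.
temp <- lengths(x)

# Repeat each sentence's id and label once per token.
id <- rep(train[, "id"], temp)
label <- rep(train[, "label"], temp)
term <- unlist(x)

# data.frame() keeps id numeric; cbind() would coerce every column to character.
testterm <- data.frame(id = id, term = term, label = label,
                       stringsAsFactors = FALSE)
```

If the real file has no `id` column, one option is to create it with `seq_len(nrow(train))` before filtering; the labels, however, have to come from the data itself.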

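The stop-word filter at the end may also not do what is intended: with `header = F`, `read.csv` names the column `V1`, so `stopword$term` is `NULL`, and the final `%in% stopword` compares against a data frame instead of a character vector. A minimal sketch with made-up toy words (all values below are assumptions) that names the column first and keeps the stop words as a vector:

```r
# Toy stand-ins for stopword.csv, the sentiment dictionary and the token table.
stopword <- data.frame(V1 = c("the", "good", "of"), stringsAsFactors = FALSE)
posneg <- data.frame(term = c("good", "bad"), weight = c(1, -1),
                     stringsAsFactors = FALSE)
testterm <- data.frame(id = c(1, 1, 1, 2),
                       term = c("the", "good", "product", "bad"),
                       stringsAsFactors = FALSE)

# With header = F the column is V1, so there is no `term` column yet.
missing_before <- is.null(stopword$term)

# Name the column, then keep only stop words that are not sentiment terms.
names(stopword) <- "term"
stopword <- stopword[!stopword$term %in% posneg$term, "term"]

# `stopword` is now a character vector, so %in% behaves as expected.
testterm <- testterm[!testterm$term %in% stopword, ]
```

Here "good" stays in `testterm` because it is a sentiment term, while "the" is removed as a stop word.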
