| File name: 1.xlsx | |
| Download link: https://bbs.pinggu.org/a-2243158.html | |
While cleaning data for sentiment analysis, several lines fail with an "incorrect number of dimensions" error. I'm a complete beginner and have never studied R before. Could anyone explain what is causing this?

> train <- read.csv("C:\\Users\\Administrator\\Desktop\\新建文件夹\\1.csv", quote = "", sep = "\"", header = F, col.names = 'msg', stringsAsFactors = F)
> neg <- read.csv("C:\\Users\\Administrator\\Desktop\\新建文件夹\\neg.csv", header = F, sep = ",", stringsAsFactors = F)
> weight <- rep(-1, length(neg[,1]))
> neg <- cbind(neg, weight)
> pos <- read.csv("C:\\Users\\Administrator\\Desktop\\新建文件夹\\pos.csv", header = F, sep = ",", stringsAsFactors = F)
> weight <- rep(1, length(pos[,1]))
> pos <- cbind(pos, weight)
> posneg <- rbind(pos, neg)
> names(posneg) <- c("term", "weight")
> posneg <- posneg[!duplicated(posneg$term), ]
> dict <- posneg[, "term"]
> library(Rwordseg)
> sentence <- as.vector(train$msg)
> sentence <- gsub("[[:digit:]]*", "", sentence)
> sentence <- gsub("[a-zA-Z]", "", sentence)
> sentence <- gsub("\\.", "", sentence)
> train <- train[!is.na(sentence), ]
> sentence <- sentence[!is.na(sentence)]
> train <- train[!nchar(sentence) < 2, ]  # this line reports "incorrect number of dimensions"; I really can't work out what is wrong
> sentence <- sentence[!nchar(sentence) < 2]
> system.time(x <- segmentCN(strwords = sentence))
> temp <- lapply(x, length)
> temp <- unlist(temp)
> id <- rep(train[, "id"], temp)  # this line also reports "incorrect number of dimensions"
> label <- rep(train[, "label"], temp)  # same error here
> term <- unlist(x)
> testterm <- as.data.frame(cbind(id, term, label), stringsAsFactors = F)
> stopword <- read.csv("C:\\Users\\Administrator\\Desktop\\新建文件夹\\stopword.csv", header = F, sep = ",", stringsAsFactors = F)
> stopword <- stopword[!stopword$term %in% posneg$term,]
> testterm <- testterm[!testterm$term %in% stopword,]
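A likely cause, going only by the code shown: `train` is read with a single column (`msg`), and in R `train[rows, ]` on a one-column data frame drops the result to a plain vector by default. After `train <- train[!is.na(sentence), ]`, `train` would then be a character vector, and any later `train[rows, ]` or `train[, "id"]` fails with "incorrect number of dimensions". The sketch below reproduces this on toy data and shows `drop = FALSE` as one possible fix. The sample sentences, the placeholder `id` and `label` values, and the space-splitting stand-in for `segmentCN()` are all assumptions made for illustration, not the poster's data.

```r
# Toy stand-in for `train`: one column named msg, as in the question.
train <- data.frame(msg = c("good product", NA, "x", "slow delivery"),
                    stringsAsFactors = FALSE)
sentence <- as.vector(train$msg)

# With a single column, [rows, ] drops the result to a plain vector.
dropped <- train[!is.na(sentence), ]
is.data.frame(dropped)  # FALSE: `dropped` is now a character vector

# Indexing that vector with [rows, ] is what raises the dimension error.
err <- tryCatch(dropped[nchar(dropped) >= 2, ],
                error = function(e) conditionMessage(e))

# drop = FALSE keeps the data frame, so later [rows, ] calls still work.
train <- train[!is.na(sentence), , drop = FALSE]
sentence <- sentence[!is.na(sentence)]
keep <- nchar(sentence) >= 2
train <- train[keep, , drop = FALSE]
sentence <- sentence[keep]

# Assumption: id and label are not in the one-column file, so placeholder
# values are attached here; real values would come from the source data.
train$id <- seq_len(nrow(train))
train$label <- c("pos", "neg")

# Hypothetical stand-in for segmentCN(): split each sentence on spaces.
x <- strsplit(sentence, " ")
temp <- lengths(x)
id <- rep(train[, "id"], temp)
label <- rep(train[, "label"], temp)
term <- unlist(x)
testterm <- data.frame(id = id, term = term, label = label,
                       stringsAsFactors = FALSE)
```

Two related points may be worth checking in the original script. Because the file is read with `col.names = 'msg'` only, `train` has no `id` or `label` column, so those would need to be present in the input file or created explicitly. Similarly, `stopword` is read with `header = F`, so its column is probably named `V1` rather than `term`, in which case `stopword$term` would be `NULL`.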