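I'm a humanities student and an R beginner, hoping the experts in this group can help. I want to use R's rvest package to scrape the comments on a NetEase News article (link: http://comment.news.163.com/news_guonei_bbs/5SAOMV780001124J.html). I followed a couple of posts I found online (http://blog.csdn.net/wshsa/article/details/74157341 and http://www.jianshu.com/p/543ce849eef6),
but when I write the code myself following those tutorials, it keeps going wrong, as shown below: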
> library(rvest)
> Link<-'http://comment.news.163.com/news_guonei_bbs/5SAOMV780001124J.html'
> Dlink<-read_html(Link)
> comment<-html_nodes(Dlink,'#tie-data-4 > div > div > div')
> comment
{xml_nodeset (0)}
> rm(comment)
> comment<-Dlink%>%html_nodes('div.list div div')%>%html_text()
> comment
character(0)
> comment<-Dlink%>%html_nodes('div.body div')%>%html_text()
> comment
 [1] "确 定"
 [2] "\r\n \r\n \r\n \r\n \r\n \r\n "
 [3] "\r\n \r\n \r\n 您的帐号存在异常操作,为保证您的帐号安全,请输入验证码进行下一步操作。\r\n \r\n 验证码:\r\n 看不清,换一张\r\n \r\n \r\n \r\n "
 [4] ""
 [5] "\r\n 您的帐号存在异常操作,为保证您的帐号安全,请输入验证码进行下一步操作。\r\n \r\n 验证码:\r\n 看不清,换一张\r\n \r\n \r\n "
 [6] "\r\n 确 定\r\n \r\n 取 消\r\n "
 [7] "\r\n \r\n 分享成功\r\n \r\n \r\n 帐号绑定已经过期\r\n \r\n \r\n 请重新绑定>>\r\n \r\n \r\n "
 [8] "\r\n \r\n 帐号绑定已经过期\r\n \r\n \r\n 请重新绑定>>\r\n \r\n "
 [9] "\r\n 打开邀请函\r\n "
[10] "\r\n 继续\r\n 取消\r\n "
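I suspect the CSS selector I'm using is the problem. If I want to get the username, comment text, and like count from the comment page, how exactly should the CSS selectors be written? Looking forward to your replies, thank you!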
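For reference, here is a minimal sketch of the kind of extraction I am after, run against a small inline HTML snippet rather than the real page. The markup and the class names (.comment, .user, .text, .vote) are made up for illustration and are not NetEase's actual structure. The idea is to select each comment block first and then pull one field per block, so the three columns stay aligned. The last line shows that a selector with no match gives an empty node set, as in my output above. That may also happen if the comments are loaded by JavaScript after the page loads, in which case they would not be in the HTML that read_html() downloads.

```r
library(rvest)

# Hypothetical markup standing in for a rendered comment list.
# The class names are assumptions, not taken from the real page.
html <- '
<div id="comments">
  <div class="comment">
    <span class="user">alice</span>
    <p class="text">First comment</p>
    <span class="vote">12</span>
  </div>
  <div class="comment">
    <span class="user">bob</span>
    <p class="text">Second comment</p>
    <span class="vote">3</span>
  </div>
</div>'

page <- read_html(html)

# One node per comment block
items <- html_nodes(page, "div.comment")

# html_node() returns exactly one result per block, so the fields line up
comments <- data.frame(
  user    = html_text(html_node(items, ".user"), trim = TRUE),
  content = html_text(html_node(items, ".text"), trim = TRUE),
  votes   = as.integer(html_text(html_node(items, ".vote"), trim = TRUE)),
  stringsAsFactors = FALSE
)
print(comments)

# A selector that matches nothing in the parsed HTML yields zero nodes
n_missing <- length(html_nodes(page, "#tie-data-4 > div > div > div"))
print(n_missing)
```

If something like this works on the snippet but the same approach returns nothing on the real page, the content is probably not in the downloaded HTML, which would point away from the selector itself.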









