The part I don't understand is the AJAX dynamic loading. Scraping the comments on a single page works; here is how I did it:
library(RCurl)
library(XML)
library(plyr)
# Spoof browser request headers
myheader=c("User-Agent"="Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1.6) ",
"Accept"="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language"="en-us",
"Connection"="keep-alive",
"Accept-Charset"="GB2312,utf-8;q=0.7,*;q=0.7"
)
webpage = getURL('https://item.jd.com/12107414.html#comments-list',httpheader=myheader,.encoding='utf-8')
pagetree = htmlParse(webpage,encoding='utf-8')
comment = xpathSApply(pagetree,"//div[@class='comment-content']",xmlValue)
comment = iconv(comment,"utf-8","LATIN1")
comment
The problem is obvious: I only get 10 comments. How can I get all of them? Any help is appreciated (an example would be even better).
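"AJAX loading" here means that after the product page loads, the browser requests the comments from a separate URL, one page at a time. That is presumably why the HTML fetched by `getURL` contains only the first 10. A common workaround is to open the browser's developer tools, go to the Network tab (filter by XHR/JS), click "next page" under the comments to find that request, and then call it directly from R in a loop while changing the page number. The sketch below is a minimal version of that approach, built on several assumptions. The endpoint URL and its parameters (`productId`, `page`, `pageSize`), the response fields `comments` and `content`, and the GBK response encoding are all hypothetical placeholders; replace them with whatever the Network tab actually shows. The helper names (`comment_url`, `strip_jsonp`, `parse_comments`, `fetch_all_comments`) are likewise made up for illustration. It needs the RCurl and jsonlite packages.

```r
# Requires the RCurl and jsonlite packages (calls are namespace-qualified).

# HYPOTHETICAL endpoint: replace with the real request URL copied from the
# browser's Network tab. The parameter names here are assumptions.
comment_url <- function(product_id, page, page_size = 10) {
  sprintf(paste0("https://club.jd.com/comment/productPageComments.action",
                 "?productId=%s&score=0&sortType=5&page=%d&pageSize=%d"),
          product_id, page, page_size)
}

# Some AJAX endpoints reply with JSONP, i.e. callbackName({...});
# strip that wrapper to leave plain JSON. Plain JSON passes through unchanged.
strip_jsonp <- function(txt) {
  txt <- trimws(txt)
  if (grepl("^[A-Za-z_$][A-Za-z0-9_$.]*\\(", txt)) {
    txt <- sub("^[^(]*\\(", "", txt)  # drop "callbackName("
    txt <- sub("\\);?$", "", txt)     # drop trailing ")" or ");"
  }
  txt
}

# Extract the comment texts from one page of JSON.
# ASSUMPTION: the texts live in comments[].content.
parse_comments <- function(txt) {
  dat <- jsonlite::fromJSON(strip_jsonp(txt))
  if (length(dat$comments) == 0) return(character(0))
  dat$comments$content
}

# Request page 0, 1, 2, ... until a page comes back empty or max_pages is reached.
# ASSUMPTION: the response is GBK-encoded; pass encoding = "UTF-8" if it is not.
# The raw bytes are converted with iconv() because RCurl's .encoding argument
# only understands UTF-8 and Latin-1.
fetch_all_comments <- function(product_id, headers, max_pages = 100,
                               encoding = "GBK", delay = 1) {
  all <- character(0)
  for (p in seq_len(max_pages) - 1) {
    raw <- RCurl::getBinaryURL(comment_url(product_id, p), httpheader = headers)
    txt <- iconv(rawToChar(raw), from = encoding, to = "UTF-8")
    page <- parse_comments(txt)
    if (length(page) == 0) break
    all <- c(all, page)
    Sys.sleep(delay)  # pause between requests to avoid hammering the server
  }
  all
}
```

With the `myheader` vector from the code above, a call would look like `comments <- fetch_all_comments("12107414", myheader, max_pages = 5)`. Stopping at the first empty page avoids having to know the total page count in advance. If the real response reports a maximum page number, it could be read once and used as `max_pages` instead.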