Most websites can be scraped directly, but Tmall always returns an error. For example, with the rvest package:
> library(rvest)
> url <- "http://detail.tmall.com/item.htm?id=43502289962"
> html(url)
Error in curl::curl_fetch_memory(url, handle = handle) :
SSL connect error
With the XML package:
> library(XML)
> url <- "http://detail.tmall.com/item.htm?id=43502289962"
> htmlParse(url)
Error: failed to load external entity "http://detail.tmall.com/item.htm?id=43502289962"
With RCurl:
> library(RCurl)
> url <- "http://detail.tmall.com/item.htm?id=43502289962"
> getURL(url)
[1] "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\r\n<html>\r\n<head><title>302 Found</title></head>\r\n<body bgcolor=\"white\">\r\n<h1>302 Found</h1>\r\n<p>The requested resource resides temporarily under a different URI.</p>\r\n<hr/>Powered by Tengine</body>\r\n</html>\r\n"
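This one does return something, but it is not the real page source: it is far too short, just the body of a 302 redirect response.
The problem only occurs on Windows; Mac and Linux both work fine.
I would rather not use other software. Is there a way to solve this in R?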
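
One thing that might be worth trying, offered as a sketch rather than a confirmed fix: the RCurl output above is the body of a 302 response, which suggests libcurl was not told to follow the redirect. The code below assumes that the redirect leads to an HTTPS URL, that the SSL error on Windows comes from a missing CA bundle, and that the page is encoded as GBK. None of these is verified here. The helper name `fetch_page` and its `encoding` parameter are hypothetical.

```r
library(RCurl)

# Hypothetical helper: fetch a page while following HTTP redirects.
fetch_page <- function(url, encoding = "GBK") {
  getURL(
    url,
    # Ask libcurl to follow 3xx redirects instead of returning the stub page.
    followlocation = TRUE,
    # Point libcurl at the CA bundle shipped with RCurl (assumption: the
    # Windows build cannot find a CA bundle on its own for HTTPS).
    cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"),
    # Assumption: the page is served in GBK; change this if it is not.
    .encoding = encoding
  )
}
```

If `page <- fetch_page(url)` returns the full source, it could then be passed to `XML::htmlParse(page, asText = TRUE)` for parsing. If the result is still a short stub, the site may be redirecting to a login page or requiring cookies, which this sketch does not handle.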

