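This post describes a way to interact with a web page and scrape data from it.
(For example: analyzing 800 texts at http://ccpl.psych.ac.cn/textmind/ and scraping the results. The software the site provides throws errors when I run it; otherwise I would not have bothered writing this.)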
Windows 10, 64-bit
Firefox 45, used with Selenium 2.53
Chrome 47, used with Selenium 3.3.1
(Your Firefox is probably newer than 45 by now; try a newer Selenium release with it.)
Running Selenium requires a Java runtime (I used Java 8).
Download Selenium
http://selenium-release.storage.googleapis.com/index.html
Firefox driver (geckodriver)
https://github.com/mozilla/geckodriver/releases
Chrome driver (chromedriver)
https://sites.google.com/a/chromium.org/chromedriver/downloads
The three files above can go in any folder, but the drivers must be on the PATH environment variable.
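Before starting the server, it can help to confirm from R that Java and the drivers are actually visible on PATH. Below is a minimal sketch using base R's Sys.which(), which returns an empty string for any program it cannot locate; the function name check_prereqs is my own and not part of any package.

```r
# Look up each program on PATH; Sys.which() returns "" when a program is not found.
check_prereqs <- function(programs = c("java", "geckodriver", "chromedriver")) {
  found <- Sys.which(programs)
  missing <- names(found)[found == ""]
  if (length(missing) > 0) {
    message("Not found on PATH: ", paste(missing, collapse = ", "))
  }
  invisible(found)  # named vector of full paths ("" for missing programs)
}

check_prereqs()
```

If a driver shows up as missing, add its folder to PATH and restart R (and cmd) so the new PATH is picked up.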
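First, start the Selenium server:
1. Open cmd and use cd to switch to the Selenium folder. (cmd starts on the C: drive by default; to switch to another drive, add the /d option, e.g. cd /d e:/downloads. Paths are not case-sensitive.)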
2.java -jar selenium-xxx.jar
Then start R and load the packages:
- library(RSelenium)
- library(XML)
- library(httr)
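With the server running, the browser can be driven from R. The sketch below assumes the server listens on the default port 4444 on localhost. open_textmind() and submit_text() are hypothetical helper names, and the CSS selectors ("textarea", "input[type=submit]") are placeholders: inspect the textmind page to find the real ones. It also assumes the result appears as an HTML table that XML::readHTMLTable() can read, and uses Sys.sleep() as a crude stand-in for a proper wait.

```r
# Connect to the local Selenium server, open a browser and load the textmind page.
open_textmind <- function(browser = "firefox", port = 4444L) {
  remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
                                   port = port,
                                   browserName = browser)
  remDr$open()
  remDr$navigate("http://ccpl.psych.ac.cn/textmind/")
  remDr
}

# Paste one text into the input box, click submit, then parse the result page.
# The default selectors are HYPOTHETICAL placeholders, not the site's real markup.
submit_text <- function(remDr, text,
                        input_css = "textarea",
                        submit_css = "input[type=submit]",
                        wait_sec = 3) {
  box <- remDr$findElement(using = "css selector", value = input_css)
  box$clearElement()
  box$sendKeysToElement(list(text))
  btn <- remDr$findElement(using = "css selector", value = submit_css)
  btn$clickElement()
  Sys.sleep(wait_sec)  # crude wait for the results to render
  html <- remDr$getPageSource()[[1]]
  doc <- XML::htmlParse(html, asText = TRUE, encoding = "UTF-8")
  XML::readHTMLTable(doc, stringsAsFactors = FALSE)  # list of data frames
}

# Example usage (needs a running Selenium server):
# remDr <- open_textmind()
# res <- submit_text(remDr, "some text to analyze")
# remDr$close()
```

For hundreds of texts, wrapping each submit_text() call in tryCatch() keeps a single failed page from stopping the whole run.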
Sample data: 20 files (more would not add much).
If anything here is wrong, please point it out, and go easy on me. Thanks.
- t_affinia_1.txt
- t_affinia_2.txt
- t_affinia_3.txt
- t_affinia_4.txt
- t_affinia_5.txt
- t_affinia_6.txt
- t_affinia_7.txt
- t_affinia_8.txt
- t_affinia_9.txt
- t_affinia_10.txt
- t_affinia_11.txt
- t_affinia_12.txt
- t_affinia_13.txt
- t_affinia_14.txt
- t_affinia_15.txt
- t_affinia_16.txt
- t_affinia_17.txt
- t_affinia_18.txt
- t_affinia_19.txt
- t_affinia_20.txt
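list.files() sorts names alphabetically, so t_affinia_10.txt would come before t_affinia_2.txt. The sketch below orders the sample files by the number in their names and reads each one into a single string, ready to be submitted to the page one at a time. read_texts() is a hypothetical helper, and it assumes the files are UTF-8; for files saved as GBK on Chinese Windows, read them with readLines(file(f, encoding = "GBK")) instead.

```r
# Return the texts of all t_affinia_N.txt files in `dir`, ordered by N
# (1, 2, ..., 20) rather than alphabetically (1, 10, 11, ...).
read_texts <- function(dir = ".") {
  files <- list.files(dir, pattern = "^t_affinia_[0-9]+\\.txt$")
  n <- as.integer(sub("^t_affinia_([0-9]+)\\.txt$", "\\1", files))
  files <- files[order(n)]
  texts <- vapply(file.path(dir, files), function(f) {
    # Join all lines of one file into a single string
    paste(readLines(f, encoding = "UTF-8", warn = FALSE), collapse = "\n")
  }, character(1))
  names(texts) <- files  # name each text by its file name
  texts
}
```

Keeping the file name as the vector name makes it easy to match each scraped result back to its source file.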