Writing a program with the Python Scrapy crawler framework

Original poster

wuchm posted on 2015-11-7 18:57:22

Reward: 200 forum coins

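I want to write a program with the Python Scrapy crawler framework.
1. How do I set up the Scrapy framework?
2. Once it is set up, I want to crawl this site:
   a. Site: SouFun (搜房网), Shenzhen: http://esf.sz.fang.com/housing/__0_3_0_0_1_0_0/
   b. Data to scrape: every field shown in the table below.
   c. Please explain each step if possible. I am a beginner with a weak foundation. Thanks!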

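Average price this month: 38495 yuan/㎡
Month-over-month change: ↓0.06%
Year-over-year change: ↑44.07%
Second-hand homes: 2639 listings
Rental homes: 209 listings
Nearby short-term rentals: 65 listings
Renovation cases: 8
District: Nanshan Science and Technology Park
Community address: Shennan Avenue, Nanshan, next to Shenzhen University
Property management phone: 26966292
Property management location: Huijing Haoyuan parking lot
Property management fee: 2.80 yuan/㎡ per month
Property management company: Shenzhen Xingang Property Management Co., Ltd.
Year built: 2000-04-15
Developer: Great China International Industrial (Shenzhen) Co., Ltd.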

Reply #1 (best answer)

trans posted on 2015-11-7 18:57:23

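1. Create a Scrapy project (for example with the command: scrapy startproject myproject).
2. Define the Item you want to extract.
3. Write a spider that crawls the site and extracts Items.
4. Write an Item Pipeline to store the extracted Items (that is, the data).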
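
For step 2, here is a minimal sketch of what an Item for the fields in the question might look like. It assumes Scrapy 2.2 or later, which accepts plain dataclass objects as items. The class name CommunityItem and all the field names are hypothetical, chosen only to mirror the table in the question. Values are kept as raw strings because the page's actual markup is not known here.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CommunityItem:
    """Hypothetical item holding the fields listed in the question."""
    avg_price: Optional[str] = None           # e.g. "38495元/㎡"
    month_over_month: Optional[str] = None    # e.g. "↓0.06%"
    year_over_year: Optional[str] = None      # e.g. "↑44.07%"
    resale_listings: Optional[str] = None     # second-hand homes
    rental_listings: Optional[str] = None
    short_term_rentals: Optional[str] = None
    renovation_cases: Optional[str] = None
    district: Optional[str] = None
    address: Optional[str] = None
    property_phone: Optional[str] = None
    property_office: Optional[str] = None
    property_fee: Optional[str] = None
    property_company: Optional[str] = None
    year_built: Optional[str] = None
    developer: Optional[str] = None


# Fields that a page does not provide simply stay None.
item = CommunityItem(avg_price="38495元/㎡", district="南山科技园")
record = asdict(item)  # plain dict, convenient for storage in step 4
```

In a real project this class would normally live in the project's items.py, and the spider would fill each field from an XPath or CSS selector written against the actual page.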

Example:

    import scrapy
    from myproject.items import MyItem  # Item class defined in the project's items.py


    class MySpider(scrapy.Spider):
        name = 'example.com'
        allowed_domains = ['example.com']
        start_urls = [
            'http://www.example.com/1.html',
            'http://www.example.com/2.html',
            'http://www.example.com/3.html',
        ]

        def parse(self, response):
            # Yield one item for every <h3> element on the page
            for h3 in response.xpath('//h3').extract():
                yield MyItem(title=h3)

            # Follow every link on the page; urljoin turns relative hrefs
            # into absolute URLs, which scrapy.Request requires
            for url in response.xpath('//a/@href').extract():
                yield scrapy.Request(response.urljoin(url), callback=self.parse)
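
Step 4 is not covered by the example, so here is a rough sketch of one possible Item Pipeline. It writes each item as one line of JSON. The class name JsonLinesPipeline, the helper to_dict, and the output file name items.jl are assumptions made for illustration. The method names open_spider, close_spider and process_item are the hooks Scrapy calls on a pipeline.

```python
import dataclasses
import json


def to_dict(item):
    """Convert a dataclass item or a dict-like item to a plain dict."""
    if dataclasses.is_dataclass(item) and not isinstance(item, type):
        return dataclasses.asdict(item)
    return dict(item)


class JsonLinesPipeline:
    """Hypothetical pipeline that stores every item as one JSON line."""

    def __init__(self, path="items.jl"):
        self.path = path  # output file name is an assumption
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes
        self.file.close()

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese text readable in the file
        self.file.write(json.dumps(to_dict(item), ensure_ascii=False) + "\n")
        return item  # pass the item on to any later pipeline
```

To enable a pipeline like this, it has to be listed in the ITEM_PIPELINES setting in the project's settings.py, for example {'myproject.pipelines.JsonLinesPipeline': 300}, assuming the class is saved in myproject/pipelines.py.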