

A Simple Spider (Crawler) Built with Scrapy

2020-02-23 00:46:48
Contributed by: an Internet user

This article presents an example of a simple spider (crawler) program built with Scrapy, shared here for reference. The code is as follows:

# Standard Python library imports
# (none needed)

# 3rd party imports
# Older Scrapy releases used scrapy.contrib.spiders, SgmlLinkExtractor and
# HtmlXPathSelector; those were removed, so the current equivalents are used here.
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

# My imports (project-specific Item with text, url, title, author and date fields)
from poetry_analysis.items import PoetryAnalysisItem

# Regex fragment matching any page whose URL ends in ".html"
HTML_FILE_NAME = r'.+\.html'


class PoetryParser(object):
    """
    Provides common parsing method for poems formatted this one specific way.
    """
    # Dates look like "05 March 2009"
    date_pattern = r'(\d{2} \w{3,9} \d{4})'

    def parse_poem(self, response):
        item = PoetryAnalysisItem()
        # All poetry text is in pre tags
        text = response.xpath('//pre/text()').getall()
        item['text'] = ''.join(text)
        item['url'] = response.url
        # head/title contains "title - a poem by author"
        title_text = response.xpath('//head/title/text()').get()
        item['title'], item['author'] = title_text.split(' - ')
        item['author'] = item['author'].replace('a poem by', '')
        for key in ['title', 'author']:
            item[key] = item[key].strip()
        # date_pattern is a class attribute, so it must be read through self
        item['date'] = response.xpath("//p[@class='small']/text()").re(self.date_pattern)
        return item


class PoetrySpider(CrawlSpider, PoetryParser):
    name = 'example.com_poetry'
    allowed_domains = ['www.example.com']
    root_path = 'someuser/poetry/'
    start_urls = ['http://www.example.com/someuser/poetry/recent/',
                  'http://www.example.com/someuser/poetry/less_recent/']
    # Follow .html pages under each start URL and parse them as poems
    rules = [Rule(LinkExtractor(allow=[start_urls[0] + HTML_FILE_NAME]),
                  callback='parse_poem'),
             Rule(LinkExtractor(allow=[start_urls[1] + HTML_FILE_NAME]),
                  callback='parse_poem')]
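The string handling inside parse_poem can be tried without Scrapy or a network connection. The following is a minimal sketch, assuming page titles of the form "Title - a poem by Author" and dates such as "05 March 2009", as the original comments and date_pattern suggest. The helper names split_title and find_dates are hypothetical and not part of the original program; re.findall stands in for the selector's .re() call, which likewise returns the captured group strings.

```python
import re

# Same regex as PoetryParser.date_pattern: "DD Monthname YYYY"
DATE_PATTERN = r'(\d{2} \w{3,9} \d{4})'


def split_title(title_text):
    """Split a 'Title - a poem by Author' string into (title, author)."""
    title, author = title_text.split(' - ')
    author = author.replace('a poem by', '')
    return title.strip(), author.strip()


def find_dates(small_text):
    """Return every date-like substring found in the text."""
    return re.findall(DATE_PATTERN, small_text)


if __name__ == '__main__':
    print(split_title('Autumn Rain - a poem by Jane Doe'))  # ('Autumn Rain', 'Jane Doe')
    print(find_dates('Posted 05 March 2009, revised 12 May 2010'))  # ['05 March 2009', '12 May 2010']
```

Because date_pattern has exactly one capture group, the result is a list of date strings rather than match objects, which is why the spider stores item['date'] as a list.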

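The two Rule objects only follow links whose absolute URL matches a start URL followed by HTML_FILE_NAME. A rough, standard-library-only sketch of that filter is below, assuming the allow patterns are applied with re.search semantics (a match anywhere in the URL); is_poem_url is a hypothetical helper used only for illustration.

```python
import re

HTML_FILE_NAME = r'.+\.html'
START_URLS = ['http://www.example.com/someuser/poetry/recent/',
              'http://www.example.com/someuser/poetry/less_recent/']

# One compiled allow-pattern per Rule, built the same way as in PoetrySpider
ALLOW_PATTERNS = [re.compile(url + HTML_FILE_NAME) for url in START_URLS]


def is_poem_url(url):
    """True if any rule's allow pattern is found in the URL."""
    return any(pattern.search(url) for pattern in ALLOW_PATTERNS)


if __name__ == '__main__':
    print(is_poem_url('http://www.example.com/someuser/poetry/recent/autumn.html'))  # True
    print(is_poem_url('http://www.example.com/someuser/prose/essay.html'))  # False
```

The dots in the start URLs are not escaped, so each one matches any character; wrapping the URL prefix in re.escape() would make the patterns stricter.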
I hope this article is helpful for your Python programming.
