How to avoid duplicate crawling in Python with a custom Scrapy middleware

2020-02-23 00:34:18
Source: reposted
Contributed by: a site user

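This article walks through an example of a custom Scrapy middleware in Python that avoids crawling the same item pages more than once. It is shared here for reference; the details are as follows: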

from scrapy import log
from scrapy.http import Request
from scrapy.item import BaseItem
from scrapy.utils.request import request_fingerprint

from myproject.items import MyItem


class IgnoreVisitedItems(object):
    """Middleware to ignore re-visiting item pages if they
    were already visited before.
    The requests to be filtered by have a meta['filter_visited']
    flag enabled and optionally define an id to use
    for identifying them, which defaults the request fingerprint,
    although you'd want to use the item id,
    if you already have it beforehand to make it more robust.
    """

    FILTER_VISITED = 'filter_visited'
    VISITED_ID = 'visited_id'
    CONTEXT_KEY = 'visited_ids'

    def process_spider_output(self, response, result, spider):
        context = getattr(spider, 'context', {})
        visited_ids = context.setdefault(self.CONTEXT_KEY, {})
        ret = []
        for x in result:
            visited = False
            if isinstance(x, Request):
                if self.FILTER_VISITED in x.meta:
                    visit_id = self._visited_id(x)
                    if visit_id in visited_ids:
                        log.msg("Ignoring already visited: %s" % x.url,
                                level=log.INFO, spider=spider)
                        visited = True
            elif isinstance(x, BaseItem):
                visit_id = self._visited_id(response.request)
                if visit_id:
                    visited_ids[visit_id] = True
                    x['visit_id'] = visit_id
                    x['visit_status'] = 'new'
            if visited:
                ret.append(MyItem(visit_id=visit_id, visit_status='old'))
            else:
                ret.append(x)
        return ret

    def _visited_id(self, request):
        return request.meta.get(self.VISITED_ID) or request_fingerprint(request)
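The filtering idea in the middleware above can be tried out without Scrapy. The following is only a minimal sketch under stated assumptions: `FakeRequest`, `visited_id`, `mark_visited` and `filter_requests` are hypothetical names introduced for illustration, and a SHA-1 hash of the URL stands in for Scrapy's request fingerprint, which is computed differently. It shows that only requests carrying the `filter_visited` flag are checked, and that an explicit `visited_id` lets two different URLs for the same item count as one visit.

```python
import hashlib

FILTER_VISITED = "filter_visited"
VISITED_ID = "visited_id"


class FakeRequest:
    """Hypothetical stand-in for scrapy.http.Request (url and meta only)."""

    def __init__(self, url, meta=None):
        self.url = url
        self.meta = meta or {}


def visited_id(request):
    """Return the explicit id from meta, else a hash of the URL.

    The URL hash is an assumption made for this sketch; it is NOT
    Scrapy's real request fingerprint.
    """
    explicit = request.meta.get(VISITED_ID)
    return explicit or hashlib.sha1(request.url.encode("utf-8")).hexdigest()


def mark_visited(request, visited_ids):
    """Record that the page behind this request produced an item."""
    visited_ids[visited_id(request)] = True


def filter_requests(requests, visited_ids):
    """Split requests into (kept, skipped); only flagged requests are checked."""
    kept, skipped = [], []
    for req in requests:
        if FILTER_VISITED in req.meta and visited_id(req) in visited_ids:
            skipped.append(req)
        else:
            kept.append(req)
    return kept, skipped


# The store must outlive a single call, otherwise nothing is remembered.
visited = {}
first = FakeRequest("http://example.com/item/1",
                    meta={FILTER_VISITED: True, VISITED_ID: "item-1"})
mark_visited(first, visited)

again = FakeRequest("http://example.com/item/1?ref=list",
                    meta={FILTER_VISITED: True, VISITED_ID: "item-1"})
other = FakeRequest("http://example.com/item/2",
                    meta={FILTER_VISITED: True, VISITED_ID: "item-2"})
unflagged = FakeRequest("http://example.com/item/1",
                        meta={VISITED_ID: "item-1"})

kept, skipped = filter_requests([again, other, unflagged], visited)
```

Here `again` is skipped because its explicit id was already recorded, while `other` and `unflagged` are kept. The same persistence concern applies to the middleware itself: it reads the store via `getattr(spider, 'context', {})`, so if the spider defines no `context` attribute, a fresh dictionary is created on every call and no visits are remembered between responses.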

I hope this article is helpful for your Python programming.
