

An Example of Faking Random Request Headers for a Crawler in Pyspider

2020-02-23 00:00:32
Source: Reposted
Contributor: Site user

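Pyspider uses the tornado library to make HTTP requests. Various parameters can be attached to a request, such as the connection timeout, the data transfer timeout, and the request headers. In pyspider's stock framework, however, crawler-wide parameters can only be set through the Python dictionary crawl_config (shown below); the framework code converts the parameters in this dictionary into task data and then performs the HTTP request. The drawback of this dictionary is that it is defined once for the whole crawler, so it is inconvenient for giving each individual request a random request header.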

crawl_config = {
    "user_agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36",
    "timeout": 120,
    "connect_timeout": 60,
    "retries": 5,
    "fetch_type": 'js',
    "auto_recrawl": True,
}
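To illustrate the limitation described above, here is a minimal, self-contained sketch. It is a simplified model and not pyspider's actual code: the helper `build_fetch_options` and the names `CRAWL_CONFIG` and `USER_AGENTS` are assumptions made up for this example. It shows that a static dictionary of defaults yields the same User-Agent for every request unless something overrides it per request.

```python
import random

# Hypothetical project-wide defaults, modelled on the crawl_config above.
CRAWL_CONFIG = {
    "user_agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36",
    "timeout": 120,
    "connect_timeout": 60,
}

# Hypothetical pool of alternative User-Agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0",
]


def build_fetch_options(defaults, **per_request):
    """Merge per-request options over the defaults (simplified model only)."""
    options = dict(defaults)      # copy, so the defaults are never mutated
    options.update(per_request)   # per-request values win over defaults
    return options


# With only the static defaults, every request carries the same User-Agent.
static = [build_fetch_options(CRAWL_CONFIG)["user_agent"] for _ in range(3)]

# With a per-request override, each request can pick its own User-Agent.
varied = [
    build_fetch_options(CRAWL_CONFIG, user_agent=random.choice(USER_AGENTS))["user_agent"]
    for _ in range(3)
]
```

The point of the sketch is only the merge order: whatever supplies the per-request value has to run once per request, which a dictionary evaluated once at class-definition time cannot do.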

Here is a way to add random request headers to the crawler:

1. Write a script, place it in pyspider's libs folder, and name it header_switch.py

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Created on 2017-10-18 11:52:26

import random


class HeadersSelector(object):
    """
    The header sets omit a few fields: Host and Cookie.
    """

    # Browser headers found online
    headers_1 = {
        "Proxy-Connection": "keep-alive",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "DNT": "1",
        "Accept-Encoding": "gzip, deflate, sdch",
        "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.6,en;q=0.4",
        "Referer": "https://www.baidu.com/s?wd=%BC%96%E7%A0%81&rsv_spt=1&rsv_iqid=0x9fcbc99a0000b5d7&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=0&oq=If-None-Match&inputT=7282&rsv_t",
        "Accept-Charset": "gb2312,gbk;q=0.7,utf-8;q=0.7,*;q=0.7",
    }

    # Browser on Windows 7
    headers_2 = {
        "Proxy-Connection": "keep-alive",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0",
        "Accept": "image/gif,image/x-xbitmap,image/jpeg,application/x-shockwave-flash,application/vnd.ms-excel,application/vnd.ms-powerpoint,application/msword,*/*",
        "DNT": "1",
        "Referer": "https://www.baidu.com/link?url=c-FMHf06-ZPhoRM4tWduhraKXhnSm_RzjXZ-ZTFnPAvZN",
        "Accept-Encoding": "gzip, deflate, sdch",
        "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.6,en;q=0.4",
    }

    # Firefox on Linux
    headers_3 = {
        "Proxy-Connection": "keep-alive",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0",
        "Accept": "image/x-xbitmap,image/jpeg,application/x-shockwave-flash,application/vnd.ms-excel,application/vnd.ms-powerpoint,application/msword,*/*",
        "DNT": "1",
        "Referer": "https://www.baidu.com/s?wd=http%B4%20Pragma&rsf=1&rsp=4&f=1&oq=Pragma&tn=baiduhome_pg&ie=utf-8&usm=3&rsv_idx=2&rsv_pq=e9bd5e5000010",
        "Accept-Encoding": "gzip, deflate, sdch",
        "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.7,en;q=0.6",
    }

    # Firefox on Windows 10
    headers_4 = {
        "Proxy-Connection": "keep-alive",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0",
        "Accept": "*/*",
        "DNT": "1",
        "Referer": "https://www.baidu.com/link?url=c-FMHf06-ZPhoRM4tWduhraKXhnSm_RzjXZ-ZTFnP",
        "Accept-Encoding": "gzip, deflate, sdch",
        "Accept-Language": "zh-CN,zh;q=0.9,en-US;q=0.7,en;q=0.6",
    }

    # Chrome on Windows 10
    headers_5 = {
        "Connection": "keep-alive",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64;) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Referer": "https://www.baidu.com/link?url=c-FMHf06-ZPhoRM4tWduhraKXhnSm_RzjXZ-",
        "Accept-Encoding": "gzip, deflate, sdch",
        "Accept-Language": "zh-CN,zh;q=0.9,en-US;q=0.7,en;q=0.6",
        "Accept-Charset": "gb2312,gbk;q=0.7,utf-8;q=0.7,*;q=0.7",
    }

    # Browser on Windows 10
    headers_6 = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, sdch",
        "Accept-Language": "zh-CN,zh;q=0.8",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "Connection": "keep-alive",
        "DNT": "1",
        "Referer": "https://www.baidu.com/s?wd=If-None-Match&rsv_spt=1&rsv_iqid=0x9fcbc99a0000b5d7&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rq",
        "Accept-Charset": "gb2312,gbk;q=0.7,utf-8;q=0.7,*;q=0.7",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0",
    }

    def __init__(self):
        pass

    def select_header(self):
        # Pick one of the six header sets at random (randint is inclusive on both ends).
        n = random.randint(1, 6)
        switch = {
            1: self.headers_1,
            2: self.headers_2,
            3: self.headers_3,
            4: self.headers_4,
            5: self.headers_5,
            6: self.headers_6,
        }
        headers = switch[n]
        return headers
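As a possible variation on the selector above, the following sketch keeps the header sets in a list and uses `random.choice` instead of a numbered lookup table. The class name `HeaderPool`, its `rng` parameter, and the two shortened header sets are hypothetical and exist only for this illustration. It also returns a copy of the chosen set, on the assumption that callers will want to add the missing Host and Cookie fields without altering the shared pool.

```python
import random


class HeaderPool(object):
    """Hypothetical helper: hands out a copy of a randomly chosen header set."""

    def __init__(self, header_sets, rng=None):
        if not header_sets:
            raise ValueError("header_sets must not be empty")
        # Store private copies so later changes by the caller do not leak in.
        self._header_sets = [dict(h) for h in header_sets]
        # An injectable random.Random makes the selection reproducible in tests.
        self._rng = rng or random.Random()

    def select_header(self):
        # Return a copy so per-request fields never end up in the pool itself.
        return dict(self._rng.choice(self._header_sets))


# Two shortened header sets, using User-Agent strings from the script above.
FIREFOX_LINUX = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0",
    "Accept": "*/*",
}
FIREFOX_WIN10 = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0",
    "Accept": "*/*",
}

pool = HeaderPool([FIREFOX_LINUX, FIREFOX_WIN10])
headers = pool.select_header()
headers["Host"] = "example.com"  # per-request field; the pool stays unchanged
```

In a pyspider handler, a dictionary obtained this way could presumably be supplied through the `headers` keyword that `self.crawl` documents, provided the pyspider version in use supports it.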