

Notes from a First Web-Scraping Project

2019-11-11 03:03:43
Contributed by: a site user

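I am new to Python, so I am writing up my first scraping project as an introductory exercise. The target site loads its content dynamically, so the script requests the background data endpoints directly instead of parsing the page HTML.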
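Because the listing is loaded dynamically, the useful data arrives as text from a background request, and the script picks the `"pc"` (pickcode) fields out of it with a regular expression. A minimal sketch of that extraction step follows. The `SAMPLE` payload and its `"data"` / `"n"` keys are hypothetical, since the real response is not shown in this post, and `extract_pickcodes_json` is only an illustrative alternative under the assumption that the response is valid JSON.

```python
import json
import re

# Hypothetical sample payload; the site's real response is not shown in the post.
SAMPLE = '{"data":[{"pc":"abc123","n":"a.jpg"},{"pc":"def456","n":"b.jpg"}]}'

# Same pattern the script uses: capture the value of every "pc" field.
PC_PATTERN = re.compile(r'"pc":"(.*?)"')


def extract_pickcodes(text):
    """Return every "pc" value found in the raw response text."""
    return PC_PATTERN.findall(text)


def extract_pickcodes_json(text):
    """Same result via a JSON parse, assuming a top-level "data" list."""
    return [item["pc"] for item in json.loads(text)["data"]]


if __name__ == "__main__":
    print(extract_pickcodes(SAMPLE))
    print(extract_pickcodes_json(SAMPLE))
```

The regular expression works without knowing the payload's layout, which is convenient for a quick script. Parsing the JSON is sturdier if the layout is known, since it also unescapes strings.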

# -*- coding: utf-8 -*-
import re
import threading

import requests

# Pickcodes to skip (values redacted in the original post).
forbidden = ["xxxxxxx", "xxxxxxx", "xxxxxx", "xxxxxxx"]

# Headers for the file-listing request (sensitive values redacted).
headers_for_pc = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Cookie': 'xxxxx',
    'Host': 'aps.115.com',
    'Referer': 'http://aps.115.com/bridge_2.0.html?xxxxx',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}

# The original post uses headers_for_realurl without defining it.
# This placeholder is an assumption; fill in the real values before running.
headers_for_realurl = {
    'Cookie': 'xxxxx',
    'User-Agent': headers_for_pc['User-Agent'],
}

url_for_pc = "http://aps.115.com/natsort/files.php?xxxxxx"
url_for_realurl = "http://web.api.115.com/xxxxxx"


def getpc(url, offset):
    """Fetch one page of the file listing and return its pickcodes, or None on failure."""
    response = requests.get(url, params={"offset": offset}, headers=headers_for_pc)
    if response.status_code == 200:
        return re.findall(r'"pc":"(.*?)"', response.text)
    print("Sorry, failed to get pickcodes, status code: %s" % response.status_code)
    return None


def geturl(url, pickcode):
    """Resolve a pickcode to its list of real file URLs, or None on failure."""
    response = requests.get(url, params={"pickcode": pickcode}, headers=headers_for_realurl)
    if response.status_code == 200:
        return re.findall(r'"file_url":"(.*?)"', response.text)
    print("Sorry, failed to get real URL, status code: %s" % response.status_code)
    return None


def getpic(url, name):
    """Download url and save the bytes to a file called name."""
    with open(name, "wb") as f:
        f.write(requests.get(url).content)


def work(offset):
    """Download every non-forbidden file on the listing page at this offset."""
    print(offset)
    pcs = getpc(url_for_pc, offset)
    if pcs is None:
        return
    for pc in pcs:
        if pc in forbidden:
            continue
        urls = geturl(url_for_realurl, pc)
        if not urls:
            continue
        # The post shows replace("//", ""), which would break a normal URL.
        # Assumption: the intent was to strip the backslashes from
        # JSON-escaped slashes ("\/") in the captured URL.
        getpic(urls[0].replace("\\", ""), pc)


if __name__ == "__main__":
    threads = []
    for i in range(0, 197, 24):
        # args must be a tuple, and start() (not run()) launches a new thread.
        td = threading.Thread(target=work, args=(i,))
        td.start()
        threads.append(td)
    for td in threads:
        td.join()
    print("done")
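One detail of the threading loop is worth spelling out. `Thread.start()` is what launches a new thread. Calling `run()` directly executes the target in the calling thread, so the pages are processed one after another. The `args` parameter also needs to be a tuple such as `(i,)`, because `(i)` is just the integer `i`. Below is a minimal sketch of the start-then-join pattern. `run_in_threads`, `collect_offsets` and `fake_work` are hypothetical names for illustration; the stand-in worker only records the offset it received and makes no network requests.

```python
import threading


def run_in_threads(worker, offsets):
    """Start one thread per offset, then wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=(offset,)) for offset in offsets]
    for t in threads:
        t.start()   # start() spawns the thread; run() would execute inline
    for t in threads:
        t.join()    # block until every worker has returned


def collect_offsets(offsets):
    """Run a stand-in worker per offset and return the offsets it saw, sorted."""
    seen = []
    lock = threading.Lock()

    def fake_work(offset):
        # Stand-in for the real download worker: just record the offset.
        with lock:
            seen.append(offset)

    run_in_threads(fake_work, offsets)
    return sorted(seen)


if __name__ == "__main__":
    # Same page offsets as the script: 0, 24, ..., 192.
    print(collect_offsets(range(0, 197, 24)))
```

Joining before printing "done" ensures the message appears only after every download thread has finished.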

