Python crawler example: downloading funny animated GIF images

2020-02-16 00:16:50

Source: reposted

Contributed by: a web user

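This article walks through an example Python crawler that downloads funny animated GIF images. It is shared here for reference, and the details follow.

Sometimes you come across animated images you like, but saving them one at a time is tedious, and some sites don't even allow right-click saving. Fetching them with a Python script avoids both problems, and it is fun to try.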

The site crawled in this example is 居然搞笑網 (zbjuran.com): http://www.zbjuran.com/dongtai/list_4_1.html

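Approach:

Fetch the content of the current page

Find the URLs of the animated images on that page

Save the content at each URL to a local file

To crawl several pages, wrap these steps in a loop over the page numbers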
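The second step (finding the image URLs) and the multi-page loop can be tried offline against a small HTML snippet. The sketch below assumes the page layout that the crawler code in this article expects: a div with class "main" containing div.item blocks, each with an h3 title and a div.text that holds the img. The sample HTML is made up for illustration, page_url and parse_items are hypothetical helper names, and the list-page URL pattern is inferred from the example URL above. It uses BeautifulSoup with Python's built-in html.parser, so lxml is not needed.

```python
from bs4 import BeautifulSoup

# Assumed URL pattern, inferred from the example list page above
def page_url(page_no):
    return "http://www.zbjuran.com/dongtai/list_4_%d.html" % page_no

def parse_items(html):
    """Return [{'name': title, 'url': img_src}] for each non-ad item."""
    soup = BeautifulSoup(html, "html.parser")
    main = soup.find("div", class_="main")
    if main is None:  # page layout is not what we expect
        return []
    results = []
    for item in main.find_all("div", class_="item"):
        text = item.find("div", class_="text")
        img = text.find("img") if text else None
        title = item.find("h3")
        # Ad items have no div.text (or no image inside it), so skip them
        if img is None or not img.get("src") or title is None:
            continue
        results.append({"name": title.get_text(strip=True), "url": img["src"]})
    return results

# Made-up sample: one normal item and one ad-like item without div.text
SAMPLE = """
<div class="main">
  <div class="item"><h3>Cat jump</h3>
    <div class="text"><img src="http://example.com/a.gif"></div></div>
  <div class="item"><h3>Sponsored</h3><p>ad</p></div>
</div>
"""

# Multi-page loop: here the URLs are only printed, not fetched
for n in range(1, 4):
    print(page_url(n))
print(parse_items(SAMPLE))
# [{'name': 'Cat jump', 'url': 'http://example.com/a.gif'}]
```

In a real run, parse_items would be applied to the HTML downloaded from each page_url(n), and every returned URL would then be saved to disk.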

Code:

#!/usr/bin/env python3
# Requires: pip install beautifulsoup4 lxml
import os
import sys
import time
import uuid
from urllib.parse import urljoin
from urllib.request import Request, urlopen, urlretrieve

from bs4 import BeautifulSoup


# Fetch the page content (returns None on failure)
def getHtml(url):
    print(url)
    try:
        # Some sites reject requests that lack a browser-like User-Agent
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        return urlopen(req, timeout=10).read()
    except Exception as e:
        print("Failed to fetch %s: %s" % (url, e))
        return None


# Collect the URL and title of every animated image on the page
def getImagUrl(html, page_url):
    ImagUrlList = []
    if not html:
        print('nothing can be found')
        return ImagUrlList  # empty list, so the caller's loop simply does nothing
    soup = BeautifulSoup(html, 'lxml')
    main = soup.find("div", {"class": "main"})
    if main is None:
        return ImagUrlList
    # Walk the list of items
    for item in main.find_all('div', {'class': 'item'}):
        text = item.find('div', {"class": "text"})
        img = text.find('img') if text else None
        # Ads have no "text" block (or no image), so this filters them out
        if img is None or not img.get('src'):
            continue
        h3 = item.find('h3')
        ImagUrlList.append({
            # Make the URL absolute in case the src attribute is relative
            'url': urljoin(page_url, img.get('src')),
            'name': h3.get_text(strip=True) if h3 else 'untitled',
        })
    return ImagUrlList


# Download one image to local disk
def download(author, imgurl, typename, pageNo):
    # Folder named after today's date, e.g. 2017-3-16
    x = time.localtime()
    foldername = "%d-%d-%d" % (x.tm_year, x.tm_mon, x.tm_mday)
    picpath = os.path.join('Jimy', foldername, typename, str(pageNo))
    # Replace path separators in the title so it stays a single file name
    filename = author.replace('/', '_').replace('\\', '_') + str(uuid.uuid1())
    pic_type = imgurl[-3:]  # assumes a three-letter extension such as "gif"
    os.makedirs(picpath, exist_ok=True)
    target = os.path.join(picpath, "%s.%s" % (filename, pic_type))
    print("GIF saved to: " + target)
    download_img = urlretrieve(imgurl, target)  # download the image to the target path
    print("Image source: " + imgurl)
    return download_img


# Exit the program
def myquit():
    print("Bye Bye!")
    sys.exit(0)


# Crawl one list page and download all of its GIFs
def start(pageNo):
    targeturl = "http://www.zbjuran.com/dongtai/list_4_%s.html" % pageNo
    html = getHtml(targeturl)
    urllist = getImagUrl(html, targeturl)
    for imgurl in urllist:
        # '搞笑動圖' ("funny GIFs") is used as the category folder name
        download(imgurl['name'], imgurl['url'], '搞笑動圖', pageNo)


# Prompt until the user enters a number in 1..upper, or Q/quit to exit
def ask_page(prompt, upper):
    pageNo = input(prompt).strip()
    while not pageNo.isdigit() or not 1 <= int(pageNo) <= upper:
        if pageNo.lower() in ('q', 'quit'):
            myquit()
        print("Param is invalid, please try again.")
        pageNo = input("Input the page number you want to scratch >").strip()
    return int(pageNo)


if __name__ == '__main__':
    print('''
    *****************************************
    **      Welcome to Spider of GIF       **
    **        Created on 2017-3-16         **
    **          @author: Jimy              **
    *****************************************''')
    # First run: crawl a single page
    pageNo = ask_page("Input the page number you want to scratch (1-50), "
                      "or 'Q' to quit\n> ", 50)
    print(pageNo)
    start(pageNo)
    # Second run: crawl pages 1..N in a loop
    total = ask_page("Input the total number of pages you want to scratch (1-5000), "
                     "or 'Q' to quit\n> ", 5000)
    for num in range(1, total + 1):
        start(num)
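The download step above derives the file extension from the last three characters of the URL (imgurl[-3:]), which gives a wrong result for URLs such as .../a.jpeg or .../a.gif?v=2, and titles scraped from the page may contain characters that some filesystems reject. The sketch below is one possible, more defensive way to build the local path using only the standard library. gif_target is a hypothetical helper, not part of the original script; falling back to ".gif" when the URL path has no extension, and the exact set of replaced characters, are assumptions.

```python
import os
import re
import uuid
from urllib.parse import urlparse

def gif_target(folder, title, img_url):
    """Build a unique local path for one image (hypothetical helper)."""
    # Take the extension from the URL path only, ignoring ?query and #fragment
    ext = os.path.splitext(urlparse(img_url).path)[1].lower() or ".gif"
    # Collapse runs of characters that are unsafe in file names into "_"
    safe_title = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_") or "untitled"
    # A short random suffix keeps files with the same title from colliding
    return os.path.join(folder, "%s_%s%s" % (safe_title, uuid.uuid4().hex[:8], ext))

# The title becomes "cat_jump" and the extension ".gif"; the suffix is random
print(gif_target("Jimy/2017-3-16/1", "cat / jump?", "http://example.com/img/a.GIF?v=2"))
```

Using uuid4 instead of uuid1 avoids embedding the machine's MAC address and timestamp in file names; either works for uniqueness here.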
