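0x00 Preface
I wanted to build a word cloud of vulnerabilities to see which types show up most often; for vendors whose vulnerabilities have been disclosed (ly, for example), it also helps to target the hunt. I picked x云 (WooYun; mirror: http://wy.hxsec.com/bugs.php ). With two powerful third-party libraries, jieba and wordcloud, building an x云 vulnerability word cloud is easy.
GitHub: https://github.com/theLSA/wooyun_wordcloud
Download from this site: wooyun_wordcloud
0x01 Crawling the titles
Straight to the code: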
#coding:utf-8
#Author:LSA
#Description:wordcloud for wooyun
#Date:20170904

# Python 2 script (urllib.urlopen, Queue, print statement)
import urllib
import re
import threading
import Queue

q0 = Queue.Queue()
threads = 20
threadList = []

# Matches one title entry on a listing page and captures the link text
reg = re.compile(r'<li style="width:60%;height:25px;background-color:#FFFFFF;float:left" ><a href=".*?" rel="external nofollow" >(.*?)</a>')

def gettitle():
    while True:
        # get_nowait() keeps a worker from blocking forever when another
        # thread has just taken the last page number
        try:
            i = q0.get_nowait()
        except Queue.Empty:
            break
        url = 'http://wy.hxsec.com/bugs.php?page=' + str(i)
        html = urllib.urlopen(url).read()
        titleList = re.findall(reg,html)
        fwy = open("wooyunBugTitle.txt","a")
        for title in titleList:
            # one title per line
            fwy.write(title+'\n')
        fwy.close()
        print 'Page ' + str(i) + ' over!'

def main():
    # queue every listing page, then let the worker threads drain the queue
    for page in range(1,2962):
        q0.put(page)
    for thread in range(threads):
        t = threading.Thread(target=gettitle)
        t.start()
        threadList.append(t)
    for th in threadList:
        th.join()
    print '***********************All pages over!**********************'

if __name__ == '__main__':
    main()

0x02 Building the word cloud
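Since the word cloud is only as good as the scraped titles in wooyunBugTitle.txt, it can help to check the extraction regex from 0x01 offline before crawling every page. Below is a minimal sketch in Python 3, assuming the listing markup matches the pattern the crawler uses. The helper `extract_titles` and the sample HTML are hypothetical, made up for illustration, and the real page markup may differ.

```python
import re

# Same pattern as the crawler in 0x01: capture the link text of each title <li>.
TITLE_RE = re.compile(
    r'<li style="width:60%;height:25px;background-color:#FFFFFF;float:left" >'
    r'<a href=".*?" rel="external nofollow" >(.*?)</a>'
)


def extract_titles(html):
    """Return every bug title found in one listing page (hypothetical helper)."""
    return TITLE_RE.findall(html)


# Made-up fragment for illustration only; it is not copied from the real site.
SAMPLE_HTML = (
    '<li style="width:60%;height:25px;background-color:#FFFFFF;float:left" >'
    '<a href="bug_detail.php?id=1" rel="external nofollow" >Example SQL injection</a></li>\n'
    '<li style="width:60%;height:25px;background-color:#FFFFFF;float:left" >'
    '<a href="bug_detail.php?id=2" rel="external nofollow" >Example command execution</a></li>'
)

if __name__ == "__main__":
    for title in extract_titles(SAMPLE_HTML):
        print(title)
```

If the function returns an empty list for a page saved from the mirror, the markup has probably changed and the pattern needs adjusting before the full crawl.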
Again, straight to the code:
# coding: utf-8
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt

data = open("wooyunBugTitle.txt","r").read()
# full-mode segmentation, then join with spaces so WordCloud can split the words
cutData = jieba.cut(data, cut_all=True)
word = " ".join(cutData)

cloud = WordCloud(
    # set the font; without one, Chinese text may render as garbled characters
    font_path="msyh.ttf",
    #font_path=path.join(e,'xxx.ttc'),
    # background color
    background_color='white',
    # word cloud shape
    #mask=color_mask,
    # maximum number of words
    max_words=2000,
    # largest font size
    max_font_size=40
)
wc = cloud.generate(word)
wc.to_file("wooyunwordcloud.jpg")

plt.imshow(wc)
plt.axis("off")
plt.show()

0x03 Result:



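0x04 Conclusion
The word cloud shows that SQL injection still dominates, followed by command execution and then information disclosure. Overall, the picture is quite intuitive.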
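To put rough numbers behind that reading, one option is to count how many titles mention a few vulnerability keywords. Below is a minimal sketch in Python 3 using only the standard library. The keyword list, the helper `count_keywords`, and the sample titles are assumptions for illustration; in practice the titles would come from wooyunBugTitle.txt.

```python
from collections import Counter

# Hypothetical keyword list; extend it with whatever terms the cloud highlights.
KEYWORDS = ["SQL注入", "命令执行", "信息泄漏"]


def count_keywords(titles, keywords=KEYWORDS):
    """Count how many titles contain each keyword, ignoring letter case."""
    counts = Counter()
    for title in titles:
        lowered = title.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                counts[kw] += 1
    return counts


if __name__ == "__main__":
    # Made-up titles for illustration; real input would be something like
    # open("wooyunBugTitle.txt", encoding="utf-8").read().splitlines()
    sample = [
        "某站SQL注入漏洞",
        "某系统sql注入",
        "某平台命令执行",
    ]
    for kw, n in count_keywords(sample).most_common():
        print(kw, n)
```

Substring counting sidesteps segmentation entirely, so it is a cross-check on the cloud, not a replacement for the jieba-based pipeline.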