Example Code for Processing Excel Files with Python

2020-02-16 01:43:58


Contributed by: a reader

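As part of my job I had to review whether the content for a batch of queries was still valid. The queries are stored in an Excel file: the text of each cell is a page title, and the page's URL is attached to the cell as a hyperlink.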

My first instinct was to read the Excel file with Python and analyze the text, then send one HTTP request per link and decide from the content of the HTTP response whether the link is still valid.
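The checking step on its own can be sketched without Excel or a network connection. This is only an illustration under stated assumptions: the names `find_invalid_links`, `fetch`, and `fake_fetch` are hypothetical, the example URLs are made up, the marker `error-container` is the string the script later in this article looks for, and treating an unreachable link as invalid is a design choice rather than something the original script does.

```python
from typing import Callable, Iterable, List, Tuple


def find_invalid_links(
    links: Iterable[Tuple[str, str]],
    fetch: Callable[[str], bytes],
    marker: bytes = b"error-container",
) -> List[str]:
    """Return the labels of links whose page looks invalid.

    `links` yields (label, url) pairs; `fetch` downloads a URL and
    returns the response body as bytes. Both are hypothetical names.
    """
    invalid = []
    for label, url in links:
        try:
            body = fetch(url)
        except OSError:
            # Assumption: a link that cannot be fetched counts as invalid.
            invalid.append(label)
            continue
        if marker in body:
            invalid.append(label)
    return invalid


# Stand-in fetcher so the sketch runs offline; the URLs are made up.
def fake_fetch(url: str) -> bytes:
    pages = {
        "http://example.com/ok": b"<html><h1>Title</h1></html>",
        "http://example.com/gone": b'<div class="error-container">Not found</div>',
    }
    if url not in pages:
        raise OSError("unreachable: " + url)
    return pages[url]


report = find_invalid_links(
    [
        ("B1", "http://example.com/ok"),
        ("C1", "http://example.com/gone"),
        ("D1", "http://example.com/missing"),
    ],
    fake_fetch,
)
print(report)  # ['C1', 'D1']
```

Passing the download function in as a parameter keeps the decision logic separate from the network code, so a real fetcher built on urllib can be swapped in later.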

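A quick search showed that the mainstream choice is the xlrd package, but in practice the hyperlink_map value always came back as None no matter what I tried, and I had no time to work out why. Further searching turned up a Python library called xlwings, which did the job.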

xlwings: Python For Excel

The full code is as follows:

# Python 3; requires xlwings and a local Excel installation.
import gzip
import urllib.error
import urllib.request

import xlwings as xw


def get_html(url):
    """Download a page and return its (decompressed) body as bytes."""
    try:
        with urllib.request.urlopen(url) as page:
            html = page.read()
    except urllib.error.HTTPError as err:
        # Error responses (4xx/5xx) still carry a body worth inspecting.
        html = err.read()
    return unzip(html)


# While debugging, every attempt at decoding produced garbage.
# It turned out the pages were served compressed, so the downloaded
# content has to be decompressed by hand.
def unzip(data):
    # Gzip data starts with the bytes 0x1f 0x8b; anything else is returned as-is.
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    return data


wb = xw.Book(r"C:/Users/hasee/Desktop/Test.xlsx")
main_data = wb.sheets["Sheet2"]

# Use the last cell of the region around A1 to get the number of
# rows and columns in use on the sheet.
last_cell = main_data.range("A1").current_region.last_cell
rownum = last_cell.row
colnum = last_cell.column

for row in range(1, rownum + 1):
    # Column A holds the query; the remaining columns hold the linked titles.
    query = main_data.range((row, 1)).value
    for col in range(2, colnum + 1):
        cell = main_data.range((row, col))
        link = cell.hyperlink
        html = get_html(link)
        if b"error-container" in html:
            # get_address(False, False) gives a relative address such as "B3".
            address = cell.get_address(False, False)
            print("%s,%s,%s,%s" % (query, address, cell.value, link))
            # Color the cell that holds the invalid link; the change is
            # made directly in the workbook.
            cell.color = (253, 218, 4)
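The script reports each invalid cell by an A1-style address, which needs a column letter for a numeric column index. As a minimal sketch, assuming 1-based indices as Excel uses them, the conversion can be written so that it also covers columns past Z. The helper name `column_letter` is hypothetical and is not part of xlwings.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to its Excel letter form (1 -> "A")."""
    if index < 1:
        raise ValueError("column index must be 1 or greater")
    letters = ""
    while index > 0:
        # Excel column names are a base-26 system without a zero digit,
        # so shift by one before dividing.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


# Build the address of the cell in row 3, column 28.
print(column_letter(28) + str(3))  # AB3
```

This is useful when the report is processed outside Excel, where no Range object is available to supply the address.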

