This article walks through a Python 3 crawler that scrapes data and stores it in a MySQL database. It is shared here for your reference; the details are as follows:
The job, recommended by my boss (羅總), is to crawl orders from a desktop client. The packet-capture tool is HttpAnalyzerStdV7, which works much like the F12 developer tools built into Chrome. The client has an "order hall" that lists brief information for every open order; once an order is taken, it disappears from the list. My task is to record each newly posted order into my database, zyc.
The crawler is set to run every 10 s (a sketch of that polling loop follows the crawler code below).
The packet-capture tool's window is shown in the screenshot (figure omitted here).
First, the crawler itself: find the page where the data actually lives, then pull the fields out with regular expressions.
# -*- coding:utf-8 -*-
import re
import requests
import pymysql   # MySQL driver for Python 3; Python 2 uses MySQLdb
import datetime
import time

def GetResults():
    requests.adapters.DEFAULT_RETRIES = 5  # retry count; a fix found online for occasional connection errors
    reg = [r'"id":(.*?),',
           r'"order_no":"(.*?)",',
           r'"order_title":"(.*?)",',
           r'"publish_desc":"(.*?)",',
           r'"game_area":"(.*?)///(.*?)///(.*?)",',
           r'"order_current":"(.*?)",',
           r'"order_content":"(.*?)",',
           r'"order_hours":(.*?),',
           r'"order_price":"(.*?)",',
           r'"add_price":"(.*?)",',
           r'"safe_money":"(.*?)",',
           r'"speed_money":"(.*?)",',
           r'"order_status_desc":"(.*?)",',
           r'"order_lock_desc":"(.*?)",',
           r'"cancel_type_desc":"(.*?)",',
           r'"kf_status_desc":"(.*?)",',
           r'"is_show_pwd":(.*?),',
           r'"game_pwd":"(.*?)",',
           r'"game_account":"(.*?)",',
           r'"game_actor":"(.*?)",',
           r'"left_hours":"(.*?)",',
           r'"created_at":"(.*?)",',
           r'"account_id":"(.*?)",',
           r'"mobile":"(.*?)",',
           r'"contact":"(.*?)",',
           r'"qq":"(.*?)"},']
    results = []
    try:
        for l in range(1, 2):  # page numbers
            proxy = {'http': '61.135.155.82:443'}  # proxy IP (the hall URL is https, so add an 'https' entry if the proxy should cover it too)
            # GET the order-hall page that lists all open orders
            html = requests.get('https://www.dianjingbaozi.com/api/dailian/soldier/hall?access_token=3ef3abbea1f6cf16b2420eb962cf1c9a&dan_end=&dan_start=&game_id=2&kw=&order=price_desc&page=%d' % l + '&pagesize=30&price_end=0&price_start=0&server_code=000200000000&sign=ca19072ea0acb55a2ed2486d6ff6c5256c7a0773&timestamp=1511235791&type=public&type_id=', proxies=proxy)
            html = html.content.decode('utf-8')  # decode the response so the Chinese text reads correctly
            # grab the order numbers first: each detail-page URL is keyed by its order number
            outcome_reg_order_no = re.findall(r'"order_no":"(.*?)","game_area"', html)
            for j in range(len(outcome_reg_order_no)):
                # fetch the detail page for one order
                html_order = requests.get('http://www.lpergame.com/api/dailian/order/detail?access_token=eb547a14bad97e1ee5d835b32cb83ff1&order_no=' + outcome_reg_order_no[j] + '&sign=c9b503c0e4e8786c2945dc0dca0fabfa1ca4a870&timestamp=1511146154', proxies=proxy)
                html_order = html_order.content.decode('utf-8')
                # print(html_order)
                outcome_reg = []
                for i in range(len(reg)):  # extract every field of this order
                    outcome = re.findall(reg[i], html_order)
                    if i == 4:  # "game_area" captures three groups, so flatten the tuples
                        for k in range(len(outcome)):
                            outcome_reg.extend(outcome[k])
                    else:
                        outcome_reg.extend(outcome)
                results.append(outcome_reg)  # result set
        return results
    except Exception:
        time.sleep(5)  # back off: requests that come too fast sometimes fail
        print("失敗")  # "failed"
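GetResults() returns the field lists but does not yet touch the database. Below is a minimal sketch of the missing storage step and the 10-second polling loop, assuming a local MySQL server hosting the zyc database; the table name orders, its columns, and the login credentials are hypothetical placeholders, not taken from the original article:

import time
import pymysql  # same driver the crawler imports

def SaveResults(results):
    # Hypothetical connection settings and schema: adjust user/password,
    # and create the orders table with a UNIQUE key on order_no beforehand.
    conn = pymysql.connect(host='localhost', user='root', password='secret',
                           db='zyc', charset='utf8')
    try:
        with conn.cursor() as cursor:
            for row in results:
                # row[1] is order_no, the second pattern in the reg list,
                # assuming each detail page matched every pattern exactly once
                cursor.execute(
                    'INSERT IGNORE INTO orders (order_no, fields) VALUES (%s, %s)',
                    (row[1], str(row)))
        conn.commit()
    finally:
        conn.close()

if __name__ == '__main__':
    while True:
        results = GetResults()  # the crawler defined above
        if results:
            SaveResults(results)
        time.sleep(10)  # re-crawl the order hall every 10 seconds

INSERT IGNORE skips rows whose order_no already exists, so an order that stays in the hall across several polls is stored only once; in a real table you would give each of the captured fields its own column instead of dumping the whole list into one.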