Example: Big-Data-Style Analysis of Operating System Logs in Python

2020-02-16 01:05:29
Source: reposted
Contributed by: a site user

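This article shows, by example, how to analyze operating system logs in Python using a big-data approach: the log is split into pieces, each piece is mapped to per-date counts, and the counts are then reduced to a single result. It is shared here for reference. The details follow.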

I. Code

1. Splitting a large file

import os
import os.path
from itertools import islice


def FileSplit(sourceFile, targetFolder):
    if not os.path.isfile(sourceFile):
        print(sourceFile, ' does not exist.')
        return
    if not os.path.isdir(targetFolder):
        os.mkdir(targetFolder)
    number = 1000   # maximum number of lines per piece
    fileNum = 1
    # Base name without directory or extension, so pieces are named test1.txt, test2.txt, ...
    baseName = os.path.splitext(os.path.basename(sourceFile))[0]
    with open(sourceFile, 'r') as srcFile:
        while True:
            # Read the next piece of up to `number` lines, line endings included
            tempData = list(islice(srcFile, number))
            if not tempData:
                break
            desFile = os.path.join(targetFolder, baseName + str(fileNum) + '.txt')
            # 'w' instead of 'a+', so running the script again does not duplicate data
            with open(desFile, 'w') as f:
                f.writelines(tempData)
            fileNum = fileNum + 1


if __name__ == '__main__':
    #sourceFile = input('Input the source file to split:')
    #targetFolder = input('Input the target folder you want to place the split files:')
    sourceFile = 'test.txt'
    targetFolder = 'test'
    FileSplit(sourceFile, targetFolder)
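To make the chunking step concrete, here is a minimal sketch that applies the same idea to an in-memory list instead of a file. The helper name `chunk_lines` and the sample data are hypothetical and are not part of the original script.

```python
from itertools import islice


def chunk_lines(lines, size):
    """Yield consecutive lists of at most `size` items from `lines`."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical sample: 2500 fake log lines
sample = ['line %d\n' % i for i in range(2500)]
sizes = [len(c) for c in chunk_lines(sample, 1000)]
print(sizes)  # [1000, 1000, 500]
```

The last piece is simply shorter than the others, which is also how the file splitter above handles a log whose length is not a multiple of 1000 lines.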

2. Mapper code

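import os
import re
import threading


def Map(sourceFile):
    if not os.path.exists(sourceFile):
        print(sourceFile, ' does not exist.')
        return
    # Matches dates written as one or two digits / one or two digits / four digits
    pattern = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')
    result = {}
    with open(sourceFile, 'r') as srcFile:
        for dataLine in srcFile:
            r = pattern.findall(dataLine)
            if r:
                # Only the first date on each line is counted
                result[r[0]] = result.get(r[0], 0) + 1
    desFile = sourceFile[0:-4] + '_map.txt'
    # 'w' instead of 'a+', so running the script again does not double the counts
    with open(desFile, 'w') as fp:
        for k, v in result.items():
            fp.write(k + ':' + str(v) + '\n')


if __name__ == '__main__':
    desFolder = 'test'
    # Skip the outputs of earlier runs so they are not mapped again
    files = [f for f in os.listdir(desFolder)
             if not f.endswith('_map.txt') and f != 'result.txt']
    # Without multithreading, this can simply be written as:
    '''for f in files:
        Map(os.path.join(desFolder, f))'''
    # With multithreading: one thread per split file
    threads = []
    for f in files:
        t = threading.Thread(target=Map, args=(os.path.join(desFolder, f),))
        t.start()
        threads.append(t)
    # Wait until every mapper has finished
    for t in threads:
        t.join()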
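The counting rule used by the mapper can be tried on a few lines in memory. The sketch below is illustrative only: the function name `count_dates` and the sample log lines are hypothetical (the article does not show the real log format), and `collections.Counter` is swapped in for the plain dictionary used above.

```python
import re
from collections import Counter

# Same date pattern as the mapper
DATE_PATTERN = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')


def count_dates(lines):
    """Count lines per date, using only the first date found on each line."""
    counts = Counter()
    for line in lines:
        match = DATE_PATTERN.search(line)
        if match:
            counts[match.group()] += 1
    return counts


# Hypothetical log lines, invented for this sketch
sample_lines = [
    '3/22/2018 10:15:01 service started\n',
    '3/22/2018 10:15:07 disk check ok\n',
    'no date on this line\n',
    '12/1/2017 23:59:59 shutdown, next boot 12/2/2017\n',
]
counts = count_dates(sample_lines)
print(dict(counts))  # {'3/22/2018': 2, '12/1/2017': 1}
```

Note that the second date on the last sample line is ignored, and the line without a date contributes nothing, which matches the behavior of `Map`.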

3. Reducer code

import os


def Reduce(sourceFolder, targetFile):
    if not os.path.isdir(sourceFolder):
        print(sourceFolder, ' does not exist.')
        return
    result = {}
    # Deal only with the mapped files
    allFiles = [os.path.join(sourceFolder, f)
                for f in os.listdir(sourceFolder) if f.endswith('_map.txt')]
    for f in allFiles:
        with open(f, 'r') as fp:
            for line in fp:
                line = line.strip()
                if not line:
                    continue
                # Each line has the form date:count
                position = line.index(':')
                key = line[0:position]
                value = int(line[position + 1:])
                # Add this file's count to the running total for the date
                result[key] = result.get(key, 0) + value
    with open(targetFile, 'w') as fp:
        for k, v in result.items():
            fp.write(k + ':' + str(v) + '\n')


if __name__ == '__main__':
    Reduce('test', os.path.join('test', 'result.txt'))
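The merge performed by the reducer can be sketched without touching the file system. In the example below, `parse_map_lines` and the two lists standing in for `_map.txt` files are hypothetical, and `collections.Counter` is used as a convenient substitute for the dictionary arithmetic above.

```python
from collections import Counter


def parse_map_lines(lines):
    """Parse 'date:count' lines into a Counter, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # rpartition splits on the last ':', so the count is always the final field
        key, _, value = line.rpartition(':')
        counts[key] += int(value)
    return counts


# Hypothetical contents of two mapper output files
map_a = ['3/22/2018:2\n', '12/1/2017:1\n']
map_b = ['3/22/2018:5\n', '\n', '1/2/2018:4\n']

total = Counter()
for mapped in (map_a, map_b):
    # Counter.update adds counts for keys that already exist
    total.update(parse_map_lines(mapped))
print(dict(total))  # {'3/22/2018': 7, '12/1/2017': 1, '1/2/2018': 4}
```

Because addition is associative and commutative, the order in which the mapped files are read does not affect the totals.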

II. Results

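Run the three programs above in order to obtain the final result: the file result.txt in the test folder, which holds the total number of log lines counted for each date.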
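The entries in the result file appear in no particular chronological order, so it can help to sort them before reading. The following is a rough sketch, not part of the original programs: `sort_by_date` and the sample lines are hypothetical, and the `%m/%d/%Y` format assumes the log writes dates as month/day/year, which the article does not state.

```python
from datetime import datetime


def sort_by_date(result_lines, date_format='%m/%d/%Y'):
    """Return (date, count) pairs from 'date:count' lines in chronological order."""
    rows = []
    for line in result_lines:
        line = line.strip()
        if not line:
            continue
        key, _, value = line.rpartition(':')
        # Assumption: dates are month/day/year; change date_format otherwise
        rows.append((datetime.strptime(key, date_format), key, int(value)))
    rows.sort()
    return [(key, value) for _, key, value in rows]


# Hypothetical lines in the format the reducer writes
sample_result = ['3/22/2018:7\n', '12/1/2017:1\n', '1/2/2018:4\n']
for date, count in sort_by_date(sample_result):
    print(date, count)
```

To use it on real output, pass the lines of the result file, for example `sort_by_date(open('test/result.txt'))`.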
