

A Summary of Methods for Counting English Word Occurrences in a Plain-Text File with Python [Tested]

2020-02-15 22:31:29
Source: reposted
Contributed by: a reader

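This article shows, by example, how to count the occurrences of English words in a plain-text file with Python. It is shared here for reference; the details follow.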

Version 1: inefficient

#!python3
# -*- coding:utf-8 -*-
path = 'test.txt'
with open(path, encoding='utf-8', newline='') as f:
    word = []
    words_dict = {}
    # Scan the text one character at a time.
    for letter in f.read():
        if letter.isalnum():
            word.append(letter)
        elif letter.isspace():  # whitespace: space, \t, \n
            if word:
                word = ''.join(word).lower()  # convert to lowercase
                if word not in words_dict:
                    words_dict[word] = 1
                else:
                    words_dict[word] += 1
                word = []
        # Any other character (punctuation) is skipped without ending the word.

# Handle the last word if the file does not end with whitespace.
if word:
    word = ''.join(word).lower()  # convert to lowercase
    if word not in words_dict:
        words_dict[word] = 1
    else:
        words_dict[word] += 1
    word = []

for k, v in words_dict.items():
    print(k, v)

Output:

we 4
are 1
busy 1
all 1
day 1
like 1
swarms 1
of 6
flies 1
without 1
souls 1
noisy 1
restless 1
unable 1
to 1
hear 1
the 7
voices 1
soul 1
as 1
time 1
goes 1
by 1
childhood 1
away 2
grew 1
up 1
years 1
a 1
lot 1
memories 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence 1
regardless 1
shackles 1
mind 1
indulge 1
in 1
world 1
buckish 1
focus 1
on 1
beneficial 1
principle 1
lost 1
themselves 1

Version 2:

Drawback: the entire file is read into memory at once, which performs poorly on large files.

#!python3
# -*- coding:utf-8 -*-
import re

path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    data = f.read()
    word_reg = re.compile(r'\w+')
    # word_reg = re.compile(r'\w+\b')
    word_list = word_reg.findall(data)
    word_list = [word.lower() for word in word_list]  # convert to lowercase
    word_set = set(word_list)  # avoid counting the same word more than once
    # words_dict = {}
    # for word in word_set:
    #     words_dict[word] = word_list.count(word)
    # Concise form of the loop above
    words_dict = {word: word_list.count(word) for word in word_set}
    for k, v in words_dict.items():
        print(k, v)

Output:

on 1
also 1
souls 1
focus 1
soul 1
time 1
noisy 1
grew 1
lot 1
childish 1
like 1
voices 1
indulge 1
swarms 1
buckish 1
restless 1
we 4
hear 1
childhood 1
as 1
world 1
themselves 1
are 1
bottom 1
memories 1
the 7
of 6
flies 1
without 1
have 2
day 1
busy 1
to 1
eroded 1
regardless 1
unable 1
innocence 1
up 1
a 1
in 1
mind 1
goes 1
by 1
lost 1
principle 1
once 1
away 2
years 1
beneficial 1
all 1
shackles 1
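
The drawback noted for version 2 (reading the whole file at once) could be avoided by processing the file line by line. The following is only a sketch, not part of the original article: it assumes the standard-library collections.Counter is acceptable, and the names count_words and count_words_in_file are hypothetical. It reuses the \w+ pattern from version 2 and also avoids the repeated word_list.count() calls, since each word is tallied in a single pass.

```python
import re
from collections import Counter

WORD_RE = re.compile(r'\w+')  # same pattern as version 2


def count_words(lines):
    """Count lowercase words from any iterable of text lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Only the current line is tokenized at any one time.
        counts.update(word.lower() for word in WORD_RE.findall(line))
    return counts


def count_words_in_file(path, encoding='utf-8'):
    """Stream a file line by line instead of calling f.read() (hypothetical helper)."""
    with open(path, encoding=encoding) as f:
        # Iterating over a file object yields one line at a time.
        return count_words(f)


# Small in-memory demo; any iterable of lines works, including an open file.
sample = [
    'We are busy all day,\n',
    'like swarms of flies without souls.\n',
    'We are noisy.\n',
]
demo = count_words(sample)
for word, n in demo.most_common(3):
    print(word, n)
```

For the file used in the article, count_words_in_file('test.txt') would return the counts. Memory use still grows with the number of distinct words, and a file consisting of one very long line is still loaded as a single line.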
