[python] Segmenting MySQL data with the Chinese Academy of Sciences NLPIR word segmentation tool

2019-11-08 01:39:45


Source: reprinted

Contributed by: a site user

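This article uses NLPIR, the word segmentation tool from the Chinese Academy of Sciences, to segment text stored in a database. Install Python on your computer, then import MySQLdb (the Python connector for MySQL) and the NLPIR segmentation tool.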

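import pynlpir
import codecs
import math
import MySQLdb
from search import *  # helper module that is not shown in this article; presumably provides search(), savecount() and save()

pynlpir.open()

# Connect to the database
conn = MySQLdb.connect(host="127.0.0.1", user="root", passwd="123456", db="", charset="utf8")
cursor = conn.cursor()
n = cursor.execute("select * from test where id = 8")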

Stop words

st = codecs.open('E://testWord//stopwords.txt', 'rb', encoding='gbk')

Reading the data from the database

for row in cursor.fetchall():
    s = row[3]
    singletext_result = []
    print(row[0])
    # In each item, the first element is the keyword and the second is its part of speech
    for item in pynlpir.segment(s):
        singletext_result.append(item[0])

# Read the stop words (the list must exist before it is appended to)
stopwords = []
for line in st:
    line = line.strip()
    stopwords.append(line)
print(stopwords)
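The stop-word loading step can also be wrapped in a small function. The sketch below is only an illustration: the name load_stopwords is made up for this example, and it assumes the file holds one stop word per line in GBK encoding, as the path above suggests. It returns a set, because the later "word not in stopwords" test is faster against a set than against a list.

```python
import codecs


def load_stopwords(path, encoding="gbk"):
    """Read one stop word per line and return them as a set.

    load_stopwords is a hypothetical helper, not part of the original script.
    """
    with codecs.open(path, "r", encoding=encoding) as handle:
        # Strip surrounding whitespace and skip blank lines.
        return {line.strip() for line in handle if line.strip()}
```

With this helper, the loading code could become something like stopwords = load_stopwords('E://testWord//stopwords.txt'), and the file is closed automatically once it has been read.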

Filtering out stop words

# Filter out stop words
delstopwords_singletxt = []
localtion = 0
for word in singletext_result:
    localtion = localtion + 1
    if word not in stopwords:
        # Check whether the word is Chinese (U+4E00 to U+9FA5)
        if word >= u'\u4e00' and word <= u'\u9fa5':
            delstopwords_singletxt.append(word)
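One detail of the filter is worth spelling out. Comparing a whole string against u'\u4e00' and u'\u9fa5' is a lexicographic comparison, so in practice it mostly tests the first character of the word. The sketch below is a stricter variant, not the original author's code: is_chinese and filter_words are names invented for this example, and the check is applied to every character.

```python
def is_chinese(word):
    """Return True if word is non-empty and every character is in U+4E00..U+9FA5."""
    return bool(word) and all(u"\u4e00" <= ch <= u"\u9fa5" for ch in word)


def filter_words(words, stopwords):
    """Keep words that are not stop words and consist only of Chinese characters."""
    stopwords = set(stopwords)  # set membership tests are O(1) on average
    return [w for w in words if w not in stopwords and is_chinese(w)]


sample = [u"我", u"的", u"分词", u"abc", u"中a", u"，"]
kept = filter_words(sample, [u"的"])
```

Here kept holds only u"我" and u"分词": the stop word, the Latin string, the mixed string, and the full-width comma are all dropped.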

Building the vocabulary

# Build the vocabulary
for item in delstopwords_singletxt:
    if search(item):
        # The keyword already exists, so increase its count
        if savecount(item):
            print('success to add count')
    else:
        # New keyword, so store it
        if save(item):
            print('success to add keyword')
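The search module that provides search(), savecount() and save() is not shown in this article, so its real implementation is unknown. As a rough sketch of the same "increment if present, otherwise insert" logic, the example below uses the standard-library sqlite3 module in place of MySQL. The table name vocabulary, its columns word and cnt, and the function add_to_vocabulary are all assumptions made for illustration.

```python
import sqlite3


def add_to_vocabulary(conn, words):
    """Insert each new word with a count of 1, or increment a known word's count.

    The vocabulary(word, cnt) table is a hypothetical schema for this sketch.
    """
    cur = conn.cursor()
    for word in words:
        cur.execute("SELECT cnt FROM vocabulary WHERE word = ?", (word,))
        if cur.fetchone() is None:
            # Plays the role of save(): store a keyword seen for the first time.
            cur.execute("INSERT INTO vocabulary (word, cnt) VALUES (?, 1)", (word,))
        else:
            # Plays the role of savecount(): bump the count of a known keyword.
            cur.execute("UPDATE vocabulary SET cnt = cnt + 1 WHERE word = ?", (word,))
    conn.commit()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE vocabulary (word TEXT PRIMARY KEY, cnt INTEGER NOT NULL)")
add_to_vocabulary(db, [u"分词", u"工具", u"分词"])
counts = dict(db.execute("SELECT word, cnt FROM vocabulary").fetchall())
```

After this runs, counts maps u"分词" to 2 and u"工具" to 1. The queries use placeholders rather than string concatenation, so segmented words that contain quotes cannot break the SQL.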
