
Finding nestable string groups in Python with regular expressions

2020-02-16 10:28:30


Source: reposted

Contributed by: a reader

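I came across a small requirement online that calls for regular expressions. The original requirement is as follows:

Find the sentences in a text that contain "因為 ... 所以" ("because ... so") and output them aligned on those two words: the 3 characters before 因為, everything between the two words, and the 3 characters after 所以. If another 因為 ... 所以 pair is nested between the outer 因為 and 所以, find it as well and output it on a separate line. The output format is:

<line number> <3 preceding characters> *因為* <everything in between> &所以& <3 following characters> (a punctuation mark counts as one character)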

2 還不是 *因為* 這里好， &所以& 沒有人

The implementation is as follows:

import re

def getPairStriList(filename):
    pairStrList = []
    # \u56e0\u4e3a and \u6240\u4ee5 are the Unicode escapes for
    # 因为 ("because") and 所以 ("so")
    pattern = re.compile('.{3}\u56e0\u4e3a.*\u6240\u4ee5.{3}')
    with open(filename, 'r', encoding='utf8') as textFile:
        for line in textFile:
            result = pattern.search(line)
            while result:
                resultStr = result.group()
                pairStrList.append(resultStr)
                # Search again inside the match, excluding the outer pair,
                # to pick up any nested pair
                result = pattern.search(resultStr, 2, len(resultStr) - 2)
    # Mark the two keywords in each string and prefix a sequence number
    for i in range(len(pairStrList)):
        s = pairStrList[i]
        s = s[:3] + s[3:5].replace('\u56e0\u4e3a', ' *\u56e0\u4e3a* ', 1) + s[5:]
        s = s[:len(s) - 5] + s[len(s) - 5:].replace('\u6240\u4ee5', ' &\u6240\u4ee5& ', 1)
        pairStrList[i] = str(i + 1) + ' ' + s
    return pairStrList

if __name__ == '__main__':
    pairStrList = getPairStriList('test.txt')
    for pairStr in pairStrList:
        print(pairStr)
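To make the nested search easier to follow, here is a minimal sketch that runs the same loop on an in-memory string instead of a file. The helper name `find_nested_pairs` and the sample sentence are made up for illustration and are not part of the original code.

```python
import re

# Same pattern as in the implementation: 3 characters, 因为, anything, 所以, 3 characters.
PATTERN = re.compile('.{3}\u56e0\u4e3a.*\u6240\u4ee5.{3}')

def find_nested_pairs(line):
    """Hypothetical helper: return the outer match and each nested match in one line."""
    found = []
    result = PATTERN.search(line)
    while result:
        matched = result.group()
        found.append(matched)
        # Restricting the search to [2, len - 2) rules out re-matching the
        # outer pair (it needs index 0 and the last 3 characters), so only
        # a pair nested inside it can match.
        result = PATTERN.search(matched, 2, len(matched) - 2)
    return found

# Made-up sample: an outer 因为 ... 所以 pair with one pair nested inside it.
sample = '一二三因为甲乙丙因为丁所以戊己庚所以四五六'
pairs = find_nested_pairs(sample)
for item in pairs:
    print(item)
```

With this sample the loop should yield two entries: the whole sentence, then the inner `甲乙丙因为丁所以戊己庚`. Note that an inner pair is only found when it also has 3 characters on each side inside the outer match.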

PS: Now a look at nesting groups in Python regular expressions

Because a group is itself a complete regular expression, groups can be nested inside other groups to build more complex expressions. The following example shows nested groups:

# python 3.6
# 蔡軍生
# http://blog.csdn.net/caimouse/article/details/51749579
#
import re

def test_patterns(text, patterns):
    """Given source text and a list of patterns, look for
    matches for each pattern within the text and print
    them to stdout.
    """
    # Look for each pattern in the text and print the results
    for pattern, desc in patterns:
        print('{!r} ({})\n'.format(pattern, desc))
        print(' {!r}'.format(text))
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            prefix = ' ' * (s)
            print(
                ' {}{!r}{} '.format(prefix,
                                   text[s:e],
                                   ' ' * (len(text) - e)),
                end=' ',
            )
            print(match.groups())
            if match.groupdict():
                print('{}{}'.format(
                    ' ' * (len(text) - s),
                    match.groupdict()),
                )
        print()
    return

Example (this assumes the function above is saved as re_test_patterns_groups.py):

# python 3.6
# 蔡軍生
# http://blog.csdn.net/caimouse/article/details/51749579
#
from re_test_patterns_groups import test_patterns

test_patterns(
    'abbaabbba',
    [(r'a((a*)(b*))', 'a followed by 0-n a and 0-n b')],
)
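As a quick check of how the nested groups are numbered, the following sketch applies the same pattern with `re.finditer` directly, without the `test_patterns` helper. Groups are numbered by the position of their opening parenthesis, so group 1 is the outer `((a*)(b*))`, group 2 is `(a*)`, and group 3 is `(b*)`. The variable names are illustrative only.

```python
import re

text = 'abbaabbba'
pattern = r'a((a*)(b*))'

# Collect (whole match, tuple of groups) for every non-overlapping match.
results = [(m.group(0), m.groups()) for m in re.finditer(pattern, text)]

for whole, groups in results:
    print(repr(whole), groups)
# 'abb' ('bb', '', 'bb')
# 'aabbb' ('abbb', 'a', 'bbb')
# 'a' ('', '', '')
```

In every match, group 1 equals group 2 followed by group 3, which is what nesting means here: the outer group captures everything its inner groups capture.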
