Writing RegEx results from multiple HTML files to a .txt outfile

Time: 2017-02-09 21:41:39

Tags: regex python-3.x

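I'm having trouble writing RegEx results from multiple HTML files (non-English text) to a .txt outfile. On screen the results print as multiple strings, each on its own line, but when I try to write them to the outfile, only one random string gets written. How can I write all the strings from all of the roughly 100 files to the outfile? My code looks like this: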

from bs4 import BeautifulSoup
import glob
import sys
import string
import re
import os

text = glob.glob('C:/Users/dell/Desktop/python-for-text-analysis-master/Notebooks/MEK/*')   
for filename in text:
    with open(filename, encoding='ISO-8859-1', errors="ignore") as f:
        mytext = f.read()

soup = BeautifulSoup(mytext, "lxml")
extracted_text = soup.getText()

pattern = r"\ba\b\s\bleg[\w]+bb\b\s\b[\w]+\b"
result = (", ".join(re.findall(pattern, mytext)))

file = "C:/Users/dell/Desktop/python-for-text-analysis-master/Data/Charlie/charlie_neww.txt"
for row in result:
    with open (file, "w", encoding="iso-8859-1", errors="ignore") as outfile:
        print(result, end='\n', file=outfile)

1 Answer:

Answer 0 (score: 0)

  

with open (file, "w", ...

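The "w" mode truncates the file, i.e. the file is emptied every time it is opened. Because this open call sits inside the for row in result loop, each iteration erases whatever the previous iteration wrote, so only a single write survives. Consider the append mode "a" instead.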
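An alternative to "a" is to open the outfile only once, with "w", before looping over the input files. The sketch below shows that approach. It also moves the re.findall call inside the file loop: in the question's code, the BeautifulSoup and re.findall lines sit outside the first for loop, so only the last file read is ever searched. The helper name collect_matches is hypothetical. Like the original code, it runs the regex on the raw file text rather than on soup.getText(), and it assumes the ISO-8859-1 encoding used in the question. It writes one match per line.

```python
import re

# Pattern copied unchanged from the question.
PATTERN = re.compile(r"\ba\b\s\bleg[\w]+bb\b\s\b[\w]+\b")


def collect_matches(input_paths, out_path, encoding="iso-8859-1"):
    """Hypothetical helper: write every match from every input file
    to out_path, one match per line."""
    # Open the output file once, before the loop: "w" empties it a single
    # time at the start instead of on every iteration.
    with open(out_path, "w", encoding=encoding, errors="ignore") as outfile:
        for path in input_paths:
            with open(path, encoding=encoding, errors="ignore") as f:
                text = f.read()
            # Search inside the loop so that every file is processed,
            # not only the last one that was read.
            for match in PATTERN.findall(text):
                outfile.write(match + "\n")


# Usage with the paths from the question (also needs `import glob`):
# collect_matches(
#     glob.glob("C:/Users/dell/Desktop/python-for-text-analysis-master/Notebooks/MEK/*"),
#     "C:/Users/dell/Desktop/python-for-text-analysis-master/Data/Charlie/charlie_neww.txt",
# )
```

Opening the file once with "w" has an advantage over "a": running the script a second time replaces the previous output instead of appending a duplicate copy of every match.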