Writing RegEx matches to a txt file

Time: 2016-09-14 18:49:40

Tags: regex python-3.x typeerror

I am using the following code to write RegEx matches to a txt file, but I keep getting this error message:

  File "C:\lib\re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object

import glob
import os
import re


def extractor():
    os.chdir(r"F:\Test")
    for file in glob.iglob("*.html"):  # iterates over all files in the directory ending in .html
        with open(file, encoding="utf8") as f, open((file.rsplit(".", 1)[0]) + ".txt", "w") as out:
            contents = f.read()
            extract = re.compile(r'RegEx', re.I | re.S)
            if re.findall(extract, contents) is not None:
                for x in re.findall(extract, contents):
                    out.write(x)
            out.close()
extractor()

Does anyone know what causes this error? Apparently it has something to do with a wrong type?

1 Answer:

Answer 0: (score: 0)

With a few small adjustments:

import glob
import os
import re


def extractor():
    # compile the pattern once, outside the loop
    extract = re.compile(r'RegEx', re.I | re.S)
    os.chdir(r"F:\Test")
    for file in glob.iglob("*.html"):  # iterates over all files in the directory ending in .html
        # the with statement closes both files, so no explicit close() is needed
        with open(file, encoding="utf8") as f, open((file.rsplit(".", 1)[0]) + ".txt", "w") as out:
            contents = f.read()
            for match in extract.findall(contents):
                out.write(match)

extractor()

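This uses extract as a compiled pattern object and calls findall on it directly. The if ... is not None check inside the loop is not needed at all, because findall returns an empty list when nothing matches, never None.
If it still does not work, please post your actual regular expression (does it contain several groups, for example?).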
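The question about groups matters because the shape of what findall returns depends on the pattern. The sketch below illustrates this with two hypothetical patterns and a made-up input string (the asker's real regex is not shown, so these are assumptions for illustration only). With no group or a single group, findall yields strings; with two or more groups it yields tuples, which out.write() would reject with a TypeError of its own. The helper to_line is likewise a hypothetical name.

```python
import re

# Hypothetical patterns and input, for illustration only;
# the real regex from the question is not known.
no_group = re.compile(r"\d+")
two_groups = re.compile(r"(\w+)=(\d+)")

text = "a=1, b=22"

plain = no_group.findall(text)     # list of str
pairs = two_groups.findall(text)   # list of tuples, one element per group
empty = no_group.findall("none")   # empty list when nothing matches, never None


def to_line(match):
    """Turn one findall result item into a writable line of text."""
    if isinstance(match, tuple):
        # several groups: join them, since file.write() only accepts str
        return "\t".join(match) + "\n"
    return match + "\n"


lines = [to_line(m) for m in pairs]
```

In the answer's loop, out.write(to_line(match)) would then work for either kind of pattern, and each match would land on its own line.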
