Replacing multiple strings in multiple files using Python

Time: 2018-04-08 05:06:01

Tags: python regex yaml substitution

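I have a list of files and a list of strings to replace, both stored in a YAML file. I want to write a function that takes this YAML file and performs the search and replace. This is what I have so far.

Two text files and the YAML file: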

txt_1.txt

aB123.Abc
AB345.aBC
ab123.ABC
Ab345.abc

txt_2.txt

ab123.Abc
AB345.ABC
current_date

yaml_file - cf_master.yml

input_files:
    - txt_1.txt
    - txt_2.txt
replacement_strings:
    string1:
        from: AB123.ABC
        to: XY000.XYZ
    string2:
        from: AB345.ABC
        to:   XY001.ZYX
    string3:
        from: current_date
        to: '2018-04-07'

The goal is to replace every string (the "from" value) with its replacement (the "to" value), ignoring case (case-insensitive).

import yaml
import re

with open('cf_master.yml') as f:
        dataMap = yaml.safe_load(f)

def string_replacer(dataMap):
    for files in dataMap['input_files']:
            with open(dataMap['input_files']) as f:
                input_h = f.read()
    for string in dataMap['replacement_strings']:
            output_h = input_h.replace(
                                      dataMap['replacement_strings'][string]['from'],
                                      dataMap['replacement_strings'][string]['to']
                                      )
    with open(output_dataMap[input_files],"w") as f:
                f.write(output_h)
    return output_dataMap[input_files]

string_replacer(dataMap)

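I don't understand how to correct this code. The input files, the YAML file, and the newly generated files are all in the same folder.

1 Answer:

Answer 0 (score: 2)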

You can simplify the YAML file. The replacement strings don't need to be indexed by key; a plain list is enough.

input_files:
    - txt_1.txt
    - txt_2.txt
replacement_strings:
    - from: AB123.ABC
      to: XY000.XYZ
    - from: AB345.ABC
      to:   XY001.ZYX
    - from: current_date
      to: '2018-04-07'

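As for the replacement itself, you may want to do it in two passes: first replace each match with a temporary marker, then go back and replace each marker with the actual replacement text. This prevents the replacements from interacting with each other. For example, suppose you replace all 'a's with 'b's and all 'b's with 'c's. Without the intermediate marker step, the second replacement would replace all the original 'b's as well as all the 'b's produced by replacing the 'a's.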
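To make the interaction concrete, here is a minimal sketch of the 'a' to 'b', 'b' to 'c' example. The function names `naive_replace` and `two_pass_replace` are hypothetical and only for illustration. The sketch assumes that no "from" string matches any part of a marker.

```python
import re

def naive_replace(text, pairs):
    # Apply each replacement directly, one after the other.
    for old, new in pairs:
        text = re.sub(re.escape(old), new, text, flags=re.I)
    return text

def two_pass_replace(text, pairs):
    # Pass 1: replace each match with a unique temporary marker.
    # Assumption: no 'old' string matches any part of a marker.
    markers = {}
    for i, (old, new) in enumerate(pairs):
        marker = '__$TEMP{}$__'.format(i)
        markers[marker] = new
        text = re.sub(re.escape(old), marker, text, flags=re.I)
    # Pass 2: replace each marker with its final text.
    for marker, new in markers.items():
        text = text.replace(marker, new)
    return text

pairs = [('a', 'b'), ('b', 'c')]
print(naive_replace('aabb', pairs))     # cccc
print(two_pass_replace('aabb', pairs))  # bbcc
```

In the naive version the 'b's created by the first replacement are picked up by the second one; with markers, each original character is replaced exactly once.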

import yaml
import re

with open('cf_master.yml') as f:
    data = yaml.safe_load(f)


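for filepath in data['input_files']:
    with open(filepath, 'r') as f:
        txt = f.read()

    # First pass: replace every case-insensitive match of 'from'
    # with a unique temporary marker, and remember its 'to' value.
    marker_d = dict()
    for i, d in enumerate(data['replacement_strings']):
        marker = '__$TEMP{}$__'.format(i)
        marker_d[marker] = d['to']
        txt = re.sub(re.escape(d['from']), marker, txt, flags=re.I)

    # Second pass: swap each marker for its real replacement text.
    # str.replace is used so backslashes in 'to' are taken literally.
    for marker, s in marker_d.items():
        txt = txt.replace(marker, s)

    # Save file somewhere?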
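For the saving step, one possibility is sketched below. This is only an assumption about what is wanted: the function name `replace_in_file` and the `.new` suffix are hypothetical, and the result is written next to the input file, since the question says everything lives in the same folder.

```python
import os
import re

def replace_in_file(filepath, replacements, suffix='.new'):
    """Two-pass, case-insensitive replace; writes e.g. txt_1.new.txt."""
    with open(filepath, 'r') as f:
        txt = f.read()

    # Pass 1: replace each 'from' match with a temporary marker.
    markers = {}
    for i, d in enumerate(replacements):
        marker = '__$TEMP{}$__'.format(i)
        # str() guards against YAML values parsed as non-strings (e.g. dates).
        markers[marker] = str(d['to'])
        txt = re.sub(re.escape(str(d['from'])), marker, txt, flags=re.I)

    # Pass 2: replace each marker with its final text.
    for marker, s in markers.items():
        txt = txt.replace(marker, s)

    # Hypothetical naming scheme: insert the suffix before the extension.
    root, ext = os.path.splitext(filepath)
    out_path = root + suffix + ext
    with open(out_path, 'w') as f:
        f.write(txt)
    return out_path
```

With the YAML loaded as in the code above, this could be called as `replace_in_file(filepath, data['replacement_strings'])` for each entry in `data['input_files']`. Writing to a new file rather than overwriting the input keeps the originals intact while testing.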