Question

Okay, I have a transaction file:

IN CU
     Customer_ID=
     Last_Name=Johnston
     First_Name=Karen
     Street_Address=291 Stone Cr
     City=Toronto
//
IN VE
     License_Plate#=LSR976
     Make=Cadillac
     Model=Seville
     Year=1996
     Owner_ID=779
//
IN SE
     Vehicle_ID=LSR976
     Service_Code=461
     Date_Scheduled=00/12/19

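IN means insert, and CU (for customer) indicates which file we are writing to, in this case customer.diff. The problem I'm having is that I need to go through each line and check the value of each field (Customer_ID, for example). Do you see how Customer_ID is left blank? I need to replace any blank numeric field with the value 0, for example Customer_ID=0. This is what I have so far, but it doesn't change anything: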

def insertion():
    field_names = {'Customer_ID=': 'Customer_ID=0',
'Home_Phone=':'Home_Phone=0','Business_Phone=': 'Business_Phone=0'}

    with open('xactions.two.txt', 'r') as from_file:
        search_lines = from_file.readlines()


    if search_lines[3:5] == 'CU':
        for i in search_lines:
            if field_names[i] == True:
                with open('customer.diff', 'w') as to_file:
                    to_file.write(field_names[i])

Thanks

Answer 1

Why not try something simpler? I haven't tested this code.

def insertion():
    # Fields that should default to 0 when left blank
    field_names = {'Customer_ID=': 'Customer_ID=0',
                   'Home_Phone=': 'Home_Phone=0',
                   'Business_Phone=': 'Business_Phone=0'}

    with open('xactions.two.txt', 'r') as from_file:
        with open('customer.diff', 'w') as to_file:
            for line in from_file:
                line = line.rstrip("\n")
                # Only a line that is exactly "Field=" has a blank value;
                # a substring test would also turn "Customer_ID=5" into "Customer_ID=50"
                if line.strip() in field_names:
                    to_file.write(line + "0")
                else:
                    to_file.write(line)
                to_file.write("\n")
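
The idea of appending 0 to blank numeric fields can also be checked without any files by working on a list of lines. The names `fill_blank_fields` and `NUMERIC_FIELDS` are made up for this illustration, and the field list is just the three fields from the question. It treats a field as blank only when nothing follows the `=`. This is a sketch and has not been run against the real transaction file.

```python
# Hypothetical helper; the field list is assumed from the question.
NUMERIC_FIELDS = ("Customer_ID", "Home_Phone", "Business_Phone")

def fill_blank_fields(lines, fields=NUMERIC_FIELDS):
    """Return a copy of lines with 0 appended to blank numeric fields."""
    blanks = {name + "=" for name in fields}
    result = []
    for line in lines:
        text = line.rstrip("\n")
        # A field is blank only if nothing follows the "=" sign
        if text.strip() in blanks:
            text += "0"
        result.append(text)
    return result

sample = ["IN CU", "     Customer_ID=", "     Last_Name=Johnston", "//"]
cleaned = fill_blank_fields(sample)
print(cleaned[1])
```

Fields that already have a value, such as `Customer_ID=5`, pass through unchanged.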

Answer 2

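Here is a fairly comprehensive approach; it's a bit long, but not as complicated as it looks!

I am assuming Python 3.x, although it should work in Python 2.x with few changes. I make extensive use of generators to stream the data rather than holding it all in memory.

First: we'll define the expected data type for each field. Some fields don't correspond to a built-in Python data type, so I start by defining some custom data types for those fields: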

import time

class Date:
    def __init__(self, s):
        """
        Parse a date provided as "yy/mm/dd"
        """
        if s.strip():
            self.date = time.strptime(s, "%y/%m/%d")
        else:
            self.date = time.gmtime(0.)

    def __str__(self):
        """
        Return a date as "yy/mm/dd"
        """
        return time.strftime("%y/%m/%d", self.date)

def Int(s):
    """
    Parse a string to integer ("" => 0)
    """
    if s.strip():
        return int(s)
    else:
        return 0

class Year:
    def __init__(self, s):
        """
        Parse a year provided as "yyyy"
        """
        if s.strip():
            self.date = time.strptime(s, "%Y")
        else:
            self.date = time.gmtime(0.)

    def __str__(self):
        """
        Return a year as "yyyy"
        """
        return time.strftime("%Y", self.date)
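
One thing to be aware of with these types: a blank date or year falls back to `time.gmtime(0.)`, the Unix epoch, so it is written back out as a 1970 value rather than as 0. A quick check using only the standard `time` module (the variable names are just for illustration):

```python
import time

# A date in the file's "yy/mm/dd" format survives a parse/format round trip
parsed = time.strptime("00/12/19", "%y/%m/%d")
round_trip = time.strftime("%y/%m/%d", parsed)

# Blank values fall back to the Unix epoch (1970-01-01 UTC)
epoch = time.gmtime(0.)
blank_date = time.strftime("%y/%m/%d", epoch)
blank_year = time.strftime("%Y", epoch)

print(round_trip, blank_date, blank_year)
```

If a different placeholder is wanted for blank dates, the `else` branches of `Date` and `Year` are the place to change it.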

Now we set up a table defining what type each field should be:

# Expected data-type of each field:
#   data_types[section][field] = type
data_types = {
    "CU": {
        "Customer_ID":    Int,
        "Last_Name":      str,
        "First_Name":     str,
        "Street_Address": str,
        "City":           str
    },
    "VE": {
        "License_Plate#": str,
        "Make":           str,
        "Model":          str,
        "Year":           Year,
        "Owner_ID":       Int
    },
    "SE": {
        "Vehicle_ID":     str,
        "Service_Code":   Int,
        "Date_Scheduled": Date
    }
}

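Now we parse the input file; this is by far the most complicated bit! It's a finite state machine, implemented as a generator function, which yields one section at a time: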

# Customized error-handling
class TransactionError         (Exception): pass
class EntryNotInSectionError   (TransactionError): pass
class MalformedLineError       (TransactionError): pass
class SectionNotTerminatedError(TransactionError): pass
class UnknownFieldError        (TransactionError): pass
class UnknownSectionError      (TransactionError): pass

def read_transactions(fname):
    """
    Read a transaction file
    Return a series of ("section", {"key": "value"})
    """
    section, accum = None, {}
    with open(fname) as inf:
        for line_no, line in enumerate(inf, 1):
            line = line.strip()

            if not line:
                # blank line - skip it
                pass
            elif line == "//":
                # end of section - return any accumulated data
                if accum:
                    yield (section, accum)
                section, accum = None, {}
            elif line[:3] == "IN ":
                # start of section
                if accum:
                    raise SectionNotTerminatedError(
                       "Line {}: Preceding {} section was not terminated"
                       .format(line_no, section)
                    )
                else:
                    section = line[3:].strip()
                    if section not in data_types:
                        raise UnknownSectionError(
                            "Line {}: Unknown section type {}"
                            .format(line_no, section)
                        )
            else:
                # data entry: "key=value"
                if section is None:
                    raise EntryNotInSectionError(
                        "Line {}: '{}' should be in a section"
                        .format(line_no, line)
                    )
                pair = line.split("=")
                if len(pair) != 2:
                    raise MalformedLineError(
                        "Line {}: '{}' could not be parsed as a key/value pair"
                        .format(line_no, line)
                    )
                key,val = pair
                if key not in data_types[section]:
                    raise UnknownFieldError(
                        "Line {}: unrecognized field name {} in section {}"
                        .format(line_no, key, section)
                    )
                accum[key] = val.strip()

        # end of file - nothing should be left over
        if accum:
            raise SectionNotTerminatedError(
               "End of file: Preceding {} section was not terminated"
               .format(section)
            )
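
The parser above splits each entry with `line.split("=")`, so a blank value still yields two parts, while a value that itself contains `=` yields three and is rejected as malformed. If such values ever need to be accepted, `str.partition` is one possible alternative. This is only a sketch: `split_entry` is a hypothetical helper and `Note` is an invented field name, neither is part of the code above.

```python
def split_entry(line):
    """Split "key=value" at the first "=" only; return None if there is no "="."""
    key, sep, val = line.partition("=")
    if not sep:
        return None
    return key.strip(), val.strip()

print("Customer_ID=".split("="))   # two parts, the second is empty
print("Note=a=b".split("="))       # three parts
print(split_entry("Note=a=b"))
```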

Now that the file is read, the rest is easier. We cast each field using the lookup table we defined above:

def format_field(section, key, value):
    """
    Cast a field value to the appropriate data type
    """
    return data_types[section][key](value)

def format_section(section, accum):
    """
    Cast all values in a section to the appropriate data types
    """
    return (section, {key:format_field(section, key, value) for key,value in accum.items()})
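
To make the lookup concrete, here is a cut-down, self-contained version of the same idea with a single section and two fields. `to_int`, `types` and `cast_section` are stand-ins for `Int`, `data_types` and `format_section`, trimmed for illustration:

```python
def to_int(s):
    """Parse a string to integer ("" => 0)."""
    return int(s) if s.strip() else 0

# Stand-in lookup table: types[section][field] = converter
types = {"CU": {"Customer_ID": to_int, "Last_Name": str}}

def cast_section(section, accum):
    """Cast every raw string value using the converter for its field."""
    return section, {key: types[section][key](value)
                     for key, value in accum.items()}

raw = {"Customer_ID": "", "Last_Name": "Johnston"}
section, cleaned = cast_section("CU", raw)
print(cleaned)
```

The blank `Customer_ID` comes out as the integer 0, which is what gets written as `Customer_ID=0` later on.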

And write the results back to a file:

def write_transactions(fname, transactions):
    with open(fname, "w") as outf:
        for section,accum in transactions:
            # start section
            outf.write("IN {}\n".format(section))
            # write key/value pairs in order by key
            keys = sorted(accum.keys())
            for key in keys:
                outf.write("    {}={}\n".format(key, accum[key]))
            # end section
            outf.write("//\n")

All the machinery is in place; we just need to call it:

def main():
    INPUT  = "transaction.txt"
    OUTPUT = "customer.diff"
    transactions = read_transactions(INPUT)
    cleaned_transactions = (format_section(section, accum) for section,accum in transactions)
    write_transactions(OUTPUT, cleaned_transactions)

if __name__=="__main__":
    main()
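
Because `read_transactions` is a generator and `cleaned_transactions` is a generator expression, nothing is read until `write_transactions` starts iterating, and sections flow through one at a time. A toy illustration of that ordering (the `source` and `sink` functions and the `events` list are invented for this sketch and merely stand in for the reader and writer):

```python
events = []

def source():
    """Stand-in for the reader: yields items one at a time."""
    for item in ("CU", "VE"):
        events.append("read " + item)
        yield item

def sink(items):
    """Stand-in for the writer: consumes items as they arrive."""
    for item in items:
        events.append("write " + item)

pipeline = (item.lower() for item in source())
events.append("pipeline built")   # nothing has been read yet
sink(pipeline)
print(events)
```

Each item is read and written before the next one is read, so only one section needs to be in memory at any time.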

Hope that helps!

Correcting errors in one file and writing it to a new file
