Pattern matching text in a file?

Time: 2017-04-25 23:34:39

Tags: python pattern-matching

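I have an input file that looks like this: input file link

and I need to create an output file that looks like this: output file link

I started with the code below, but the error handling and pattern matching are messing up the logic (especially where : appears in the URL and in the data). Also, the average in the output file should be the average of the non-zero, non-null values.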

import re

with open("input.txt") as f:
    next(f)  # skips header
    for line in f:
        cleanline = re.sub('::', ':', line)  # handles the two :: case
        newline = re.split("[\t:]", cleanline)  # splits on either tab or :
        print newline
        x = 0
        total = 0
        for i in range(3, 7):
            if newline[i] <> 0 or newline[i] != None:
                x += 1
                total += total
                avg = total / x
                print avg

1 answer:

Answer 0 (score: 0)

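I suggest approaching this from a different angle. First split each line along the tabs, then validate each entry on its own. This lets you write a separate regular expression for each entry and produce more precise error messages. A good way to do this is with tuple unpacking and the split method: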

from __future__ import print_function

with open("input.txt") as in_file, open("output.txt", 'w') as out_file:
    next(in_file) # skips header

    for line in in_file:
        error_message = []
        # remove line break character and split along the tabs
        id_and_date, user_id, p1, p2, p3, p4, url = line.strip("\n").split("\t")

        # split the first entry at the first :
        split_id_date = id_and_date.split(":", 1)
        if len(split_id_date) == 2:
            order_id, date = split_id_date
        elif len(split_id_date) == 1:
            # assume this is the order id
            # or do something 
            order_id, date = (split_id_date[0], "")
            error_message.append("Invalid Date") 
        else:
            # set default values if nothing is present
            order_id, date = ("", "")
        # validate order_id and date here using re.match
        # add errors to error_message list:
        # error_message.append("Invalid Date") 

        # calculate average price
        # first, compile a list of the non-zero prices
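        # int() raises ValueError on a non-numeric entry; catch it if such rows can occur
        nonzero_prices = [price for price in map(int, (p1, p2, p3, p4)) if price > 0]
        # compute the average price; float() avoids Python 2 integer division
        # assumption: report 0.0 when every price is zero instead of dividing by zero
        avg_price = float(sum(nonzero_prices)) / len(nonzero_prices) if nonzero_prices else 0.0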

        # validate url using re here
        # handle errors as above

        print("\t".join([order_id, date, user_id, str(avg_price), url, ", ".join(error_message)]), file=out_file)
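The averaging step in the code above can also be pulled out into a small helper that tolerates blank or non-numeric fields. This is only a sketch: the name `average_nonzero` is hypothetical, and skipping unparseable fields (rather than reporting them as errors) and returning `None` for an empty result are assumptions about how the data should be treated.

```python
def average_nonzero(values):
    """Return the mean of the fields that parse to a non-zero number.

    Returns None when no field qualifies (assumption: the caller decides
    what to write to the output file in that case).
    """
    numbers = []
    for raw in values:
        raw = raw.strip()
        if not raw:
            continue  # blank field: treated as missing
        try:
            number = float(raw)
        except ValueError:
            continue  # assumption: silently skip non-numeric fields
        if number != 0:
            numbers.append(number)
    if not numbers:
        return None
    return sum(numbers) / len(numbers)
```

Inside the loop it could replace the list comprehension, for example `avg_price = average_nonzero((p1, p2, p3, p4))`.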

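I did not add the re calls that validate the entries, because I don't know what you expect to see in them. However, I left comments at the places where a call to re.match or something similar would make sense.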
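To illustrate what such validation might look like, here is a minimal sketch. The formats are guesses, not taken from the input file: it assumes the order ID is all digits, the date is written as YYYY-MM-DD, and the URL starts with http:// or https://. The function name `validate_entry` and the error strings other than "Invalid Date" are likewise hypothetical.

```python
import re

# Hypothetical formats; adjust the patterns to what the real data should contain.
ORDER_ID_RE = re.compile(r"^\d+$")               # assumption: digits only
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")     # assumption: YYYY-MM-DD
URL_RE = re.compile(r"^https?://\S+$")           # assumption: http(s) URL without spaces


def validate_entry(order_id, date, url):
    """Return a list of error messages for one row (empty if all fields match)."""
    errors = []
    if not ORDER_ID_RE.match(order_id):
        errors.append("Invalid Order ID")
    if not DATE_RE.match(date):
        errors.append("Invalid Date")
    if not URL_RE.match(url):
        errors.append("Invalid URL")
    return errors
```

Because the URL is taken whole from the tab split, any : inside it never interferes with the other fields. The returned list could be merged into the row's errors with `error_message.extend(validate_entry(order_id, date, url))`.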

I hope this helps.