Python snippet repeats the lines I select

Date: 2016-06-30 15:08:09

Tags: python

I want to extract specific sections from several files.

All of my files have the following structure, but with different object data:

some lines
ObjectAlias apple
some lines
Begin
some lines
End
some lines
ObjectAlias pear
some lines
Begin
some lines
End
...

Suppose I have a list of my files, a list of the "ObjectAlias" names I want, and this function:

def dummyFunc (fileList, objList):
    dummy = ""

    for file in fileList:
        with open(file, "r") as infile:
            Tag = False
            for line in infile:
                for obj in objList:
                    if line.find("ObjectAlias " + obj + "\n") !=-1:
                        Tag = True
                    if Tag:
                        dummy += line
                    if line.find("End") != -1:
                        Tag = False
    return (dummy)

This code gives me a result like this:

...
ObjectAlias cherry
ObjectAlias cherry
ObjectAlias cherry
ObjectAlias cherry
Begin
Begin
Begin
Begin
same lines
same lines
same lines
same lines
End
...

This is what I expected:

...
ObjectAlias apple
some lines
Begin
some lines
End
ObjectAlias cherry
some lines
Begin
some lines
End
...

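What is my code doing wrong? When I tested it, similar code worked for a single file and a single object, but it does not work with lists as input: if objList has 5 items, every line appears 5 times in the result.

Any help is welcome.

Edit: a clearer explanation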

FILEA:

some lines
ObjectAlias apple
some lines
Begin
some lines about apple
End
some lines
ObjectAlias pear
some lines
Begin
some lines about pear
End
some lines
ObjectAlias orange
some lines
Begin
some lines about orange
End
some lines

FILEB:

some lines
ObjectAlias lemon
some lines
Begin
some lines about lemon
End
some lines
ObjectAlias peach
some lines
Begin
some lines about peach
End
some lines
ObjectAlias tomato
some lines
Begin
some lines about tomato
End
some lines

I want to filter pear and peach into a new file, so it should be:

ObjectAlias pear
Begin
some lines about pear
End
ObjectAlias peach
Begin
some lines about peach
End

With iownthegame's help, I changed the code to:

def dummyFunc (fileList, objList):
    dummy = ""

    for file in fileList:
        with open(file, "r") as infile:
            Tag = False
            for line in infile:
                for obj in objList:
                    objString = "ObjectAlias " + obj
                    if objString in line:
                        dummy += line
                        break
                    elif "Begin" in line:
                        Tag = True
                        break
                    elif "End" in line:
                        dummy += line
                        Tag = False
                        break
                if Tag:
                    dummy += line
    return (dummy)

income = ["e:/FileA", "e:/FileB"]
filter = ["pear", "peach"]
with open("e:/Result", "w") as f:
    f.write(dummyFunc(income, filter))

But I got this output:

Begin
some lines about apple
End
ObjectAlias pear
Begin
some lines about pear
End
Begin
some lines about orange
End
Begin
some lines about lemon
End
ObjectAlias peach
Begin
some lines about peach
End
Begin
some lines about tomato
End

I am an absolute beginner. What am I doing wrong? Thanks for your help.

1 answer:

Answer 0 (score: 0)

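You should use break to terminate the inner loop as soon as a matching obj is found, instead of continuing to search and appending to the output again.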
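To illustrate the point above: in the first function from the question, the `if Tag:` append sits inside the `for obj in objList` loop, so it runs once per object for every line, which is where the 5x repetition comes from. The sketch below is one possible way to restructure it, not the asker's or the answerer's code. The names `matches_any` and `extract_from_alias_to_end` are hypothetical, the function takes an iterable of lines instead of file paths, and it compares whole stripped lines rather than using a substring search. All of these are assumptions made for illustration.

```python
def matches_any(line, obj_list):
    """Return True if the line is the ObjectAlias line of any wanted object."""
    # Assumption: the alias line contains nothing but "ObjectAlias <name>".
    return any(line.strip() == "ObjectAlias " + obj for obj in obj_list)


def extract_from_alias_to_end(lines, obj_list):
    """Copy each wanted ObjectAlias line and everything up to the next End line."""
    out = []
    copying = False
    for line in lines:
        # The per-object check is folded into a single True/False answer...
        if matches_any(line, obj_list):
            copying = True
        # ...so the append below runs at most once per line.
        if copying:
            out.append(line)
        if line.strip() == "End":
            copying = False
    return "".join(out)
```

Because the object loop is reduced to a single boolean by `any()`, adding more names to the list can no longer multiply the output.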

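Update: since you need to filter out the lines between ObjectAlias and Begin, you also need a state that records where you are in the processing.

I changed Tag so that it records different states. It is initialized to None; when a wanted ObjectAlias line matches, it moves to the "start" state; when a Begin line matches, it moves to the "begin" state; and when an End line matches, it goes back to None. As a result, lines encountered before the "begin" state are not output. The other case is a Begin or End line that appears before any wanted ObjectAlias line: those are not counted either.

I hope this solution helps.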

def dummyFunc (fileList, objList):
    dummy = ""

    for file in fileList:
        with open(file, "r") as infile:
            Tag = None
            for line in infile:
                for obj in objList:
                    objString = "ObjectAlias " + obj
                    if objString in line:
                        Tag = "start"
                        dummy += line
                        break
                    elif "Begin" in line:
                        if Tag == "start":
                            Tag = "begin"
                        break
                    elif "End" in line:
                        if Tag == "begin":
                            Tag = None
                            dummy += line
                        break
                if Tag == "begin":
                    dummy += line
    return (dummy)
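As a variation on the same three-state idea (None, "start", "begin"), the following is a minimal sketch that drops the inner loop over objList in favour of a set lookup. It is not the answerer's code. The function name `filter_blocks` is hypothetical, and the sketch assumes that alias, Begin and End lines contain nothing else, so it compares whole stripped lines. This also avoids a substring match such as "pear" matching "ObjectAlias pearl". It additionally resets the state when an unwanted ObjectAlias line appears, which the code above does not do.

```python
def filter_blocks(lines, wanted):
    """Keep the alias line and the Begin..End block of each wanted object."""
    wanted = set(wanted)
    prefix = "ObjectAlias "
    state = None  # None -> "start" (alias seen) -> "begin" (inside block) -> None
    out = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(prefix):
            if stripped[len(prefix):] in wanted:
                state = "start"
                out.append(line)
            else:
                state = None  # an unwanted object cancels any pending state
        elif stripped == "Begin" and state == "start":
            state = "begin"
            out.append(line)
        elif stripped == "End" and state == "begin":
            out.append(line)
            state = None
        elif state == "begin":
            out.append(line)  # body of a wanted block
    return "".join(out)
```

An open file object iterates line by line, so it can be passed directly as `lines`, and the results for several files can be concatenated in a loop over the file list.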