Python loop skipping lines when using next()

Date: 2017-12-03 14:01:27

Tags: python python-2.7

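I have a script for parsing a text file. The script contains a while loop, because a group header can be followed by several lines. My current script has a problem with skipping lines. I'm pretty sure it has to do with my use of next() and where I call it, but I can't figure it out.
Here is a sample of the text file: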

    object-group network TestNetwork1
     description TestDescription
     network-object host TestHost
     network-object host TestHost
     network-object host TestHost
     network-object host TestHost
    object-group network TestNetwork2
     description TestDescription
     network-object host TestHost
    object-group network TestNetwork3
     description TestDescription
     network-object host TestHost
    object-group network TestNetwork4
     description TestDescription
     network-object host TestHost
    object-group network TestNetwork5
     description TestDescription
     network-object host TestHost
    object-group network TestNetwork6
     description TestDescription
     network-object host TestHost
    object-group network TestNetwork7
     description TestDescription
     network-object host TestHost
    object-group network TestNetwork8
     description TestDescription
     network-object host TestHost
    object-group network TestNetwork9
     description TestDescription
     network-object host TestHost
    object-group network TestNetwork10s
     description TestDescription
     network-object host TestHost

Here is the script:

    import csv
    Count = 0
    objects = open("test-object-groups.txt", 'r+')
    iobjects = iter(objects)

    with open('object-group-test.csv', 'wb+') as filename2:
        writer2 = csv.writer(filename2)
        for lines in iobjects:
            if lines.startswith("object-group network"):
                print lines
                Count += 1
                linesplit = lines.split()
                writer2.writerow([linesplit[2]])
                while True:
                    nextline = str(next(iobjects))
                    if nextline.startswith(" network-object") or nextline.startswith(" description"):
                        nextlinesplit = nextline.split()
                        if nextlinesplit[1] <> "host" and nextlinesplit[1] <> "object" and nextlinesplit[0] <> "description":
                            writer2.writerow(['','subnet', nextlinesplit[1], nextlinesplit[2]])
                        elif nextlinesplit[1] == "host":
                            writer2.writerow(['',nextlinesplit[1], nextlinesplit[2]])
                        elif nextlinesplit[1] == "object":
                            writer2.writerow(['',nextlinesplit[1], nextlinesplit[2]])
                        elif nextlinesplit[0] == "description":
                            writer2.writerow(['',nextlinesplit[0]])

                    elif nextline.startswith("object-group"):
                        break

    print Count

The following output shows that it is skipping lines:

    object-group network TestNetwork1

    object-group network TestNetwork3

    object-group network TestNetwork5

    object-group network TestNetwork7

    object-group network TestNetwork9

    5

As you can see above, every other object-group is being skipped. Any idea how to fix this?

1 Answer:

Answer 0 (score: 2)

    for lines in iobjects:
        ...
        ...
        while True:
            nextline = str(next(iobjects))

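Of course it skips a line. You are calling next(iobjects) while the for loop is iterating over iobjects, so every line returned by next() is consumed and never handled by the for loop. In your script the inner while loop only stops after next() has returned the following object-group header, so that header never reaches the for loop and the whole group is skipped.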

Consider this file:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15

And this code:

    with open('test.txt') as f:
        for line in f:
            print(line)
            if int(line.strip()) % 2 == 0:
                next(f)

The output is:

    1

    2

    4

    6

    8

    10

    12

    14

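Whenever the number is even we call next(f), which consumes the following (odd) line, so 3, 5, 7, 9, 11, 13 and 15 are never printed. (The blank lines appear because each line still ends with its own newline.)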
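The same mechanism explains the output in the question. The sketch below is Python 3; the `SAMPLE` text and the `buggy_group_names` helper are hypothetical, reduced from the question's data, and the CSV writing is dropped. It keeps the question's loop structure and only records which group headers the for loop actually sees:

```python
import io

# Hypothetical sample: four groups in the question's format.
SAMPLE = (
    "object-group network TestNetwork1\n"
    " description TestDescription\n"
    " network-object host TestHost\n"
    "object-group network TestNetwork2\n"
    " description TestDescription\n"
    " network-object host TestHost\n"
    "object-group network TestNetwork3\n"
    " description TestDescription\n"
    " network-object host TestHost\n"
    "object-group network TestNetwork4\n"
    " description TestDescription\n"
    " network-object host TestHost\n"
)


def buggy_group_names(text):
    """Mimic the question's loop structure and return the group names it sees."""
    lines = io.StringIO(text)  # a file-like object is its own iterator
    names = []
    for line in lines:
        if line.startswith("object-group network"):
            names.append(line.split()[2])
            while True:
                # next() pulls from the SAME iterator the for loop uses.
                nextline = next(lines, None)
                if nextline is None or nextline.startswith("object-group"):
                    # The header that ends this group is consumed here and
                    # never reaches the for loop, so that group is skipped.
                    break
    return names


print(buggy_group_names(SAMPLE))  # ['TestNetwork1', 'TestNetwork3']
```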

Suggested solutions:

  1. Use itertools.tee to create two independent iterators. This is probably the least straightforward solution.

  2. Use f.readlines() and work with the resulting list of lines instead of an iterator. That way you can use indices.

  3. Use the more-itertools package, which provides a "peekable" iterator: https://stackoverflow.com/a/27698681/1453822

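  4. Don't parse the file line by line. Instead, use a regular expression to extract the information block by block. For example, the regex r'(object-group.*?)(?=$|object-group)' does this (I'm sure it's far from the optimal regex). Make sure to use the re.DOTALL flag so that . also matches newlines.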

    import re
    
    with open('test.txt') as f:
        file_content = f.read()
    
    for group in re.findall(r'(object-group.*?)(?=$|object-group)', file_content, re.DOTALL):
        print(group)
    
    # object-group network TestNetwork1
    #  description TestDescription
    #  network-object host TestHost
    #  network-object host TestHost
    #  network-object host TestHost
    #  network-object host TestHost
    # 
    # object-group network TestNetwork2
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork3
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork4
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork5
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork6
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork7
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork8
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork9
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork10s
    #  description TestDescription
    #  network-object host TestHost
    

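  5. As a side note, iobjects = iter(objects) is redundant: open already returns an iterator, and calling iter() on a file object just returns the same object.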
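Suggestion 2 could look roughly like this. It is a minimal Python 3 sketch, assuming every member line of a group starts with a space (as in the sample) and using a hypothetical `parse_groups` helper that returns (name, members) pairs instead of writing CSV. Because the inner loop only advances the index past member lines, the next header is left for the outer loop to examine:

```python
def parse_groups(lines):
    """Return [(group_name, [member_fields, ...]), ...] using explicit indices."""
    groups = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("object-group network"):
            name = line.split()[2]
            members = []
            i += 1
            # Consume only the indented member lines; stop AT the next header
            # without skipping it, so the outer loop sees it on its next pass.
            while i < len(lines) and lines[i].startswith(" "):
                members.append(lines[i].split())
                i += 1
            groups.append((name, members))
        else:
            i += 1
    return groups


sample = [
    "object-group network TestNetwork1\n",
    " description TestDescription\n",
    " network-object host TestHost\n",
    " network-object host TestHost\n",
    "object-group network TestNetwork2\n",
    " description TestDescription\n",
    " network-object host TestHost\n",
]
# With a real file this would be: lines = f.readlines()
for name, members in parse_groups(sample):
    print(name, len(members))
# TestNetwork1 3
# TestNetwork2 2
```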
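Suggestion 3 relies on looking at the next line without consuming it. To keep the sketch dependency-free, it uses a small hand-rolled `Peekable` class (hypothetical, supporting only `peek()` and `next()`) in place of `more_itertools.peekable`, which offers a similar `peek(default)` method. Same assumptions as before: Python 3, member lines start with a space, and a hypothetical `parse_groups` helper:

```python
import io


class Peekable(object):
    """Minimal stand-in for more_itertools.peekable: supports peek() and next()."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = []  # holds at most one looked-ahead item

    def __iter__(self):
        return self

    def __next__(self):
        if self._buffer:
            return self._buffer.pop()
        return next(self._it)

    def peek(self, default=None):
        """Return the next item without consuming it, or default at the end."""
        if not self._buffer:
            try:
                self._buffer.append(next(self._it))
            except StopIteration:
                return default
        return self._buffer[0]


def parse_groups(f):
    lines = Peekable(f)
    groups = []
    for line in lines:
        if line.startswith("object-group network"):
            members = []
            # Only consume the next line if it belongs to this group; a header
            # stays in the buffer and is handed to the for loop next time.
            while lines.peek("").startswith(" "):
                members.append(next(lines).split())
            groups.append((line.split()[2], members))
    return groups


f = io.StringIO(
    "object-group network TestNetwork1\n"
    " description TestDescription\n"
    " network-object host TestHost\n"
    "object-group network TestNetwork2\n"
    " network-object host TestHost\n"
)
print([name for name, _ in parse_groups(f)])  # ['TestNetwork1', 'TestNetwork2']
```

With more-itertools installed, `more_itertools.peekable(f)` should be usable in place of `Peekable(f)`.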
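The side note in point 5 is easy to check: iter() on a file object returns that very object, so iobjects and objects are one iterator and the extra call changes nothing. A tiny Python 3 check, using io.StringIO as a stand-in for a real file:

```python
import io

f = io.StringIO("first\nsecond\n")  # stand-in for open("test-object-groups.txt")
it = iter(f)

print(it is f)         # True: the file object is its own iterator
print(repr(next(it)))  # 'first\n'
print(repr(next(f)))   # 'second\n' (both names advance the same position)
```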