Nested for loop iteration stops

Date: 2014-11-05 15:19:58

Tags: python for-loop iteration lxml

I have two input files: an HTML file and a CSS file. I want to perform some operations on the HTML file based on the contents of the CSS file.

My HTML looks like this:

<html>
    <head>
        <title></title>
    </head>
    <body>
        <p class = "cl1" id = "id1"> <span id = "span1"> blabla</span> </p>
        <p class = "cl2" id = "id2"> <span id = "span2"> blablabla</span> <span id = "span3"> qwqwqw </span> </p>
    </body>
</html>

The style for each span id is defined in the CSS file (a separate rule for every span id!).

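Before doing the real work (removing spans based on their style), I'm trying to print each id from the HTML together with the corresponding style description from the CSS.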

Code:

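from lxml import etree

tree = etree.parse("file.html")

filein = "file.css"


def f1():
    with open(filein) as f:
        # for every <span> in the document...
        for span in tree.iterfind('.//span'):
            # ...scan the CSS file line by line for its id
            for line in f:
                # test the attribute directly: `if span` checks whether
                # the element has child elements, not whether it exists
                if 'id' in span.attrib:
                    x = span.get('id')
                    if "af" not in x and x in line:
                        print(x, line)


def main():
    f1()


if __name__ == "__main__":
    main()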

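So there are two for loops that iterate perfectly when run separately, but when they are nested in this function, the iteration stops after the first pass of the outer loop. The only output is: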

>> span1 span#span1 { font-weight: bold; font-size: 11.0pt; font-style: normal; letter-spacing: 0em }

How can I fix this?

2 Answers

Answer 0 (score: 1)

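This happens because the inner loop has already read the whole file by the time the second iteration of the outer loop starts, and a file object does not rewind on its own. To make it work, add f.seek(0) before the inner loop starts reading the file: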

with open(filein) as f:
    for span in tree.iterfind('.//span'):
        f.seek(0)  # rewind to the start of the CSS file for every span
        for line in f:
            if 'id' in span.attrib:
                x = span.get('id')
                if "af" not in x and x in line:
                    print(x, line)
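The explanation above can be checked without lxml or real files. A file object is an iterator: once a loop has consumed it, it stays at end-of-file, so any later loop over it sees no lines until it is rewound. A minimal sketch, using io.StringIO as a stand-in for file.css (the two CSS rules and the function name lines_seen_per_span are made up for illustration), that counts how many lines the inner loop sees on each pass of the outer loop, with and without f.seek(0):

```python
import io

# Hypothetical CSS content standing in for file.css.
CSS_TEXT = "#span1 { font-weight: bold }\n#span2 { font-style: normal }\n"


def lines_seen_per_span(span_ids, rewind):
    """Return how many CSS lines the inner loop sees for each span id."""
    f = io.StringIO(CSS_TEXT)  # behaves like open("file.css") for iteration
    seen = {}
    for span_id in span_ids:
        if rewind:
            f.seek(0)  # move back to the start of the "file"
        seen[span_id] = sum(1 for _ in f)  # the inner loop over f
    return seen


print(lines_seen_per_span(["span1", "span2", "span3"], rewind=False))
# {'span1': 2, 'span2': 0, 'span3': 0}
print(lines_seen_per_span(["span1", "span2", "span3"], rewind=True))
# {'span1': 2, 'span2': 2, 'span3': 2}
```

Without the rewind only the first outer pass sees any lines, which matches the single line of output in the question.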

Answer 1 (score: 1)

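Assuming the tree is fully loaded into memory, you can try swapping the loops: each call to iterfind() starts a fresh pass over the in-memory tree, so only the tree is iterated repeatedly. That way you only need to go through the file filein once: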

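def f1():
    with open(filein) as f:
        # read the CSS file exactly once...
        for line in f:
            # ...and re-scan the in-memory tree for every line;
            # iterfind() returns a new iterator on each call
            for span in tree.iterfind('.//span'):
                if 'id' in span.attrib:
                    x = span.get('id')
                    if "af" not in x and x in line:
                        print(x, line)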
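A third option, not taken from either answer, is to read filein into a list once: unlike a file object, a list can be iterated any number of times, so the loop order from the question can stay as it is. A minimal sketch, using the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages (the iterfind('.//span') and get('id') calls are spelled the same in lxml). The inline HTML and CSS and the function name match_spans are hypothetical; the span1 rule is modelled on the output shown in the question, the other two rules are made up:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for file.html and file.css.
HTML = """<html><body>
<p class="cl1" id="id1"><span id="span1">blabla</span></p>
<p class="cl2" id="id2"><span id="span2">blablabla</span> <span id="span3">qwqwqw</span></p>
</body></html>"""

CSS = """span#span1 { font-weight: bold; font-size: 11.0pt; font-style: normal; letter-spacing: 0em }
span#span2 { font-style: italic }
span#span3 { letter-spacing: 0.1em }
"""


def match_spans(root, css_file):
    """Return (span id, CSS line) pairs, reading css_file only once."""
    css_lines = css_file.readlines()  # a list can be looped over many times
    matches = []
    for span in root.iterfind('.//span'):
        span_id = span.get('id')
        # skip spans without an id, and ids containing "af" (as in the question)
        if span_id is None or "af" in span_id:
            continue
        for line in css_lines:
            if span_id in line:  # same substring test as the question
                matches.append((span_id, line.rstrip('\n')))
    return matches


root = ET.fromstring(HTML)
for span_id, line in match_spans(root, io.StringIO(CSS)):
    print(span_id, line)
```

With the real files this would presumably be called as `with open(filein) as f: match_spans(tree, f)`, since the tree returned by lxml's etree.parse() also has iterfind(). The trade-off is that the whole CSS file is held in memory as a list, which is usually negligible for a stylesheet.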