Creating a list of nested dicts in Python

Asked: 2016-12-18 14:37:22

Tags: python list dictionary beautifulsoup

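I'm using BeautifulSoup to read XML data and put it into a list of dicts. However, it isn't working as expected: the same dict keeps getting added to the list. How do I append the right dict to the list at the right stage of the nested for loops?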

The printed list should look like this:

[OrderedDict([('name', 'dogs'), ('type', 'housed'), ('value', '123')]),
 OrderedDict([('name', 'cats'), ('type', 'wild'), ('value', '456')]),
 OrderedDict([('name', 'mice'), ('type', 'housed'), ('value', '789')])]

Would it be better to put this in a dict rather than a list?

Here is the XML:
<window>
    <window class="Obj" name="ray" type="housed">
        <animal name="dogs" value="123" />
        <species name="sdogs" value="s123" />
    </window>
    <window class="Obj" name="james" type="wild">
        <animal name="cats" type="wild" value="456" />
        <species name="scats" type="swild" value="s456" />
    </window>
    <window class="Obj" name="bob" type="housed">
        <animal name="mice" value="789" />
        <species name="smice" value="s789" />
    </window>
</window>

Here's the code (sorry if there are some mistakes, I can correct them, since this is an example taken from a larger piece of code):

import sys
import pprint
from bs4 import BeautifulSoup as bs
from collections import OrderedDict

soup = bs(open("test.xml"),"lxml")
dicty = OrderedDict()
listy = [];
Objs=soup.findAll('window',{"class":"Obj"})

#print Objs
for Obj in Objs:
    Objarr =  OrderedDict()     #### move this down
    #I want to add data to the array here:
    #print Obj
    for child in Obj.children:
        Objarr.update({"namesss" : Obj['name']})
        if child.name is not None:
            if child.name == "species":
                print Obj['name']
                print child['value']
                #Also, adding data to the array here:
                Objarr.update({"name" : Obj['name']})
                Objarr.update({"type" : Obj['type']})
                Objarr.update({"value": child['name']})
    listy.append(Objarr)        #### dedent this

pprint.pprint(listy)

2 answers:

Answer 0 (score: 1)

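You are updating one dictionary and appending it to the list. The result is that you use the same dictionary again and again, so every entry in the list refers to the same object. You should create a new dictionary before the child loop starts and append it after that loop finishes, not inside it.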
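The aliasing effect described above can be reproduced without any XML. The following is a minimal sketch; the names `shared`, `record`, `buggy` and `fixed` are made up for illustration and do not appear in the question.

```python
from collections import OrderedDict

names = ["dogs", "cats", "mice"]

# Buggy pattern: one dict is created before the loop and appended every time.
shared = OrderedDict()
buggy = []
for name in names:
    shared.update({"name": name})
    buggy.append(shared)  # appends another reference to the same object

# Fixed pattern: a fresh dict is created on every iteration.
fixed = []
for name in names:
    record = OrderedDict()
    record.update({"name": name})
    fixed.append(record)

buggy_names = [d["name"] for d in buggy]
fixed_names = [d["name"] for d in fixed]
print(buggy_names)  # ['mice', 'mice', 'mice']
print(fixed_names)  # ['dogs', 'cats', 'mice']
```

Every entry in `buggy` is the same object, so the last update wins everywhere. In `fixed` each entry is a separate dict.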

I guess something like this:

import sys
import pprint
from bs4 import BeautifulSoup as bs
from collections import OrderedDict

soup = bs(open("my.xml"),"lxml")
dicty = OrderedDict()
listy = [];
Objs=soup.findAll('window',{"class":"Obj"})
#print Objs
for Obj in Objs:
    Objarr =  OrderedDict()        #### move this down ####
    #I want to add data to the array here:
    for child in Obj.children:
        if child.name is not None:
            if child.name == "variable":
                #Also, adding data to the array here:
                Objarr.update({"name" : Obj['text']})
                Objarr.update({"type" : " matrix"})
                Objarr.update({"value": child['name']})
    listy.append(Objarr)           #### dedent this ####

pprint.pprint(listy)
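For comparison, here is a self-contained sketch of the same loop structure applied to the sample data from the question. It is not the code from this answer: it swaps BeautifulSoup for the standard library's xml.etree.ElementTree so that it runs without extra packages, and it assumes the wanted values come from the animal elements (name and value) and from the enclosing window (type). The sample data is embedded as a string and written as well-formed XML, with no commas between attributes.

```python
import pprint
import xml.etree.ElementTree as ET
from collections import OrderedDict

# Sample data from the question, as well-formed XML.
XML = """
<window>
    <window class="Obj" name="ray" type="housed">
        <animal name="dogs" value="123" />
        <species name="sdogs" value="s123" />
    </window>
    <window class="Obj" name="james" type="wild">
        <animal name="cats" type="wild" value="456" />
        <species name="scats" type="swild" value="s456" />
    </window>
    <window class="Obj" name="bob" type="housed">
        <animal name="mice" value="789" />
        <species name="smice" value="s789" />
    </window>
</window>
"""

root = ET.fromstring(XML.strip())
listy = []
for obj in root.findall("window[@class='Obj']"):
    record = OrderedDict()  # new dict for each Obj window
    for child in obj:
        # Assumption: only <animal> children carry the wanted values.
        if child.tag == "animal":
            record["name"] = child.get("name")
            record["type"] = obj.get("type")
            record["value"] = child.get("value")
    listy.append(record)  # appended once, after the child loop

pprint.pprint(listy)
```

Iterating over an ElementTree element yields only child elements, so no check for whitespace text nodes is needed here.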

Answer 1 (score: 1)

Take a look at the following to see what your objs actually contain:

>>> soup = bs(open("my_xml.xml"),"lxml")
>>>
>>> objs = soup.findAll('window',{"class":"Obj"})
>>>
>>> for obj in objs:
...     for child in obj.children:
...         print child
...


<animal name="dogs" type="housed" value="123"></animal>


<animal name="cats" type="wild" value="456"></animal>


<animal name="mice" type="housed" value="789"></animal>


<window>
</window>

This means that the first child of each element in objs is \n, the last element is <window>\n</window>, and there is a \n between every two elements, separating them.

To fix this, convert the list iterator (obj.children) into a normal list, e.g. list(obj.children), and then slice it with start 1, end -2, step 2, like this: list(obj.children)[1:-2:2]

Here is the output in this case:

>>> for obj in objs:
...     for child in list(obj.children)[1:-2:2]:
...         print child
...
<animal name="dogs" type="housed" value="123"></animal>
<animal name="cats" type="wild" value="456"></animal>
<animal name="mice" type="housed" value="789"></animal>
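The slice above depends on the exact position of the whitespace nodes. The sketch below shows this with plain strings standing in for the child nodes (an assumption made so the example runs without BeautifulSoup). The helper names `by_slice` and `by_filter` and the `compact` list are hypothetical. In BeautifulSoup, the check that corresponds to the filter is testing `child.name`, as the code in the question already does.

```python
def by_slice(children):
    # Positional approach from this answer: start 1, end -2, step 2.
    return children[1:-2:2]


def by_filter(children, tag):
    # Content-based approach: keep only items that look like the wanted tag.
    return [c for c in children if c.startswith("<" + tag)]


# Layout described in this answer: "\n" nodes around and between elements.
spaced = ["\n", '<animal name="dogs"/>', "\n", "<window>\n</window>", "\n"]

# Hypothetical layout without the leading "\n" node.
compact = ['<animal name="dogs"/>', "\n", "<window>\n</window>", "\n"]

print(by_slice(spaced))              # ['<animal name="dogs"/>']
print(by_slice(compact))             # ['\n']
print(by_filter(spaced, "animal"))   # ['<animal name="dogs"/>']
print(by_filter(compact, "animal"))  # ['<animal name="dogs"/>']
```

The slice returns the wanted item only when the whitespace layout is exactly as expected, while the filter does not depend on positions.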