pyquery (lxml) does not find a tag in a well-formed XML document?

Asked: 2016-08-24 16:46:10

Tags: python lxml pyquery

I have an XML file like this one. The relevant part is:

<reference>
  <citation>Vander Wal JS, Gang CH, Griffing GT, Gadde KM. Escitalopram for treatment of night eating syndrome: a 12-week, randomized, placebo-controlled trial. J Clin Psychopharmacol. 2012 Jun;32(3):341-5. doi: 10.1097/JCP.0b013e318254239b.</citation>
  <PMID>22544016</PMID>
</reference>

I am trying to read the value of the PMID field, using PyQuery to parse the XML:

    from pyquery import PyQuery as pq

    # f is the path to the XML file
    with open(f, 'r') as fh:
        text = fh.read()
    d = pq(text)
    data = {}
    data['nct_id'] = d('nct_id').text()

    print(d('reference'))
    reference = d('reference')
    print(reference('PMID'))
    data['pmid'] = reference('PMID').text()

    # the key is stored in lowercase above, so read it back in lowercase
    print(data['pmid'])

Why doesn't this work? In the console, the first print shows the full content of reference, followed by two empty values:

<reference>
    <citation>Vander Wal JS, Gang CH, Griffing GT, Gadde KM. Escitalopram for treatment of night eating syndrome: a 12-week, randomized, placebo-controlled trial. J Clin Psychopharmacol. 2012 Jun;32(3):341-5. doi: 10.1097/JCP.0b013e318254239b.</citation>
    <PMID>22544016</PMID>
  </reference>

I can find other leaf nodes in the document (such as nct_id) using .find(), as the example code shows.

Does PyQuery not like uppercase tags?

1 Answer:

Answer 0: (score: 1)

You can specify which parser to use, and then it works:

d = pq(text, parser='xml')