XPath: select text without the child node results

Time: 2015-09-15 06:17:15

Tags: xpath scrapy

html:

<h2 class="reward__pledge-amount">
Pledge $1 or more
    <div class="reward__currency-conversion">
        <h5 class="regular grey-dark">
             About <span>$1.00 USD</span>
        </h5>
    </div>
</h2>
<p class="reward__backer-count">
    <span class="ksr-icon__backer-badge"></span>
    2 backers
</p>

scrapy shell:

sites = sel.css(".reward__info")
for site in sites:
    a = site.xpath("./h2[@class='reward__pledge-amount']/text()").extract()
    b = site.xpath("./p[@class='reward__backer-count']/text()").extract()
    print(a)
    print(b)
    break  # only inspect the first reward block

Result:

[u'\nPledge $1 or more\n', u'\n']
[u'\n', u'\n2 backers\n']

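As you can see, text() returns a list with extra items. I think this is because <h2> contains a <div> and <p> contains a <span>.
How can I get the text() of <h2> and <p> without the results from their child nodes?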

I want something like:

[u'\nPledge $1 or more\n']
[u'\n2 backers\n']
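XPath text() selects only the direct text-node children of an element, so the text inside the nested <div> and <span> is already excluded. The extra '\n' items are whitespace-only text nodes: the line break after </div> inside <h2>, and the line break before the <span> inside <p>. The sketch below reproduces this with Python 3's standard-library xml.etree.ElementTree instead of Scrapy (an assumption made so it runs without Scrapy), on a simplified copy of the markup. In ElementTree, an element's direct text nodes are its .text plus the .tail of each child element; the helper direct_text_nodes is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the question's markup (indentation removed), wrapped in
# a <root> element so it parses as one well-formed XML document.
HTML = """<root>
<h2 class="reward__pledge-amount">
Pledge $1 or more
<div class="reward__currency-conversion">
<h5 class="regular grey-dark">About <span>$1.00 USD</span></h5>
</div>
</h2>
<p class="reward__backer-count">
<span class="ksr-icon__backer-badge"></span>
2 backers
</p>
</root>"""


def direct_text_nodes(elem):
    """Return the direct text-node children of elem, like XPath text().

    ElementTree keeps the text before the first child in elem.text and
    the text that follows each child element in that child's .tail.
    """
    nodes = []
    if elem.text is not None:
        nodes.append(elem.text)
    for child in elem:
        if child.tail is not None:
            nodes.append(child.tail)
    return nodes


root = ET.fromstring(HTML)
h2 = root.find("h2[@class='reward__pledge-amount']")
p = root.find("p[@class='reward__backer-count']")
print(direct_text_nodes(h2))  # ['\nPledge $1 or more\n', '\n']
print(direct_text_nodes(p))   # ['\n', '\n2 backers\n']
```

The text of <h5> and the inner <span> never appears; only the whitespace around the child elements produces the extra list items.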

1 answer:

Answer 0 (score: 0)

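You can try using normalize-space() as an XPath predicate on text() to filter out the whitespace-only text nodes. normalize-space() returns an empty string for a node that contains only whitespace, and an empty string is false inside a predicate, so those nodes are dropped. For example: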

a = site.xpath("./h2[@class='reward__pledge-amount']/text()[normalize-space()]").extract()
b = site.xpath("./p[@class='reward__backer-count']/text()[normalize-space()]").extract()
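For readers without Scrapy at hand, the sketch below mimics text()[normalize-space()] in plain Python 3 using the standard-library xml.etree.ElementTree (an assumption; Scrapy itself evaluates the real XPath expression). It collects an element's direct text nodes (its .text plus each child's .tail) and keeps only those whose normalized value is non-empty. The helpers normalize_space and non_blank_text_nodes are hypothetical, and the markup is a simplified copy of the question's HTML.

```python
import re
import xml.etree.ElementTree as ET

# Simplified copy of the question's markup, wrapped in <root> so that
# ElementTree can parse it as XML.
HTML = """<root>
<h2 class="reward__pledge-amount">
Pledge $1 or more
<div class="reward__currency-conversion">
<h5 class="regular grey-dark">About <span>$1.00 USD</span></h5>
</div>
</h2>
<p class="reward__backer-count">
<span class="ksr-icon__backer-badge"></span>
2 backers
</p>
</root>"""


def normalize_space(s):
    """Mimic XPath 1.0 normalize-space(): collapse runs of XML whitespace
    (space, tab, CR, LF) into one space, then strip both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def non_blank_text_nodes(elem):
    """Rough equivalent of text()[normalize-space()]: the direct text nodes
    of elem, keeping only those whose normalized value is non-empty."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t is not None and normalize_space(t)]


root = ET.fromstring(HTML)
print(non_blank_text_nodes(root.find("h2[@class='reward__pledge-amount']")))
# ['\nPledge $1 or more\n']
print(non_blank_text_nodes(root.find("p[@class='reward__backer-count']")))
# ['\n2 backers\n']
```

The kept strings still carry their surrounding newlines, which matches the expected output in the question; call .strip() on each one in Python if you need the bare text.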