XPath: select an element based on sibling/cousin text?

Asked: 2017-12-18 23:19:16

Tags: html xml xpath scrapy

I am trying to scrape the contact details for the Chief Executive.

I can find the Chief Executive heading with the following code:

response.xpath('*/div[@class="outer"]/h2/text()="Chief Executive"')

# Returns a Selector
[<Selector xpath='*/div[@class="outer"]/h2/text()="Chief Executive"' data=u'0'>]

But as soon as I try to access the parent or the siblings, I get either an error or no data.

Here are some of the patterns I have tried:

1

response.xpath('*/div[@class="outer"]/h2/text()="Chief Executive"/following-sibling')

ValueError: XPath error: Invalid type in */div[@class="outer"]/h2/text()="Chief Executive"/following-sibling

2

response.xpath('*/div[@class="outer"]/h2/text()="Chief Executive"/following-sibling::content')

ValueError: XPath error: Invalid type in */div[@class="outer"]/h2/text()="Chief Executive"/following-sibling::content

3

response.xpath('*/div[@class="outer"]/h2/text()="Chief Executive"/parent::*')

ValueError: XPath error: Invalid type in */div[@class="outer"]/h2/text()="Chief Executive"/parent::*

4

response.xpath('*/div[@class="outer"]/h2/text()="Chief Executive"/..')

ValueError: XPath error: Invalid type in */div[@class="outer"]/h2/text()="Chief Executive"/..

5

response.xpath('*/div[@class="outer"]/h2[.="Chief Executive"]')

[] # No data found

6

response.xpath('*/div[@class="outer"]/h2[text()="Chief Executive"]')

[] # No data found

Underlying HTML

<div class="outer">
    <h2 class="legend">
    Chief Executive
    </h2>

    <div class="fieldset">

    <div class="display-row">
        <div class="display-label">Contact name:</div>
        <div class="display-field-no-width">
        Mr. Steven Bob
        </div>
    </div>

    <div class="display-row">
        <div class="display-label">Job title:</div>
        <div class="display-field-no-width">
        Chief Executive Officer
        </div>
    </div>

    <div class="display-row">
        <div class="display-label">Organisation name:</div>
        <div class="display-field-no-width">
        1 COMAPNY PTY LTD
        </div>
    </div>

    </div>
</div>


<div class="outer">
    <h2 class="legend">
    Someone Else
    </h2>

    <div class="fieldset">

    <div class="display-row">
        <div class="display-label">Contact name:</div>
        <div class="display-field-no-width">
        Mr. Steven Bob
        </div>
    </div>

    <div class="display-row">
        <div class="display-label">Job title:</div>
        <div class="display-field-no-width">
        Chief Executive Officer
        </div>
    </div>

    <div class="display-row">
        <div class="display-label">Organisation name:</div>
        <div class="display-field-no-width">
        1 COMAPNY PTY LTD
        </div>
    </div>

    </div>
</div>

1 answer:

Answer 0 (score: 3)

This XPath,

normalize-space(//div[normalize-space(h2)='Chief Executive']
                //div[div[1]='Contact name:']/div[2])

will return

Mr. Steven Bob
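
To check this outside a spider, here is a minimal sketch using lxml (the library Scrapy's selectors are built on) against a trimmed copy of the HTML from the question. The expression used reaches the value through the display-row div and its second child. The sketch also shows one likely reason attempts 5 and 6 return nothing: the h2 text is surrounded by newlines and indentation, so an exact comparison fails where normalize-space() succeeds. (Attempts 1 to 4 fail for a different reason: `text()="..."` evaluates to a boolean, and a path step such as `/..` cannot be applied to a boolean.) The name PAGE and the trimmed markup are assumptions for illustration only.

```python
from lxml import html

# Trimmed copy of the question's markup (a single field, wrapped in html/body).
PAGE = """
<html><body>
<div class="outer">
    <h2 class="legend">
    Chief Executive
    </h2>
    <div class="fieldset">
    <div class="display-row">
        <div class="display-label">Contact name:</div>
        <div class="display-field-no-width">
        Mr. Steven Bob
        </div>
    </div>
    </div>
</div>
</body></html>
"""

doc = html.fromstring(PAGE)

# Exact comparison: the h2 string value still contains the surrounding whitespace.
exact = doc.xpath('//div[@class="outer"]/h2[.="Chief Executive"]')

# normalize-space() trims and collapses whitespace before the comparison.
normalized = doc.xpath(
    '//div[@class="outer"]/h2[normalize-space(.)="Chief Executive"]'
)

# Start at the div whose h2 matches, find the row whose first div is the label,
# then take the second div of that row as the value.
contact_name = doc.xpath(
    'normalize-space(//div[normalize-space(h2)="Chief Executive"]'
    '//div[div[1]="Contact name:"]/div[2])'
)

print(len(exact), len(normalized), repr(contact_name))
```

In a Scrapy callback the same expression strings should work with response.xpath(...), followed by .get() to pull out the string.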

Following this pattern, you can select the other fields from this entry, or from the Someone Else entry, as required.
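
As a rough sketch of that pattern, the helper below collects every label/value pair under a given heading into a dict. It uses lxml and assumes the page keeps the class names shown in the question (`display-row`, with the label in the first child div and the value in the second). The function name contact_fields and the SAMPLE markup are hypothetical, made up for this example.

```python
from lxml import html

# Shortened sample based on the question's markup (two rows per entry).
SAMPLE = """
<html><body>
<div class="outer">
    <h2 class="legend">
    Chief Executive
    </h2>
    <div class="fieldset">
    <div class="display-row">
        <div class="display-label">Contact name:</div>
        <div class="display-field-no-width">
        Mr. Steven Bob
        </div>
    </div>
    <div class="display-row">
        <div class="display-label">Job title:</div>
        <div class="display-field-no-width">
        Chief Executive Officer
        </div>
    </div>
    </div>
</div>
<div class="outer">
    <h2 class="legend">
    Someone Else
    </h2>
    <div class="fieldset">
    <div class="display-row">
        <div class="display-label">Contact name:</div>
        <div class="display-field-no-width">
        Mr. Steven Bob
        </div>
    </div>
    </div>
</div>
</body></html>
"""


def contact_fields(doc, heading):
    """Return {label: value} for the div whose h2 text matches `heading`.

    Hypothetical helper: assumes each field lives in a div.display-row
    with the label in div[1] and the value in div[2].
    """
    # $heading is an XPath variable; lxml binds it from the keyword argument.
    rows = doc.xpath(
        '//div[normalize-space(h2)=$heading]//div[@class="display-row"]',
        heading=heading,
    )
    fields = {}
    for row in rows:
        # Expressions are evaluated relative to the current row.
        label = row.xpath('normalize-space(div[1])').rstrip(':')
        fields[label] = row.xpath('normalize-space(div[2])')
    return fields


doc = html.fromstring(SAMPLE)
chief = contact_fields(doc, 'Chief Executive')
other = contact_fields(doc, 'Someone Else')
print(chief)
print(other)
```

Passing the heading as an XPath variable rather than formatting it into the string avoids quoting problems if a heading ever contains a quote character.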