Selecting HTML elements by the name of their first child

Time: 2014-09-18 15:46:13

Tags: perl

I need to find the value of the id attribute of every <div> element whose first child is a <span>.

For example, given this HTML:

<div id="a1">                 <span> xa1 </span>       </div>
<div id="a2"> <p>...</p>      <span> xa2 </span>       </div>
<div id="a3">            <p>  <span> xa3 </span> </p>  </div>
<div id="a4"> <p>...</p>                             </div>

<div id="b1"> </div>          <span> xb1 </span>
<div id="b2"> </div> <p>      <span> xb1 </span> </p>
<div id="b3"> </div> <p>.</p> <span> xb3 </span>

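I need to get a1 and nothing more.

Since CSS selectors have nothing like a positive lookahead, I need to search the HTML step by step, but I don't know how.

How can I modify the following code so that it returns only a1?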

use 5.014;
use warnings;

use Mojo::DOM;

my $html = do {local $/; <DATA>};

my $dom = Mojo::DOM->new($html);

for my $div ($dom->find('div')->each) {
   #say "DIV[[$div]]";
   my @spans = $div->find('div > span')->each;   #found a1 and a2 ;(
   say $div->attr('id') if (@spans == 1);
}

__DATA__
<div id="a1">                 <span> xa1 </span>       </div>
<div id="a2"> <p>...</p>      <span> xa2 </span>       </div>
<div id="a3">            <p>  <span> xa3 </span> </p>  </div>
<div id="a4"> <p>...</p>                             </div>

<div id="b1"> </div>          <span> xb1 </span>
<div id="b2"> </div> <p>      <span> xb1 </span> </p>
<div id="b3"> </div> <p>.</p> <span> xb3 </span>

<p id="p1">                <span> xp1 </span>       </p>
<p id="p2"> <p>...</p>     <span> xp2 </span>       </p>
<p id="p3">            <p> <span> xp3 </span> </p>  </p>
<p id="p4"> <p>...</p>                              </p>

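3 answers

Answer 0 (score: 3)

It's a pity that Mojo::DOM doesn't support XPath expressions as well as CSS selectors, because this is a very natural thing to express in XPath.

You may want to consider switching to HTML::TreeBuilder::XPath. The code would look like the following. It uses the XPath expression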

//div[*][local-name(*[1])="span"]/@id

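which selects the id attribute of every div element in the document that has at least one child element and whose first child element has the local name span.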

use strict;
use warnings;
use 5.014;

use HTML::TreeBuilder::XPath;

my $tree = do {
   local $/;
   HTML::TreeBuilder::XPath->new_from_content(<DATA>);
};

say for $tree->findvalues('//div[*][local-name(*[1])="span"]/@id');

__DATA__
<html><body>
<div id="a1">                 <span> xa1 </span>       </div>
<div id="a2"> <p>...</p>      <span> xa2 </span>       </div>
<div id="a3">            <p>  <span> xa3 </span> </p>  </div>
<div id="a4"> <p>...</p>                             </div>

<div id="b1"> </div>          <span> xb1 </span>
<div id="b2"> </div> <p>      <span> xb1 </span> </p>
<div id="b3"> </div> <p>.</p> <span> xb3 </span>

<p id="p1">                <span> xp1 </span>       </p>
<p id="p2"> <p>...</p>     <span> xp2 </span>       </p>
<p id="p3">            <p> <span> xp3 </span> </p>  </p>
<p id="p4"> <p>...</p>                              </p>
</body></html>

Output

a1
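For comparison, here is a minimal sketch of the same test written with Mojo::DOM alone, assuming a current Mojolicious release where Mojo::Collection provides grep and map. In it, children->first plays the role of *[1] (it returns undef when a div has no element children, which is what the [*] predicate guards against), and tag eq 'span' plays the role of local-name(...)="span". The helper name leading_span_ids is hypothetical.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical helper: return the id of every <div> whose first *element*
# child is a <span>, mirroring //div[*][local-name(*[1])="span"]/@id
sub leading_span_ids {
    my ($html) = @_;
    return Mojo::DOM->new($html)
        ->find('div')
        ->grep(sub {
            # children() skips text nodes, so this is the first child element
            # (like *[1]); it is undef for a div with no element children
            my $first = $_->children->first;
            defined $first && $first->tag eq 'span';
        })
        ->map(attr => 'id')    # calls ->attr('id') on each remaining div
        ->each;
}

my $html = <<'HTML';
<div id="a1">                 <span> xa1 </span>       </div>
<div id="a2"> <p>...</p>      <span> xa2 </span>       </div>
<div id="a3">            <p>  <span> xa3 </span> </p>  </div>
<div id="a4"> <p>...</p>                             </div>
<div id="b1"> </div>          <span> xb1 </span>
<div id="b2"> </div> <p>      <span> xb1 </span> </p>
<div id="b3"> </div> <p>.</p> <span> xb3 </span>
HTML

say for leading_span_ids($html);
```

Because the check walks down from each div instead of up from each span, it needs no parent step and never sees the spans that sit outside the divs (b1 to b3).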

Answer 1 (score: 3)

You can get the elements you're looking for in a slightly roundabout way by combining a CSS selector with Mojo::DOM's parent method:

use strict;
use warnings;
use feature ":5.10";
use Mojo::DOM;

my $html = do{ local $/; <DATA>};

my $dom = Mojo::DOM->new($html);

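# find spans that are the first element child of a div, then step up to those divs
# (map('parent') calls ->parent on each span; current Mojo::Collection no longer forwards methods)
for my $div ( $dom->find('div > span:first-child')->map('parent')->each ) {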
    say "id: " . $div->attr('id') if $div->attr('id');
}

__DATA__
<div id="a1">                 <span> xa1 </span>       </div>
<div id="a2"> <p>...</p>      <span> xa2 </span>       </div>
<div id="a3">            <p>  <span> xa3 </span> </p>  </div>
<div id="a4"> <p>...</p>                             </div>

<div id="b1"> </div>          <span> xb1 </span>
<div id="b2"> </div> <p>      <span> xb1 </span> </p>
<div id="b3"> </div> <p>.</p> <span> xb3 </span>

<p id="p1">                <span> xp1 </span>       </p>
<p id="p2"> <p>...</p>     <span> xp2 </span>       </p>
<p id="p3">            <p> <span> xp3 </span> </p>  </p>
<p id="p4"> <p>...</p>                              </p>

Output:

id: a1

Or, if you know you only want the first such div, the following will do:

say "id: " . $dom->at('div > span:first-child')->parent->attr('id');
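One caveat with the at() variant: at() returns undef when nothing matches, so chaining ->parent directly will die on a document that contains no such div. A minimal guarded sketch, where the helper name first_leading_span_id is hypothetical:

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical helper: id of the first <div> whose first element child is a <span>,
# or undef when there is no such div
sub first_leading_span_id {
    my ($html) = @_;
    my $span = Mojo::DOM->new($html)->at('div > span:first-child');
    return undef unless defined $span;    # at() found nothing
    return $span->parent->attr('id');     # may still be undef if that div has no id
}

my $html = <<'HTML';
<div id="a2"> <p>...</p>      <span> xa2 </span>       </div>
<div id="a1">                 <span> xa1 </span>       </div>
HTML

my $id = first_leading_span_id($html);
say defined $id ? "id: $id" : 'no match';
```

Here a2 is skipped because its first element child is the <p>, so the first match in document order is the span inside a1.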

Answer 2 (score: 0)

Either:

my @spans = $div->find('div > span:first-child')->each;
say $div->attr('id') if (@spans == 1);

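Or this:

my @kids = $div->children->each;   # children() returns a Mojo::Collection; each() flattens it to a list
say $div->attr('id') if @kids and $kids[0]->tag eq 'span';   # tag() is the element name (older Mojolicious releases used type())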