Selecting a deeply nested link using an XPath query

Time: 2011-04-24 03:38:38

Tags: php xpath href

<body class="en-us">   <div id="wrapper">
    <div id="content">
      <div class="content-top">
        <div class="content-bot">
          <div id="profile-wrapper" class=
          "profile-wrapper profile-wrapper-horde">
            <div class="profile-sidebar-anchor">
              <div class="profile-sidebar-outer">
                <div class="profile-sidebar-inner">
                  <div class="profile-sidebar-contents">
                    <div class="profile-sidebar-crest">
                      <a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style="">
                      </a>

                      <div class="profile-sidebar-info">
                        <div class="name">
                          <a href="/wow/en/character/some-server/sometoon/"
                          rel="np">Glitchshot</a>
                        </div>

                        <div class="under-name color-c8">
                          <span class="level"><strong>85</strong></span>
                          <a href="/wow/en/game/race/somerace" class="race">somerace</a> 
                          <a href="/wow/en/game/class/someclass" class="class">someclass</a>
                        </div>

                        <div class="guild">
                          <a href="/wow/en/guild/some-server/someguild/?character=sometoon">
                          Some Guild</a>
                        </div>

                        <div class="realm">
                          <span id="profile-info-realm" class="tip"
                          data-battlegroup="Stormstrike">Black
                          Dragonflight</span>
                        </div>
                      </div>
                    </div>

                    <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
                      <li><a href=
                      "/wow/en/character/some-server/sometoon/" class=
                      "back-to" rel="np"><span class="arrow"><span class=
                      "icon">Character Summary</span></span></a></li>

                      <li class="root-menu"><a href=
                      "/wow/en/character/some-server/sometoon/achievement"
                         class="back-to" rel="np"><span class=
                         "arrow"><span class=
                         "icon">Achievements</span></span></a></li>

                      <li class=" active"><a href=
                      "/wow/en/character/some-server/sometoon/achievement#summary"
                         class="" rel="np"><span class="arrow"><span class=
                         "icon">Achievements</span></span></a></li>

                      <li class=""><a href=
                      "/wow/en/character/some-server/sometoon/achievement#92"
                         class="" rel="np"><span class="arrow"><span class=
                         "icon">General</span></span></a></li>

I know I've posted a lot of otherwise useless code here, but I hope it gives you an idea of what the DOM looks like.

From this:

<a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np"><span class="arrow"><span class="icon">General</span></span></a>

I would like to extract:

/wow/en/character/some-server/sometoon/achievement#92

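This is the last anchor in the posted markup.

I have read as much as I can about using XPath queries to pull out the information I need, but I'm clearly missing something. Below is the query I thought should work, but it doesn't.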

<?php
    $query = '*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href';
    echo $query . "<br>";
    $achievementSubCategory = $xpath->query($query);

    $achiSubArray = array("URL" => $achievementSubCategory->item(0)->nodeValue);
    var_dump($achiSubArray);
    // Produces array(1) { ["URL"]=> NULL } which should look something more like:
    // array(1) { ["URL"]=> /wow/en/character/some-server/sometoon/achievement#92 }
?>

Thanks in advance for your help and advice.

3 answers:

Answer 0: (score: 1)

*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href

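This XPath expression has several problems:

  1. It looks for a ul element that is a grandchild of the current node and has an attribute named class whose string value equals the string value of one of that ul's child elements named profile-sidebar-menu (the unquoted profile-sidebar-menu is read as an element name, not as a string). However, the ul has no child named profile-sidebar-menu, so the whole expression selects no nodes.

  2. Another problem is indexing. li[3] selects the third li child of the context node. However, the wanted a element is a child of the fourth li child of the context node. This must be expressed as li[4]. XPath positions are 1-based, not 0-based.

  3. With these two problems corrected, I believe the expression should look like this: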

    */ul[@class="profile-sidebar-menu"]/li[4]/a/@href
    

    Starting from the top element body of the provided XML document, the absolute XPath expression that selects the wanted href attribute is:

    /*/*/*/*/*/*/*/*/*/*/ul/li[4]/a/@href
    

    Below is the XML document (the provided one, made well-formed by appending the missing closing tags):

    <body class="en-us">
        <div id="wrapper">
            <div id="content">
                <div class="content-top">
                    <div class="content-bot">
                        <div id="profile-wrapper" class=
                  "profile-wrapper profile-wrapper-horde">
                            <div class="profile-sidebar-anchor">
                                <div class="profile-sidebar-outer">
                                    <div class="profile-sidebar-inner">
                                        <div class="profile-sidebar-contents">
                                            <div class="profile-sidebar-crest">
                                                <a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style=""></a>
                                                <div class="profile-sidebar-info">
                                                    <div class="name">
                                                        <a href="/wow/en/character/some-server/sometoon/"
                                  rel="np">Glitchshot</a>
                                                    </div>
                                                    <div class="under-name color-c8">
                                                        <span class="level">
                                                            <strong>85</strong>
                                                        </span>
                                                        <a href="/wow/en/game/race/somerace" class="race">somerace</a>
                                                        <a href="/wow/en/game/class/someclass" class="class">someclass</a>
                                                    </div>
                                                    <div class="guild">
                                                        <a href="/wow/en/guild/some-server/someguild/?character=sometoon">
                                  Some Guild</a>
                                                    </div>
                                                    <div class="realm">
                                                        <span id="profile-info-realm" class="tip"
                                  data-battlegroup="Stormstrike">Black
                                  Dragonflight</span>
                                                    </div>
                                                </div>
                                            </div>
                                            <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
                                                <li>
                                                    <a href=
                              "/wow/en/character/some-server/sometoon/" class=
                              "back-to" rel="np">
                                                        <span class="arrow">
                                                            <span class=
                              "icon">Character Summary</span></span>
                                                    </a>
                                                </li>
                                                <li class="root-menu">
                                                    <a href=
                              "/wow/en/character/some-server/sometoon/achievement"
                                 class="back-to" rel="np">
                                                        <span class=
                                 "arrow">
                                                            <span class=
                                 "icon">Achievements</span></span>
                                                    </a>
                                                </li>
                                                <li class=" active">
                                                    <a href=
                              "/wow/en/character/some-server/sometoon/achievement#summary"
                                 class="" rel="np">
                                                        <span class="arrow">
                                                            <span class=
                                 "icon">Achievements</span></span>
                                                    </a>
                                                </li>
                                                <li class="">
                                                    <a href=
                              "/wow/en/character/some-server/sometoon/achievement#92"
                                 class="" rel="np">
                                                        <span class="arrow">
                                                            <span class=
                                 "icon">General</span></span>
                                                    </a>
                                                </li>
                                            </ul>
                                        </div>
                                    </div>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </body>
    

    Evaluating it with a tool such as the XPath Visualizer confirms that the absolute XPath expression above selects exactly the wanted href attribute.

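The two problems described in this answer can also be reproduced in PHP itself. The following is only a minimal sketch: the markup is a trimmed-down stand-in for the posted page (an assumption, not the real document), and it uses `//ul` instead of a relative path so that it does not depend on the context node.

```php
<?php
// Trimmed-down stand-in for the posted markup (an assumption, not the real page).
$html = <<<'HTML'
<body class="en-us">
  <div id="wrapper">
    <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
      <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
      <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
      <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
      <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
    </ul>
  </div>
</body>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Unquoted: profile-sidebar-menu is read as a child element name, so nothing matches.
$unquoted = $xpath->query('//ul[@class=profile-sidebar-menu]/li[4]/a/@href');

// Quoted: the class attribute is compared with a string literal.
$third  = $xpath->query('//ul[@class="profile-sidebar-menu"]/li[3]/a/@href');
$fourth = $xpath->query('//ul[@class="profile-sidebar-menu"]/li[4]/a/@href');

echo $unquoted->length, "\n";            // number of nodes matched by the unquoted form
echo $third->item(0)->nodeValue, "\n";   // href of the third li (1-based)
echo $fourth->item(0)->nodeValue, "\n";  // href of the fourth li, the wanted one
```

Note that `DOMDocument::loadHTML()` wraps the parsed markup in an html element, so an absolute positional path counted from body needs one extra leading step when evaluated on a document loaded this way.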

Answer 1: (score: 0)

If your DOM structure is consistent, something like the following should work:

//ul[@class='profile-sidebar-menu']/li[last()]/a/@href

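Your XPath statement doesn't make sense as written: the path contains several ul steps, but the sample isn't structured that way. Also, indexes in XPath start at 1, not 0.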
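As a rough illustration of the `li[last()]` approach, here is a small sketch in PHP. The helper name `lastMenuHref` and the shortened markup are assumptions made for the example; the check on the result length is there because calling `item(0)` on an empty result yields null, which is what produced the NULL in the question.

```php
<?php
// Hypothetical helper: returns the href of the last menu link, or null if nothing matches.
function lastMenuHref(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("//ul[@class='profile-sidebar-menu']/li[last()]/a/@href");

    // query() returns false for an invalid expression and an empty list for no match.
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// Shortened stand-in for the posted menu (an assumption, not the real page).
$html = '<ul class="profile-sidebar-menu" id="profile-sidebar-menu">'
      . '<li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>'
      . '<li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>'
      . '</ul>';

$href = lastMenuHref($html);
echo $href === null ? "no match" : $href, "\n";
```

Using `last()` avoids hard-coding the position, at the cost of assuming the wanted link is always the final item in the menu.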

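Answer 2: (score: 0)

Based on the HTML you've shown above (and assuming the final tags are closed properly), ewh's expression should work fine.

You may have left out some important part of the document. Try being more specific: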

//ul[@class='profile-sidebar-menu' and @id='profile-sidebar-menu']/li/a[@href='/wow/en/character/some-server/sometoon/achievement#92']/@href

I'm fairly sure it works; I tested it online with the XPath Query Expression Tool.

If you still get no result, try showing all of the HTML you are working with.
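If the specific expression still returns nothing, one way to narrow things down is to list every href under the menu and compare it with what you expect. The sketch below assumes a shortened version of the posted markup and is meant as a debugging aid, not a definitive solution.

```php
<?php
// Shortened stand-in for the posted menu (an assumption, not the real page).
$html = '<ul class="profile-sidebar-menu" id="profile-sidebar-menu">'
      . '<li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>'
      . '<li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>'
      . '</ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// The more specific expression from this answer, split over two lines for readability.
$specific = "//ul[@class='profile-sidebar-menu' and @id='profile-sidebar-menu']"
          . "/li/a[@href='/wow/en/character/some-server/sometoon/achievement#92']/@href";
$match = $xpath->query($specific);
echo "matches: ", $match->length, "\n";

// Debugging aid: collect every href under the menu to see what the parser really found.
$hrefs = [];
foreach ($xpath->query("//ul[@id='profile-sidebar-menu']/li/a/@href") as $attr) {
    $hrefs[] = $attr->nodeValue;
}
echo implode("\n", $hrefs), "\n";
```

An empty list here would suggest that the document loaded into `DOMXPath` differs from the markup shown in the question.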