Simple_DOM problem with finding classes

Date: 2011-05-21 23:56:01

Tags: php html class object simpledom

I'm trying to do a simple extraction, but I keep ending up with unpredictable results.

I have this HTML:

<div class="thread" style="margin-bottom:25px;"> 

<div class="message"> 

<span class="profile">Suzy Creamcheese</span> 

<span class="time">December 22, 2010 at 11:10 pm</span> 

<div class="msgbody"> 

<div class="subject">New digs</div> 

Hello thank you for trying our soap. <BR>  Jim.

</div> 
</div> 


<div class="message reply"> 

<span class="profile">Lars Jörgenmeier</span> 

<span class="time">December 22, 2010 at 11:45 pm</span> 

<div class="msgbody"> 

I never sold you any soap.

</div> 

</div> 

</div> 

I'm trying to extract the outertext from "msgbody", but only when "profile" equals a particular value. Like this:

$contents = $html->find('.msgbody');
$elements = $html->find('.profile');

$length = sizeof($contents);

while ($x != sizeof($elements)) {

    $var = $elements[$x]->outertext;

    // If profile = the right name
    if ($var = $name) {

        $text = $contents[$x]->outertext;
        echo $text;

    }

    $x++;
}

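I'm getting text from the wrong profiles instead of the one with the association I need. Is there a way to pull the needed information with a single line of code?

Something like: if span-profile = "the right name", then pull its div-msgbody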

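2 answers:

Answer 0 (score: 3)

OK, I'm going to use DOMXpath for this one. I'm not sure what "outertext" means, but I'll go off this requirement:

  if span-profile = "the right name", then pull its div-msgbody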

First, here is the minimized HTML test case I used:

<html>
<body>
<div class="thread" style="margin-bottom:25px;"> 

<div class="message"> 

<span class="profile">Suzy Creamcheese</span> 

<span class="time">December 22, 2010 at 11:10 pm</span> 

<div class="msgbody"> 

<div class="subject">New digs</div> 

Hello thank you for trying our soap. <BR>  Jim.

</div> 
</div> 


<div class="message reply"> 

<span class="profile">Lars Jörgenmeier</span> 

<span class="time">December 22, 2010 at 11:45 pm</span> 

<div class="msgbody"> 

I never sold you any soap.

</div> 

</div> 

</div>
</body>
</html>

So, we're going to build an XPath query for this. Let's show the whole thing, then break it down:

$messages = $xpath->query("//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']");

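Breaking it down:

//span

  Give me the spans.

//span[@class='profile']

  Give me the spans whose class is profile.

//span[@class='profile' and contains(.,'$profile_name')]

  Give me the spans whose class is profile and whose contents contain $profile_name, which is the name you're after.

//span[@class='profile' and contains(.,'$profile_name')]/../

  Give me the spans whose class is profile and whose contents contain $profile_name, then go up one level, which gets us to <div class="message">.

//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']

  Give me the spans whose class is profile and whose contents contain $profile_name, go up one level to <div class="message">, and finally give me all the divs directly under that <div class="message"> whose class is msgbody.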
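To watch the narrowing happen, each stage can be run on its own and the number of matching nodes printed. This is only a sketch under a few assumptions: the inline HTML is a trimmed copy of the test case, countMatches() is a helper name made up for this example, and the name is hard-coded to an ASCII one ('Suzy Creamcheese') to keep character-encoding questions out of the picture. The fourth stage is written as /.. because a trailing slash on its own is not a complete XPath expression.

```php
<?php
// Illustrative sketch: evaluate each stage of the XPath expression separately.
// The inline HTML and the helper countMatches() are assumptions for this example.

$html = <<<HTML
<html><body>
<div class="thread">
  <div class="message">
    <span class="profile">Suzy Creamcheese</span>
    <span class="time">December 22, 2010 at 11:10 pm</span>
    <div class="msgbody"><div class="subject">New digs</div>Hello thank you for trying our soap.</div>
  </div>
  <div class="message reply">
    <span class="profile">Lars J&ouml;rgenmeier</span>
    <span class="time">December 22, 2010 at 11:45 pm</span>
    <div class="msgbody">I never sold you any soap.</div>
  </div>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: number of nodes a query selects in the document.
function countMatches(DOMXPath $xpath, string $query): int
{
    return $xpath->query($query)->length;
}

$name = 'Suzy Creamcheese';
$steps = [
    "//span",
    "//span[@class='profile']",
    "//span[@class='profile' and contains(., '$name')]",
    "//span[@class='profile' and contains(., '$name')]/..",
    "//span[@class='profile' and contains(., '$name')]/../div[@class='msgbody']",
];

// Print how many nodes survive each stage.
foreach ($steps as $step) {
    echo countMatches($xpath, $step), "  ", $step, "\n";
}
```

With the HTML above the counts come out as 4, 2, 1, 1 and 1: four spans in total, two of them profiles, one profile with the right name, its one parent, and that parent's one msgbody div.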

Now, here's a sample of the PHP code:

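// Load the HTML test case shown above
$doc = new DOMDocument();
$doc->loadHTMLFile("test.html");

$xpath = new DOMXPath($doc);
$profile_name = 'Lars Jörgenmeier';

// Select the msgbody div that sits next to the matching profile span
$messages = $xpath->query("//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']");
foreach ($messages as $message) {
  // nodeValue is the text content of the div, without the markup
  echo trim($message->nodeValue) . "\n";
}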
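If "outertext" is taken in the Simple HTML DOM sense (the element's HTML including its own tags), a variant along these lines may be closer to what was asked. Treat it as a sketch: findMessageBodies() is a hypothetical helper, the inline HTML is a trimmed copy of the test case, and the name comparison is done in PHP with an exact match rather than with contains(), so a name containing a quote character never has to be embedded in the XPath string.

```php
<?php
// Sketch only: findMessageBodies() is a hypothetical helper, not part of any library.

function findMessageBodies(string $html, string $profileName): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $bodies = [];
    foreach ($xpath->query("//span[@class='profile']") as $span) {
        // Exact comparison in PHP; the name is never placed inside the query.
        if (trim($span->textContent) !== $profileName) {
            continue;
        }
        // The second argument makes the query relative to this span.
        foreach ($xpath->query("../div[@class='msgbody']", $span) as $body) {
            // saveHTML($node) serializes the node together with its own tags.
            $bodies[] = $doc->saveHTML($body);
        }
    }
    return $bodies;
}

$html = <<<HTML
<html><body>
<div class="thread">
  <div class="message">
    <span class="profile">Suzy Creamcheese</span>
    <div class="msgbody"><div class="subject">New digs</div>Hello thank you for trying our soap.</div>
  </div>
  <div class="message reply">
    <span class="profile">Lars J&ouml;rgenmeier</span>
    <div class="msgbody">I never sold you any soap.</div>
  </div>
</div>
</body></html>
HTML;

foreach (findMessageBodies($html, 'Suzy Creamcheese') as $outer) {
    echo $outer, "\n";
}
```

Because the match is exact, asking for just 'Suzy' returns nothing here, whereas the contains() query would also pick up any other profile whose name includes that text.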

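XPath is very powerful. I'd recommend looking at a basic tutorial, and then you can look at the XPath standard if you want to see more advanced usage.

Answer 1 (score: 0)

Here is a working example using Simple HTML DOM.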

I changed your sample HTML so that Suzy Creamcheese has more than one profile, like this (file: test_class_class.htm):

 <div class="message"> 
   <span class="profile">Suzy Creamcheese</span> 
   <span class="time">December 22, 2010 at 11:10 pm</span> 
   <div class="msgbody"> 
     <div class="subject">New digs</div> 
       Hello thank you for trying our soap. <BR>  Jim.
     </div> 
   </div> 

   <div class="message reply"> 
     <span class="profile">Lars Jörgenmeier</span> 
     <span class="time">December 22, 2010 at 11:45 pm</span> 
     <div class="msgbody"> 
       I never sold you any soap.
     </div> 
   </div> 
 </div>

 <div class="message"> 
   <span class="profile">Suzy Yogurt</span> 
   <span class="time">December 22, 2010 at 11:10 pm</span> 
   <div class="msgbody"> 
     <div class="subject">No Creamcheese</div> 
       This is not Suzy Creamcheese <BR>  Jim.
     </div> 
   </div> 

   <div class="message reply"> 
     <span class="profile">Suzy Creamcheese</span> 
     <span class="time">December 22, 2010 at 11:45 pm</span> 
     <div class="msgbody"> 
       A reply from Suzy Creamcheese.
     </div> 
   </div> 
 </div>

</div>

Here is the test using Simple HTML DOM:

include('simple_html_dom.php');

function getMessage_for_profile($iUrl,$iProfile)
{
    // create HTML DOM
    $html = file_get_html($iUrl);

    // get text elements
    $aoProfile = $html->find('span[class=profile]'); 
    echo "Found ".count($aoProfile)." profiles.<br />";

    foreach ($aoProfile as $key=>$oProfile)
    {
      if ($oProfile->plaintext == $iProfile)
      {
        echo "<b>Profile ".$key.": ".$oProfile->plaintext."</b><br />";
        // Using $e->next_sibling()
        $oCurrent = $oProfile;
        while ($oNext = $oCurrent->next_sibling())
        {
           if ( $oNext->class == "msgbody" )
           {
             echo "<hr />";
             echo $oNext->outertext;
             echo "<hr />";
           }
           $oCurrent = $oNext;
        }
      }         
    }

    // clean up memory
    $html->clear();
    unset($html);

    return;
}
// --------------------------------------------
// test it!
// user_agent header...
ini_set('user_agent', 'My-Application/2.5');

getMessage_for_profile('test_class_class.htm','Suzy Creamcheese');
echo "<br /><br /><br />";
getMessage_for_profile('test_class_class.htm','Suzy Yogurt');

My output is:

Found 4 profiles.
Profile 0: Suzy Creamcheese
--------------------------------
New digs
Hello thank you for trying our soap.
Jim.
---------------------------------
Profile 3: Suzy Creamcheese
---------------------------------
A reply from Suzy Creamcheese.
---------------------------------



Found 4 profiles.
Profile 2: Suzy Yogurt
---------------------------------
No Creamcheese
This is not Suzy Creamcheese
Jim.
---------------------------------

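So you can see it can be done with Simple HTML DOM, and since I already know how the DOM works... or enough to get myself into trouble... I didn't need to learn any new syntax!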