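Get text from a certain tag

Asked: 2011-01-06 23:43:20

Tags: php html html-parsing

Is there a way to dynamically get the text from a certain <tr> tag in a page?

E.g., I have a page with a <tr> whose value is "a1". I would like to get only the text from this <tr> tag and echo it to the page. Is this possible?

Here is the HTML: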

<html><tr  id='ieconn2' >
  <td><table width='100%'><tr><td valign='top'><table width='100%'><tr><td><script type="text/javascript"><!--
google_ad_client = "pub-4503439170693445";
/* 300x250, created 7/21/10 */
google_ad_slot = "7608120147";
google_ad_width = 300;
google_ad_height = 250;
//-->
</script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script><br>When Marshall and Lily fear they will never get pregnant, they see a specialist who can hopefully help move the process along. Meanwhile, Robin starts her new job.<br><br><b>Source: </b>CBS

<br>&nbsp;</td></tr><tr><td><b>There are no foreign summaries for this episode:</b> <a href='/edit/shows/3918/episode_foreign_summary/?eid=1065002553&season=6'>Contribute</a></td></tr><tr><td><b>English Recap Available: </b> <a href='/How_I_Met_Your_Mother/episodes/1065002553?show_recap=1'>View Here</a></td></tr></table></td><td valign='top' width='250'><div align='left'>
<img  alt='How I Met Your Mother season 6 episode 13' src="http://images.tvrage.com/screencaps/20/3918/1065002553.jpg" width="248"  border='0' >
</div><div align='center'><a href='/How_I_Met_Your_Mother/episodes/1065002553?gallery=1'>6 gallery images</a></div></td></tr></table></td></tr><tr>
  <td background='/_layout_v3/buttons/title.jpg' height='39' width='631' align='center'>
<table width='100%' cellpadding='0' cellspacing='0' style='margin: 1px 1px 1px 1px;'>
<tr>
<td align='left'  style='cursor: pointer;' onclick="SwitchHeader('ieconn3','iehide3','26')"  width='90'>&nbsp;<span style='font-size: 15px;   font-weight: bold; color: black; padding-left: 8px;' id='iehide3'><img src='/_layout_v3/misc/minus.gif' width='26'></span></td>
<td align='center'  style='cursor: pointer;' onclick="SwitchHeader('ieconn3','iehide3','26')" ><h5 class='nospace'>Sponsored Links</h5><a name=''></a></td>

<td align='left' width='90' >&nbsp;</td></tr></table></td>
</tr></html>

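All I want to get is this text: "When Marshall and Lily fear they will never get pregnant, they see a specialist who can hopefully help move the process along. Meanwhile, Robin starts her new job."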

3 answers:

Answer 0: (score: 3)

How about this?

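$dom = new DOMDocument;
// Collect libxml warnings about malformed markup instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTMLFile(...);
libxml_clear_errors();

// Walk down through the nested tables to the innermost cells.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('/html/body/tr/td/table/tr/td/table/tr/td');
foreach ($nodes as $node)
{
  echo $node->nodeValue, "\n";
}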
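
The loop above prints the full contents of every matching cell, including the ad script text. If only the summary sentence is wanted, the query can be narrowed to the first non-empty text node of the cell that holds the scripts. This is a minimal sketch, not a definitive solution: it assumes a trimmed-down copy of the question's HTML held in a string, swaps in loadHTML for loadHTMLFile so that it runs on its own, and the names $html and $summary are illustrative.

```php
<?php
// Trimmed-down stand-in for the page in the question (assumed to share its nesting).
$html = <<<'HTML'
<html><tr id='ieconn2'>
  <td><table width='100%'><tr><td valign='top'><table width='100%'><tr><td><script type="text/javascript">
google_ad_width = 300;
</script><br>When Marshall and Lily fear they will never get pregnant, they see a specialist who can hopefully help move the process along. Meanwhile, Robin starts her new job.<br><br><b>Source: </b>CBS
<br>&nbsp;</td></tr></table></td></tr></table></td>
</tr></html>
HTML;

// Collect libxml warnings about the malformed markup instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Inside the row with id "ieconn2", find the cell that directly contains a
// <script>, then take its first text node that is not just whitespace.
$nodes = $xpath->query(
    "(//tr[@id='ieconn2']//td[script]/text()[normalize-space()])[1]"
);

$summary = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : '';
echo $summary, "\n";
```

Anchoring on the id attribute rather than on an absolute path means the query should keep working if the surrounding layout tables change, as long as the row keeps its id.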

Answer 1: (score: 2)

If I understand what you want to do, you could do the following:

$url = "http://url.tld";
$str = file_get_contents($url);

From there, just use PHP's string functions to strip out the parts you don't want (perhaps writing a regular expression to speed up the process).
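
As a rough illustration of that string-based approach, here is a hedged sketch. It assumes $str already holds the fetched markup (a shortened, hard-coded sample is used here), and the two patterns are guesses tailored to the HTML in the question rather than a general-purpose HTML parser.

```php
<?php
// Shortened sample of the fetched markup (assumption: the real page has the same shape).
$str = '<td><script type="text/javascript">google_ad_width = 300;</script>'
     . '<br>When Marshall and Lily fear they will never get pregnant, they see a specialist who can hopefully help move the process along. Meanwhile, Robin starts her new job.'
     . '<br><br><b>Source: </b>CBS<br>&nbsp;</td>';

// Drop <script> blocks first so their contents cannot leak into the text.
$noScripts = preg_replace('#<script\b[^>]*>.*?</script>#is', '', $str);

// Capture the run of plain text between the first <br> and the next <br>.
if (preg_match('#<br\s*/?>\s*([^<]+?)\s*<br\s*/?>#i', $noScripts, $m)) {
    $summary = html_entity_decode($m[1], ENT_QUOTES, 'UTF-8');
} else {
    $summary = '';
}

echo $summary, "\n";
```

Patterns like these tend to break as soon as the page layout shifts, so a DOM parser is usually the sturdier choice for anything beyond a one-off script.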

If file_get_contents doesn't work for you, you could try a more involved function:

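// Fetch a URL with cURL and return the response body (false on failure).
function get_url_contents($url){
    $crl = curl_init();
    $timeout = 5; // connection timeout in seconds
    curl_setopt($crl, CURLOPT_URL, $url);
    curl_setopt($crl, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
    $ret = curl_exec($crl);
    curl_close($crl);
    return $ret;
}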

Answer 2: (score: 1)

Use QueryPath: http://querypath.org/. It is like jQuery for PHP.
