Get data from a URL based on data inside a span

Time: 2018-10-03 13:15:25

Tags: php curl

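I am trying to get data from a URL and retrieve only the data inside spans that have a title="" attribute. Each "row" of data has one such span, and the spans have incrementing title values, e.g.

title="1", title="2"

So the data I want is inside this span: <span title="x">DATA HERE</span>, where x is an incrementing number.

I am able to get all the data from the page with this code, but I am stuck on how to achieve what I need: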

function file_get_contents_curl($url)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER, 0);         // do not include response headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body as a string instead of printing it
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow HTTP redirects

    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}

$html = file_get_contents_curl("http://www.example.com");
// parsing all content:
$doc = new DOMDocument();
@$doc->loadHTML($html);
echo $html;

The data is formatted like this:

<span id="RANDOMINFO">
 <a href="/DEMO/RANDOMDATA">+</a>
 <span title="1">DATA I WANT HERE</span> 
<a href="https://URL.COM/RANDOM">CLICK</a> 
<a href="https://URL.COM/RANDOM">RANDOM DATA</a>
</span>
<span id="RANDOMINFO">
 <a href="/DEMO/RANDOMDATA">+</a>
 <span title="2">DATA I WANT HERE</span> 
<a href="https://URL.COM/RANDOM">CLICK</a> 
<a href="https://URL.COM/RANDOM">RANDOM DATA</a>
</span>

1 Answer:

Answer 0 (score: 0)

Solution: the explanation is given as comments in the code below.

$doc = new DOMDocument();
@$doc->loadHTML($html);

// Loop through every span element in the document.
foreach ($doc->getElementsByTagName('span') as $element) {
    // Skip the outer wrapper spans, which have id="RANDOMINFO". The target spans
    // have no id attribute, so empty() is true for them and they are printed.
    if (empty($element->attributes->getNamedItem('id')->value) || $element->attributes->getNamedItem('id')->value != 'RANDOMINFO') {
        echo get_inner_html($element).PHP_EOL;
    }
}


function get_inner_html( $node ) {
    $innerHTML= '';
    $children = $node->childNodes;

    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveHTML( $child ); // appends the HTML serialization of each child node (plain text for text nodes)
    }

    return $innerHTML;
}

Output:

DATA I WANT HERE
DATA I WANT HERE
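
Since the question asks specifically for spans that carry a title attribute, another option is to select on that attribute directly instead of excluding spans by id. The following is only a sketch using DOMXPath, not part of the original answer: the helper name extract_titled_spans is hypothetical, and the sample markup is copied from the question rather than fetched over the network.

```php
<?php
// Hypothetical helper (not from the original post): returns the text of every
// <span> that has a title attribute, keyed by that title value.
function extract_titled_spans(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world markup (e.g. duplicate ids) often triggers them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    // //span[@title] matches span elements at any depth that have a title attribute.
    foreach ($xpath->query('//span[@title]') as $span) {
        $result[$span->getAttribute('title')] = trim($span->textContent);
    }
    return $result;
}

// Sample markup copied from the question.
$html = <<<'HTML'
<span id="RANDOMINFO">
 <a href="/DEMO/RANDOMDATA">+</a>
 <span title="1">DATA I WANT HERE</span>
<a href="https://URL.COM/RANDOM">CLICK</a>
<a href="https://URL.COM/RANDOM">RANDOM DATA</a>
</span>
<span id="RANDOMINFO">
 <a href="/DEMO/RANDOMDATA">+</a>
 <span title="2">DATA I WANT HERE</span>
<a href="https://URL.COM/RANDOM">CLICK</a>
<a href="https://URL.COM/RANDOM">RANDOM DATA</a>
</span>
HTML;

foreach (extract_titled_spans($html) as $title => $text) {
    echo $title . ': ' . $text . PHP_EOL;
}
```

Selecting on the attribute means other spans on the page that happen to lack an id are not printed, and the title value is available if the rows need to be told apart. If two spans share the same title, the later one overwrites the earlier one in this sketch.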

References:

  1. DOMDocument::getElementsByTagName
  2. DOMNamedNodeMap::getNamedItem
  3. DOMDocument::saveHTML