Question

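I'm using this PHP code to fetch a page's content, but I don't understand why the server treats the request as if it came from an old browser version: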

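<?php
// Assumption: $target_domain holds the site being fetched
$target_domain = 'http://www.facebook.com';
$url = $target_domain;
// Download page
$site = file_get_contents($url);
// loadHTML() is an instance method; calling it statically fails on PHP 8
$dom = new DOMDocument();

if($site !== false && @$dom->loadHTML($site)) {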
    // find <head> tag
    $head_tag_list = $dom->getElementsByTagName('head');
    // there should only be one <head> tag
    if($head_tag_list->length !== 1) {
        throw new Exception('Wow! The HTML is malformed without single head tag.');
    }
    $head_tag = $head_tag_list->item(0);

    // find first child of head tag to later use in insertion
    $head_has_children = $head_tag->hasChildNodes();
    if($head_has_children) {
        $head_tag_first_child = $head_tag->firstChild;
    }

    // create new <base> tag
    $base_element = $dom->createElement('base');
    $base_element->setAttribute('href', $target_domain);

    // insert new base tag as first child to head tag
    if($head_has_children) {
        $base_node = $head_tag->insertBefore($base_element, $head_tag_first_child);
    } else {
        $base_node = $head_tag->appendChild($base_element);
    }

    echo $dom->saveHTML();
} else {
    // something went wrong in loading HTML to DOM Document
    // provide error messaging
}
?>

See the screenshot: [image missing]

Please help me understand how to make the request look like it comes from a current browser version.

Answer 1

Instead of file_get_contents, which is rather limited here, you can use cURL, which lets you set useful request options. In your case you should send a modern User-Agent with the request. Try this simple example:

function curl_download($Url) {

    // is cURL installed yet?
    if (!function_exists('curl_init')) {
        die('Sorry cURL is not installed!');
    }

    // OK cool - then let's create a new cURL resource handle
    $ch = curl_init();

    // Now set some options (most are optional)
    // Set URL to download
    curl_setopt($ch, CURLOPT_URL, $Url);


    // User agent
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36");




    // Include header in result? (0 = no, 1 = yes)
    curl_setopt($ch, CURLOPT_HEADER, 0);

    // Should cURL return or print out the data? (true = return, false = print)
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);

    // Timeout in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);

    // Download the given URL, and return output
    $output = curl_exec($ch);

    // Close the cURL resource, and free system resources
    curl_close($ch);

    return $output;
}
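
If you would rather keep file_get_contents, the same User-Agent can be passed through a stream context. By default PHP's user_agent setting is empty, so file_get_contents sends no User-Agent header at all, which is probably why the site falls back to its legacy page. The following is a minimal sketch, assuming the http stream wrapper is enabled (allow_url_fopen); make_browser_context is a hypothetical helper name and is not part of the original answer:

```php
<?php
// Hypothetical helper (not from the original answer): builds a stream
// context so that file_get_contents() sends a browser-like User-Agent.
function make_browser_context(string $userAgent, int $timeoutSeconds = 15)
{
    return stream_context_create([
        'http' => [
            'user_agent'      => $userAgent,      // sent as the User-Agent header
            'timeout'         => $timeoutSeconds, // read timeout in seconds
            'follow_location' => 1,               // follow HTTP redirects
        ],
    ]);
}

// Same User-Agent string as in the cURL example above
$ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36';
$context = make_browser_context($ua);

// Usage (performs a network request, so it is left commented out here):
// $site = file_get_contents('http://www.facebook.com', false, $context);
```

The rest of the DOM code from the question can then work on $site unchanged. cURL is still the better choice if you need finer control, such as cookies or detailed error information.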
