Strange cURL status codes when checking URLs

Time: 2012-11-29 15:38:04

Tags: php curl

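I am checking whether an XML sitemap exists on various URLs. If I supply the URL example.com/sitemap.xml and it 301-redirects to www.example.com/sitemap.xml, I obviously get a 301. If www.example.com/sitemap.xml does not exist, I never see the 404. So when I get a 301, I make a second cURL request to see whether www.example.com/sitemap.xml returns a 404. For some reason, though, I get seemingly random 404 and 303 status codes.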

private function check_http_status($domain,$file){

        $url = $domain . "/" . $file;

        $curl = new Curl();

        $curl->url = $url;
        $curl->nobody = true;
        $curl->userAgent = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)';
        $curl->execute();
        $retcode = $curl->httpCode();

        if ($retcode == 301 || $retcode == 302){

            $url = "www." . $domain . "/" . $file;

            $curl = new Curl();
            $curl->url = $url;
            $curl->nobody = true;
            $curl->userAgent = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)';
            $curl->execute();
            $retcode = $curl->httpCode();

        }

        return $retcode;

    }

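3 Answers:

Answer 0: (score: 2)

See the list of response codes that can be returned - http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

Normally a web browser handles these automatically, but when you do things manually with curl you need to understand what each response means. 301 and 302 mean that you should use the alternative URL provided in the response to access the resource. For your request this may be as simple as adding www, but it can also be more complex when the redirect points to a different domain altogether.

303 (See Other) means the resource should be retrieved with GET from the URL given in the Location header; it is typically returned after a POST.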
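
To make the distinction between these codes concrete, here is a minimal sketch of how a checker might decide what to do with each one. The function name `classify_status` and the category labels are hypothetical, chosen only for illustration; they are not part of cURL or of the asker's `Curl` class.

```php
<?php
// Hypothetical helper: maps an HTTP status code to the action a
// sitemap checker might take. The labels are illustrative only.
function classify_status(int $code): string
{
    // 301, 302, 303, 307 and 308 are redirects that normally carry
    // a Location header pointing at the URL to try next.
    if (in_array($code, [301, 302, 303, 307, 308], true)) {
        return 'redirect';
    }
    // Any 2xx code means the resource was served.
    if ($code >= 200 && $code < 300) {
        return 'found';
    }
    // 404 (Not Found) and 410 (Gone) both mean there is no sitemap here.
    if ($code === 404 || $code === 410) {
        return 'missing';
    }
    // Everything else (5xx, 403, a failed transfer reported as 0, ...).
    return 'other';
}
```

With a helper like this, a 303 is handled the same way as a 301 or 302: it is one more hop to follow, not a final answer.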

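Answer 1: (score: 0)

Well, when you receive a 301 or 302, you should use the location found in the response rather than assuming another location and trying it.

As you can see in this example, the response from the server contains the new location of the file. Use it for the next request: http://en.wikipedia.org/wiki/HTTP_301#Example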
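
As a rough sketch of that idea using PHP's plain cURL functions (not the asker's `Curl` wrapper, whose API is not shown here), the response headers can be captured and the `Location` value read from them. The function names `extract_location` and `fetch_redirect_target` are hypothetical.

```php
<?php
// Hypothetical helper: pull the Location value out of a raw header block.
// Returns null when the response carries no Location header.
function extract_location(string $rawHeaders): ?string
{
    // Header names are case-insensitive; ^ with the m flag anchors the
    // match at the start of a line, so "Content-Location" is not matched.
    if (preg_match('/^Location:[ \t]*(\S+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: send one HEAD request without following redirects
// and return [status code, Location header or null].
function fetch_redirect_target(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);           // HEAD request, no body
    curl_setopt($ch, CURLOPT_HEADER, true);           // include headers in the returned string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the result instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);  // stop at the first response
    $raw  = curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // curl_exec() returns false on failure, so guard before parsing.
    return [$code, is_string($raw) ? extract_location($raw) : null];
}
```

A call such as `list($code, $next) = fetch_redirect_target('http://example.com/sitemap.xml');` would then give the URL to request next when `$code` is a redirect. Note that a `Location` value may be relative, in which case it has to be resolved against the original URL before reuse; this sketch does not do that.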

Answer 2: (score: 0)

"followLocation" works very well. Here is how I implemented it:

$url = "http://www.YOURSITE.com//"; // Assign your URL here.

$ch = curl_init(); // Initialize cURL.
curl_setopt($ch, CURLOPT_URL, $url); // Pass the URL as the target.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // 1 returns the response as a string; 0 prints it directly.
curl_setopt($ch, CURLOPT_HEADER, 0); // 0 leaves the response headers out of the output; 1 would include them.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // Follow any Location header the server sends.

$response_data = curl_exec($ch); // Execute the request against the target URL.
$http_header = curl_getinfo($ch); // Information about the last transfer, as an array.
// Flag URLs that do not return 200 OK.
if ($http_header['http_code'] != 200) {
    echo " <b> PAGE NOT FOUND => </b>"; print $http_header['http_code'];
}
// print $http_header['url']; // The effective URL, i.e. the page to which you were redirected.
print $url; // The original URL you tried to access.

curl_close($ch); // We are done with cURL, so close the handle.
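
Applied to the sitemap check from the question, the same approach might look like the sketch below. It sends a HEAD request, follows redirects up to a limit, and reports the final status together with the URL that produced it. The names `with_scheme` and `final_status` are hypothetical, and the scheme handling is an assumption: the question does not show whether `$domain` already includes `http://`.

```php
<?php
// Hypothetical helper: add a scheme when the caller passes a bare host
// such as "example.com/sitemap.xml". Assumes plain HTTP as the default.
function with_scheme(string $url): string
{
    return preg_match('#^https?://#i', $url) ? $url : 'http://' . $url;
}

// Hypothetical helper: follow redirects and return the final status code
// and the effective URL. A code of 0 means the transfer itself failed.
function final_status(string $url, int $maxRedirects = 5): array
{
    $ch = curl_init(with_scheme($url));
    curl_setopt($ch, CURLOPT_NOBODY, true);              // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // do not print anything
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);      // follow Location headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRedirects);  // guard against redirect loops
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);               // give up after 10 seconds
    curl_exec($ch);
    $info = curl_getinfo($ch);
    return ['code' => (int) $info['http_code'], 'url' => $info['url']];
}
```

Calling `final_status('example.com/sitemap.xml')` would then return the status of whatever URL the server finally redirected to, so no second hand-built request to the `www.` host is needed. Some servers answer HEAD requests differently from GET requests, so if the results look inconsistent it may be worth retrying without `CURLOPT_NOBODY`.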