Script does not continue after curl_exec on certain URLs

Posted: 2017-07-14 14:29:47

Tags: php curl

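I am currently running cURL over a list of URLs, as shown in the code below. The problem is that when it runs against URLs on one particular host, the script never gets past the curl_exec() line. I know the process itself works, because I have run it successfully on thousands of URLs; only this one host seems to cause the problem. For privacy reasons I can't disclose the specific URL. An abridged version of my code follows: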

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_REFERER, 'www.google.com');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, TRUE); // has no effect since PHP 5.1.3
curl_setopt($ch, CURLOPT_USERAGENT, $useragent['safari']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$url_result = curl_exec($ch);
echo "Will not be printed";
if(curl_error($ch)){
    echo "Still will not be printed";
}
curl_close($ch);

Assume that $url is a string containing the URL to fetch, and that $useragent['safari'] is a string holding the Safari browser's user-agent.
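One plausible cause, which the question does not confirm, is that this host accepts the connection but never finishes sending the response. libcurl's CURLOPT_TIMEOUT defaults to 0 (no limit), and on non-Windows systems PHP's max_execution_time does not count time spent outside the script itself, such as waiting on network I/O, so curl_exec() can block indefinitely without anything reaching an error log. The sketch below bounds the wait so that curl_exec() returns false and curl_errno()/curl_error() report why; the helper name fetch_with_timeouts() and the 10 s / 30 s defaults are assumptions chosen for illustration.

```php
<?php
// Sketch only: fetch_with_timeouts() is a hypothetical helper, and the
// 10 s / 30 s defaults are arbitrary example limits.
function fetch_with_timeouts(string $url, int $connectTimeout = 10, int $totalTimeout = 30): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 10,              // guard against redirect loops
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // max seconds to establish the connection
        CURLOPT_TIMEOUT        => $totalTimeout,   // max seconds for the whole transfer
    ]);

    $body = curl_exec($ch); // false on any failure, including a timeout

    // The handle is released automatically when $ch goes out of scope.
    return [
        'body'  => $body,
        'errno' => curl_errno($ch), // 0 on success; 28 means a timeout fired
        'error' => curl_error($ch),
    ];
}
```

In the question's loop you could call `$r = fetch_with_timeouts($url);` and log `$r['errno']` and `$r['error']` whenever `$r['body'] === false`; libcurl error 28 is CURLE_OPERATION_TIMEDOUT.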

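I have checked my Apache error log and the log files where errors should be written, and none of them contains anything. I also entered this URL manually in my browser, and the page loaded successfully.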
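Because nothing reaches the Apache or PHP logs, another way to see where the transfer stops is to capture libcurl's own verbose trace. This sketch routes CURLOPT_VERBOSE output into an in-memory php://temp stream through CURLOPT_STDERR; the last lines of the trace show which step (DNS lookup, connect, TLS handshake, waiting for the response) was reached. The helper name fetch_with_trace() and the 30-second limit are assumptions for illustration.

```php
<?php
// Sketch only: fetch_with_trace() is a hypothetical helper and the
// 30 s default is an arbitrary example limit.
function fetch_with_trace(string $url, int $timeout = 30): array
{
    $trace = fopen('php://temp', 'w+'); // in-memory buffer for libcurl's trace
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_VERBOSE        => true,   // emit the protocol trace...
        CURLOPT_STDERR         => $trace, // ...into $trace instead of stderr
    ]);

    $body  = curl_exec($ch);
    $errno = curl_errno($ch);
    $error = curl_error($ch);
    unset($ch); // free the handle before closing the stream it writes to

    rewind($trace);
    $log = stream_get_contents($trace);
    fclose($trace);

    return ['body' => $body, 'errno' => $errno, 'error' => $error, 'trace' => $log];
}
```

Printing `fetch_with_trace($url)['trace']` for the problem host and comparing it with the trace for a host that works should narrow down whether the stall happens before or after the request is sent.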

1 answer:

Answer 0 (score: 0)

Try this; the problem looks similar to the one described here: Even CURL function can't scrape some urls

<?php

class Curl
{
    // Path of the file used to store and send cookies.
    public $cookieJar = "";

    // cURL handle for the current request (set by get() and postForm()).
    public $curl = null;

    public function __construct($cookieJarFile = 'cookies.txt')
    {
        $this->cookieJar = $cookieJarFile;
    }

    // Apply browser-like headers, cookie handling and redirect options
    // to the current handle.
    public function setup()
    {
        $header = array();
        $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
        $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
        $header[] = "Cache-Control: max-age=0";
        $header[] = "Connection: keep-alive";
        $header[] = "Keep-Alive: 300";
        $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $header[] = "Accept-Language: en-us,en;q=0.5";
        $header[] = "Pragma: "; // browsers keep this blank.

        curl_setopt($this->curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7');
        curl_setopt($this->curl, CURLOPT_HTTPHEADER, $header);
        // Use the property; a bare $cookieJar is undefined inside this method.
        curl_setopt($this->curl, CURLOPT_COOKIEJAR, $this->cookieJar);
        curl_setopt($this->curl, CURLOPT_COOKIEFILE, $this->cookieJar);
        curl_setopt($this->curl, CURLOPT_AUTOREFERER, true);
        curl_setopt($this->curl, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($this->curl, CURLOPT_RETURNTRANSFER, true);
    }

    public function get($url)
    {
        $this->curl = curl_init($url);
        $this->setup();

        return $this->request();
    }

    // Return the first capture group of every match of $reg in $str.
    public function getAll($reg, $str)
    {
        preg_match_all($reg, $str, $matches);
        return $matches[1];
    }

    public function postForm($url, $fields, $referer = '')
    {
        $this->curl = curl_init($url);
        $this->setup();
        curl_setopt($this->curl, CURLOPT_URL, $url);
        curl_setopt($this->curl, CURLOPT_POST, 1);
        curl_setopt($this->curl, CURLOPT_REFERER, $referer);
        curl_setopt($this->curl, CURLOPT_POSTFIELDS, $fields);
        return $this->request();
    }

    // Pass 'lasturl' for the final URL after redirects, or any CURLINFO_* constant.
    public function getInfo($info)
    {
        return ($info === 'lasturl')
            ? curl_getinfo($this->curl, CURLINFO_EFFECTIVE_URL)
            : curl_getinfo($this->curl, $info);
    }

    public function request()
    {
        return curl_exec($this->curl);
    }
}

$curl = new Curl();
$html = $curl->get("http://www.thefancy.com");
echo $html;
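The class sends browser-like headers and keeps cookies, which may help if the host treats non-browser clients differently, but it still sets no timeout, so a response that never completes would hang it just like the question's code. The sketch below keeps the answer's user-agent and headers, adds connect and total timeouts, and lets a loop over many URLs record a failing host and move on. The helpers browser_like_options() and fetch_all(), the cookie-file path and the limits are all hypothetical choices for illustration.

```php
<?php
// Sketch only: browser_like_options(), fetch_all(), the cookie path and the
// limits are hypothetical; the headers echo the answer's Curl::setup().
function browser_like_options(string $cookieJar, int $timeout = 30): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_COOKIEJAR      => $cookieJar, // store cookies the site sets
        CURLOPT_COOKIEFILE     => $cookieJar, // and send them back
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7',
        CURLOPT_HTTPHEADER     => [
            'Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
            'Accept-Language: en-us,en;q=0.5',
        ],
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => $timeout, // the answer's class sets no timeout
    ];
}

// Fetch each URL in turn; a failing or stalled host is recorded, not fatal.
function fetch_all(array $urls, array $options): array
{
    $results = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, $options);
        $body = curl_exec($ch);
        $results[$url] = ($body === false)
            ? ['ok' => false, 'error' => curl_errno($ch) . ': ' . curl_error($ch)]
            : ['ok' => true, 'body' => $body];
    }
    return $results;
}
```

For example, `fetch_all($urls, browser_like_options('/tmp/cookies.txt'))` returns one entry per URL holding either the page body or the curl error number and message, so a single unresponsive host costs at most the configured timeout.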