cURL requests more URLs on every call

Posted: 2014-08-08 11:54:59

Tags: php http url curl

This is the curl function that requests the URLs:

function get_result( $nodes )
{
    $node_count = count($nodes);

    $curl_arr = array();
    $master = curl_multi_init();

    // One easy handle per URL, all routed through the local Fiddler proxy
    for($i = 0; $i < $node_count; $i++)
    {
        $url = $nodes[$i];
        $curl_arr[$i] = curl_init($url);
        curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl_arr[$i], CURLOPT_CONNECTTIMEOUT, 180);
        curl_setopt($curl_arr[$i], CURLOPT_TIMEOUT, 180);
        curl_setopt($curl_arr[$i], CURLOPT_ENCODING, "gzip");
        curl_setopt($curl_arr[$i], CURLOPT_PROXY, '127.0.0.1:8888');
        curl_setopt($curl_arr[$i], CURLOPT_VERBOSE, true);
        curl_setopt($curl_arr[$i], CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201');
        curl_setopt($curl_arr[$i], CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($curl_arr[$i], CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
        curl_multi_add_handle($master, $curl_arr[$i]);
    }

    // Run all transfers in parallel until none is still active
    do {
        curl_multi_exec($master, $running);
        curl_multi_select($master, 5.0);
    } while($running > 0);

    // Concatenate the response bodies in request order
    $output = "";
    for($i = 0; $i < $node_count; $i++)
    {
        $output .= curl_multi_getcontent( $curl_arr[$i] );
    }
    return $output;
}
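The function above joins every response into one string and never detaches its handles from the multi handle. Below is a minimal sketch of the same curl_multi pattern that instead returns one body per URL (in input order) and cleans up afterwards. The name `fetch_all` and the reduced option set are assumptions for illustration, not the asker's code; the proxy, encoding and user-agent options are left out for brevity. It is written for PHP 7 or later.

```php
<?php
// Hypothetical helper: fetch every URL in $urls concurrently with curl_multi
// and return the response bodies in the same order as the input.
function fetch_all(array $urls, int $timeout = 180): array
{
    $master  = curl_multi_init();
    $handles = [];

    foreach (array_values($urls) as $i => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_multi_add_handle($master, $ch);
        $handles[$i] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        curl_multi_exec($master, $running);
        if ($running > 0) {
            // Wait up to 1 s for activity instead of spinning in a tight loop.
            curl_multi_select($master, 1.0);
        }
    } while ($running > 0);

    $results = [];
    foreach ($handles as $i => $ch) {
        $results[$i] = (string) curl_multi_getcontent($ch);
        // Detach each easy handle; it is freed once $handles goes out of scope.
        curl_multi_remove_handle($master, $ch);
    }
    curl_multi_close($master);

    return $results;
}
```

Returning an array keyed like the input also makes it easy to count, in the script itself, how many URLs each call actually received.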

$offset = 0;

function select_data()
{
    global $conn_to_sql;
    global $offset;
    $select_statement = $conn_to_sql->prepare("SELECT url FROM url_list LIMIT 5 OFFSET $offset");
    $select_statement->setFetchMode(PDO::FETCH_ASSOC);
    $offset += 5;
    $select_statement->execute();
    return $select_statement->fetchAll();
}
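`select_data()` pages through `url_list` with `LIMIT 5 OFFSET n`, but the query has no `ORDER BY`, so the database is free to return rows in any order and pages may overlap or skip rows. A minimal sketch of a page fetch with a stable order and bound integer parameters follows. The function name `fetch_url_batch` and the integer primary key column `id` are assumptions about the schema, not taken from the question.

```php
<?php
// Hypothetical replacement for select_data(): fetch one page of URLs with
// the offset passed in explicitly instead of through a global.
function fetch_url_batch(PDO $pdo, int $offset, int $limit = 5): array
{
    // ORDER BY gives every row a fixed position, so consecutive pages
    // neither overlap nor skip rows. Assumes url_list has an integer key "id".
    $stmt = $pdo->prepare('SELECT url FROM url_list ORDER BY id LIMIT :limit OFFSET :offset');
    // Bind as integers: with emulated prepares, MySQL rejects a quoted LIMIT value.
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();
    // Return a flat list of URL strings rather than associative rows.
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

Passing the offset as an argument also means the caller decides when to advance it, which avoids a hidden side effect every time the function is called.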

while( select_data() )
{
    $datas = select_data();

    foreach ( $datas as $data )
    {
        $dat = $data["url"];
        $nodes[] = $dat;
    }
    get_result( $nodes );
}
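Tracing the loop above without a database or network shows where the growth comes from. In this hypothetical simulation an in-memory list of 30 URLs (`u0` to `u29`) stands in for `url_list`, and instead of calling `get_result` the loop records the array each call would receive. The function name `trace_original_loop` is made up for illustration.

```php
<?php
// Hypothetical simulation of the asker's loop: $table stands in for url_list
// and $calls records the array that would be passed to get_result().
function trace_original_loop(array $table): array
{
    $offset = 0;
    // Mimics select_data(): return the next 5 rows and advance the offset.
    $select = function () use ($table, &$offset) {
        $rows = array_slice($table, $offset, 5);
        $offset += 5;
        return $rows;
    };

    $calls = [];
    $nodes = [];
    while ($select()) {          // first fetch: the result is thrown away
        $datas = $select();      // second fetch: the batch actually used
        foreach ($datas as $url) {
            $nodes[] = $url;     // $nodes is never reset, so it keeps growing
        }
        $calls[] = $nodes;       // get_result($nodes) would be called here
    }
    return $calls;
}

$table = array_map(fn($i) => "u$i", range(0, 29));
foreach (trace_original_loop($table) as $n => $batch) {
    echo 'call ', $n + 1, ': ', count($batch), " URLs\n";
}
```

With 30 rows the calls receive 5, 10 and 15 URLs, matching the growth seen in Fiddler. The trace also shows a second problem: the batches `u0` to `u4`, `u10` to `u14` and `u20` to `u24` are fetched by the `while` condition and never requested at all.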
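get_result is called from a loop with an array of 5 URLs. (The URLs are loaded from a table using LIMIT 5, and OFFSET increases by 5 on each query.) But the number of requests grows by 5 every time.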

The first call to get_result requests 5 URLs.

The next call requests 10 URLs (the next 10 URLs, without duplicates), then 15 URLs (the next 15, without duplicates), and it keeps going: 20, 25, 30, 35, ...

How do I know the number of requests is increasing? All traffic goes through a proxy (Fiddler).

get_result should request only 5 URLs each time, but that is not what happens. How can I fix this?

2 answers:

Answer 0 (score: 0)

Your query is executed afterwards, so your query sends the URLs one at a time instead of five at a time...

Answer 1 (score: 0)

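It looks like the $nodes array is collecting too many results: it is never reset between iterations, so each new batch is appended to all the previous ones.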

while( select_data() )
{
    $datas = select_data();
    $nodes=array();/*reset $nodes array here*/
    foreach ( $datas as $data )
    {
        $dat = $data["url"];
        $nodes[] = $dat;
    }
    get_result( $nodes );
}
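Resetting $nodes stops the growth, but the loop above still calls select_data() twice per iteration, so every other batch of 5 is fetched and discarded. A minimal sketch that fetches each batch exactly once, by assigning inside the `while` condition, is shown below. The driver name `run_batches` is hypothetical; it takes the fetch and processing steps as callables so the loop can be exercised without a database.

```php
<?php
// Hypothetical driver: $select returns the next page of rows (each with a
// "url" key) or an empty array when done; $process receives one batch.
function run_batches(callable $select, callable $process): int
{
    $batches = 0;
    // Assign inside the condition so each page is fetched exactly once.
    while ($datas = $select()) {
        // A fresh array on every iteration, so earlier batches are not resent.
        $nodes = array_column($datas, 'url');
        $process($nodes);
        $batches++;
    }
    return $batches;
}
```

With the question's own functions this would be called as `run_batches('select_data', 'get_result');`, since PHP accepts function names as callables. Written inline, it is the same as `while ($datas = select_data()) { ... }` with `$nodes = array();` at the top of the body.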