Question

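I have been asked to write a script that parses all the hrefs in a page, then visits each href and checks whether each page is up and running (using the HTTP code from a cURL call). I have something like the following: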

<?php foreach($links_array as $link_array): //$links_array are a tags
                $link_array=str_replace("'", "\"", $link_array); // if href='link' instead of href="link"
                $href= get_attribute($link_array, "href");
                $resolved_address= resolve_address($href, $page_base);
                $download_href= http_get($resolved_address,$ref );
                $url=$download_href['STATUS']['url'];
                $http_code=$download_href['STATUS']['http_code'];
                $total_time=$download_href['STATUS']['total_time'];
                $message=$status_code_array[$download_href['STATUS']['http_code']];
                // $status_code_array maps each HTTP status code to its
                // human-readable message, so indexing it with the code
                // returns that message
                ?>
                <tr>
                <td><?php echo $url ?></td>
                <td><?php echo $http_code ?></td>
                <td><?php echo $message ?></td>
                <td><?php echo $total_time ?></td>
                </tr>
           <?php endforeach;?>

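The script works for pages with a small number of hrefs, but if a page has many hrefs the script times out. I have tried increasing max_execution_time in php.ini, but that does not seem like an elegant solution. My questions are:
1) How does production software handle this kind of situation, where a task takes a long time to execute?
2) Can I continue making cURL calls by catching the fatal "Maximum execution time of 60 seconds exceeded" error?
3) It would also be better if I could make the cURL call for the first href, check the code, print it as HTML, then make the next cURL call for the second href, check the code, print it, and so on. How can I do this?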

Please bear with my ignorance, I have only been doing web programming for three months.

Answer 1

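You can set max_execution_time in the php.ini file. Make sure you edit the right file, since there may be two of them (one for FPM, one for the CLI).

You can see which files are in use with:

php --ini

You can also set the execution time inside your script:

ini_set('max_execution_time', 300);

Alternatively, you can set the time on the php command line:

php -dmax_execution_time=300 script.php

To answer your other questions:

How does production software work in this kind of situation

One approach (in PHP) is to use workers (RabbitMQ / AMQP). This means you have one script that "sends" messages to a queue, and n workers. The workers pull messages from that queue until it is empty.

https://github.com/php-amqplib/php-amqplib

Can I continue making cURL calls by catching the fatal "Maximum execution time of 60 seconds exceeded" error

Yes, but no exception is thrown, so a try/catch block will not see it. The timeout is a fatal error, so it can only be detected after the fact, for example from a function registered with register_shutdown_function.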
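
Because the timeout is a fatal error rather than an exception, one way to notice it is a shutdown handler. The following is only a minimal sketch under some assumptions: isTimeoutError() and $lastCheckedIndex are hypothetical names introduced for illustration, and the check relies on the fatal error message starting with "Maximum execution time".

```php
<?php
// Returns true when $error (the array from error_get_last()) describes
// PHP's "Maximum execution time ... exceeded" fatal error.
function isTimeoutError(?array $error): bool
{
    return $error !== null
        && $error['type'] === E_ERROR
        && strpos($error['message'], 'Maximum execution time') === 0;
}

// Hypothetical progress marker: index of the last link that was checked.
$lastCheckedIndex = -1;

register_shutdown_function(function () use (&$lastCheckedIndex) {
    if (isTimeoutError(error_get_last())) {
        // The loop cannot resume here; the handler can only report or save progress.
        echo '<tr><td colspan="4">Timed out after link #'
            . $lastCheckedIndex . '</td></tr>';
    }
});

// In the main loop, update the marker after each link, for example:
// foreach ($links_array as $i => $link) { /* check link */ $lastCheckedIndex = $i; }
```

The handler does not let the interrupted loop continue, so this is better suited to recording how far the script got than to working around the limit.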

Answer 2

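For links that point to broken servers, the cURL timeout can take a long time. With 10 broken links, the script might need several minutes to finish.

I suggest storing links_array in a database, XML, or JSON file that acts as a check queue. Then create a script that checks all the links in the queue and stores the http_code response and other data in that database or XML file.

After that you need an AJAX script that queries the server every X seconds, fetches all newly checked links from the XML file or database, and puts that data on the HTML page.

You can launch the link-checking script with a cron job or RabbitMQ.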
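
The queue idea above could be sketched roughly as follows, using a JSON file as the store. The file layout and the function names (enqueueLinks, processQueue, checkedLinks) are assumptions made for this illustration, and the status check is passed in as a callable so the cURL part can be swapped in separately.

```php
<?php
// Hypothetical queue file layout: a JSON list of
// {"url": ..., "http_code": null} entries; null means "not checked yet".
function enqueueLinks(string $file, array $urls): void
{
    $queue = [];
    foreach ($urls as $url) {
        $queue[] = ['url' => $url, 'http_code' => null];
    }
    file_put_contents($file, json_encode($queue), LOCK_EX);
}

// Check at most $batchSize pending links, save the results, and return
// how many links were checked. $check is any callable that maps a URL
// to an HTTP status code.
function processQueue(string $file, callable $check, int $batchSize): int
{
    $queue = json_decode(file_get_contents($file), true);
    $done  = 0;
    foreach ($queue as &$item) {
        if ($done >= $batchSize) {
            break;
        }
        if ($item['http_code'] === null) {
            $item['http_code'] = $check($item['url']);
            $done++;
        }
    }
    unset($item);
    file_put_contents($file, json_encode($queue), LOCK_EX);
    return $done;
}

// What the AJAX endpoint could return: only the links checked so far.
function checkedLinks(string $file): array
{
    $queue = json_decode(file_get_contents($file), true);
    return array_values(array_filter($queue, function ($item) {
        return $item['http_code'] !== null;
    }));
}
```

A cron job would call processQueue() with a small batch size on each run, so no single run gets close to the execution time limit, while the page polls an endpoint that returns checkedLinks() as JSON.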

Answer 3

Use CURLOPT_TIMEOUT. Your updated code:

ini_set('max_execution_time', 0);

foreach($links_array as $link){
    $start       = microtime(true);
    $link        = get_attribute( str_replace( '\'', '"', $link ), 'href' );
    $url         = resolve_address( $link, $page_base );
    $http_code   = getHttpCode( $url );
    $total_time  = microtime(true) - $start;
    if($http_code != 0){
        echo '<tr>
                <td>' . $url . '</td>
                <td>' . $http_code . '</td>
                <td>' . $total_time . ' s. </td>
            </tr>';
    }
}

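function getHttpCode( $url )
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, true);            // include response headers in the output
    curl_setopt($ch, CURLOPT_NOBODY, true);            // HEAD request: do not download the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the output instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);             // give up on any single link after 10 seconds
    curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 when no response was received
    curl_close($ch);
    return $httpcode;
}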
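
For the third part of the question (printing each result as soon as it is known), one option is to flush the output after every row. This is only a sketch: renderRow() and checkAndStream() are hypothetical helpers, and a web server or proxy may still buffer the response, so rows are not guaranteed to appear in the browser one by one.

```php
<?php
// Build one table row; htmlspecialchars() keeps odd URLs from breaking the HTML.
function renderRow(string $url, int $httpCode, float $seconds): string
{
    return '<tr><td>' . htmlspecialchars($url) . '</td><td>' . $httpCode
        . '</td><td>' . number_format($seconds, 2) . ' s</td></tr>' . "\n";
}

// $check is any callable mapping a URL to an HTTP status code,
// for example the getHttpCode() function above.
function checkAndStream(array $urls, callable $check): void
{
    foreach ($urls as $url) {
        $start = microtime(true);
        $code  = $check($url);
        echo renderRow($url, $code, microtime(true) - $start);
        // Push the row out now instead of waiting for the script to end.
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();
    }
}
```

It would be called as checkAndStream($urls, 'getHttpCode'), where $urls holds the already resolved addresses.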

Handling script timeouts in PHP
