Question

This code works for most sites, such as Google, YouTube, and Facebook, but it fails for some sites, such as Technorati:

<?php
$favicon="http://technorati.com/favicon.ico";
$content = file_get_contents($favicon);
file_put_contents('favicon/icon.ico', $content);  

echo "<img src=\"http://localhost/test/favicon/icon.ico\" />";

?>

// Output:

Warning: file_get_contents(http://technorati.com/favicon.ico) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in /opt/lampp/htdocs/test/simple.php on line 3

http://localhost/test/favicon/icon.ico

How can I download Technorati's favicon?

Answer 1

Try imitating a browser by setting the user agent to something technorati.com accepts :)

ini_set('user_agent', 'Name of your bot');
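
If you would rather not change the user agent globally, the same idea can be applied per request with a stream context. This is only a minimal sketch, assuming PHP 7 or later; the helper names `buildBrowserContext` and `fetchFavicon` are made up for illustration, and whatever user agent string you pass is a guess at what the server tolerates, not something Technorati documents.

```php
<?php
// Hypothetical helper: builds a stream context whose HTTP requests carry
// the given User-Agent header instead of PHP's default (empty) one.
function buildBrowserContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent, // sent as the User-Agent header
            'timeout'    => 10,         // seconds; arbitrary example value
        ],
    ]);
}

// Hypothetical helper: downloads $url to $target using that context.
// Returns false instead of emitting a warning when the request fails.
function fetchFavicon(string $url, string $target, string $userAgent): bool
{
    $content = @file_get_contents($url, false, buildBrowserContext($userAgent));
    if ($content === false) {
        return false; // e.g. the server still answered 403
    }
    return file_put_contents($target, $content) !== false;
}
```

A call such as `fetchFavicon('http://technorati.com/favicon.ico', 'favicon/icon.ico', 'Mozilla/5.0 (compatible; MyBot/1.0)')` would then replace the two lines in the question. Whether the server accepts that particular string is something you have to try.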

Answer 2

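See what actually happens when the request is sent, for example with Fiddler or Wireshark.

My guess is that the Technorati web server is configured to reject automated requests, and that it probably detects them from the user agent the crawler sends.

With cURL you can change the user agent.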
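
A rough sketch of the cURL route, assuming the PHP cURL extension is installed and PHP 7 or later. The function names `curlOptionsFor` and `downloadWithCurl` are invented for this example, and the user agent you pass in is up to you; nothing here is confirmed to be what Technorati checks for.

```php
<?php
// Hypothetical helper: the cURL options used for the download.
function curlOptionsFor(string $userAgent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects, if any
        CURLOPT_USERAGENT      => $userAgent, // the header the server likely inspects
        CURLOPT_TIMEOUT        => 10,         // seconds; arbitrary example value
        CURLOPT_FAILONERROR    => true,       // treat HTTP status >= 400 as a failure
    ];
}

// Hypothetical helper: downloads $url to $target, returning false on failure.
function downloadWithCurl(string $url, string $target, string $userAgent): bool
{
    $ch = curl_init($url);
    curl_setopt_array($ch, curlOptionsFor($userAgent));
    $body = curl_exec($ch); // false on transport errors or, here, HTTP errors
    if ($body === false) {
        return false;
    }
    return file_put_contents($target, $body) !== false;
}
```

With `CURLOPT_FAILONERROR` set, a 403 response makes `curl_exec` return false, so an error page is never written to disk as if it were an icon.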
