Question

I am trying to scrape some product details from a website using the following code:

$list_url = "http://www.topshop.com/en/tsuk/category/sale-offers-436/sale-799";
$html = file_get_contents($list_url);
echo $html;

However, I get this error:

Warning: file_get_contents(http://www.topshop.com/en/tsuk/category/sale-offers-436/sale-799) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /homepages/19/d361310357/htdocs/shopaholic/rss/topshop_f_uk.php on line 123

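I assume the website is blocking the request in some way to prevent scraping. Is there a workaround, perhaps using cURL and setting a user agent?

If not, is there another way to get basic product data such as item names and prices?

Edit

For context, what I ultimately want to be able to do is the following: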

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

Answer 1

I managed to fix it by adding the following code...

ini_set('user_agent','Mozilla/4.0 (compatible; MSIE 6.0)');

...as suggested in this answer.
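If you would rather not change the user agent for the whole script, the same idea can probably be applied to a single request through a stream context. This is only a minimal sketch: the helper names build_browser_context and fetch_with_user_agent are made up for illustration, the 10 second timeout is an arbitrary choice, and the site may still refuse the request for other reasons.

```php
<?php
// Hypothetical helper: builds an HTTP stream context that sends the given
// User-Agent. The timeout value is an assumption, not something from the question.
function build_browser_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => 10,
        ],
    ]);
}

// Hypothetical helper: fetches a URL with that context and throws instead of
// returning false, so a 403 does not go unnoticed.
function fetch_with_user_agent(string $url, string $userAgent): string
{
    $html = @file_get_contents($url, false, build_browser_context($userAgent));
    if ($html === false) {
        throw new RuntimeException("Request failed for $url");
    }
    return $html;
}
```

Calling fetch_with_user_agent($list_url, 'Mozilla/4.0 (compatible; MSIE 6.0)') would then stand in for the plain file_get_contents($list_url) call from the question.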

Answer 2

Instead of the simple file_get_contents() approach, you should use cURL and set proper HTTP headers so that the request mimics a real browser request.

P.S.: Set cURL to follow redirects. Here is the link for cURL
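A rough sketch of what that could look like is below. The function names curl_options_for and fetch_html are hypothetical, and the header values, redirect limit, and timeout are assumptions chosen for illustration; the user agent string is simply the one from the other answer. There is no guarantee that this particular site accepts such a request.

```php
<?php
// Hypothetical helper: collects the cURL options in one place.
// Header values, redirect limit and timeout are illustrative assumptions.
function curl_options_for(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects, as the P.S. suggests
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_USERAGENT      => 'Mozilla/4.0 (compatible; MSIE 6.0)',
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml',
            'Accept-Language: en-GB,en;q=0.8',
        ],
        CURLOPT_TIMEOUT        => 15,
    ];
}

// Hypothetical helper: performs the request and throws on transport errors
// or on HTTP status codes of 400 and above (such as the 403 in the question).
function fetch_html(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, curl_options_for($url));

    $html   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);

    if ($html === false) {
        throw new RuntimeException("cURL error: $error");
    }
    if ($status >= 400) {
        throw new RuntimeException("HTTP $status for $url");
    }
    return $html;
}
```

The string returned by fetch_html($list_url) can then be passed to $doc->loadHTML($html) exactly as in the question.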

403 error when using file_get_contents()
