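Simple HTML DOM does not fetch Amazon pages

Time: 2019-05-01 16:10:56

Tags: php dom

Hi, I'm trying to scrape the price of an Amazon product, but when I request the page through Simple HTML DOM it returns a blank page. If I use an AliExpress link instead, it works fine.

For example: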

$value = "https://www.amazon.com/Apple-iPhone-Plus-Unlocked-32GB/dp/B01N6ZAR0D/";
$html = file_get_html($value);

echo $html;

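1 answer:

Answer 0: (score: 0)

Requesting pages directly through the HTML DOM client is not recommended, especially for large sites such as Amazon. These sites inspect the client's user agent, cookies, and header information as a security check and to detect bots.

So you should use curl or Guzzle to request the page, supplying the necessary request headers. Once the request completes, take the response body as a string and parse it with str_get_html.

Example: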

// $client is assumed to be a GuzzleHttp\Client instance
$response = $client->request('GET', $url);
$html = str_get_html((string) $response->getBody());
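
If you would rather not pull in Guzzle, roughly the same idea can be sketched with PHP's built-in cURL extension. The helper names browser_headers and fetch_page are hypothetical (they are not part of any library), and the header values are only an illustration of "browser-like" headers, not a set known to satisfy Amazon:

```php
<?php
// Hypothetical helper: browser-like headers in the "Name: value" form
// that CURLOPT_HTTPHEADER expects. The values are illustrative assumptions.
function browser_headers(): array
{
    return [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.9',
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
    ];
}

// Hypothetical helper: returns the response body as a string,
// or null on a transport error or an empty body.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_ENCODING       => '',    // empty string: accept and decode every encoding this curl build supports
        CURLOPT_HTTPHEADER     => browser_headers(),
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);

    return (is_string($body) && $body !== '') ? $body : null;
}

// Usage (requires simple_html_dom.php to be loaded):
// $body = fetch_page($url);
// $html = $body !== null ? str_get_html($body) : false;
```

Returning null for an empty body makes the "blank page" case from the question explicit, so the caller can handle it before handing anything to the parser.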

A working example for your question (the original answer also linked to this code on GitHub):

require __DIR__ . '/vendor/autoload.php';
require 'simple_html_dom.php';
use Curl\Curl;

// initialize curl
// you can install via "composer require php-curl-class/php-curl-class"
$curl = new Curl();

// set cookies
$curl->setCookieFile(__DIR__ . '/cookies.txt');
$curl->setCookieJar(__DIR__ . '/cookies.txt');

// have curl decode the response, because Amazon serves it gzip-compressed
$curl->setOpt(CURLOPT_ENCODING, 'gzip');

// set request header like a browser
$curl->setHeaders([
    'accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    // advertise only gzip so the response matches the CURLOPT_ENCODING setting above
    'accept-encoding' => 'gzip',
    'accept-language' => 'en,tr;q=0.9',
    'user-agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
]);

// request
$curl->get('https://www.amazon.com/Apple-iPhone-Plus-Unlocked-32GB/dp/B01N6ZAR0D/');

// get raw response
$response = $curl->getRawResponse();

// parser
$html = new simple_html_dom();

// load from string html
$html->load($response);

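// find the price element and print it; find() returns null when nothing matches
$priceNode = $html->find('#price', 0);
echo $priceNode ? $priceNode->plaintext : 'Price element not found';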
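
Even with browser-like headers, a site that checks for bots may still answer with an empty body or a challenge page, and then the price lookup finds nothing. A minimal sketch of a check you could run on the raw response before parsing is shown here. The function name looks_blocked is hypothetical, and the marker strings are assumptions about what a block page might contain, so adjust them to what you actually receive:

```php
<?php
// Hypothetical helper: guesses whether a response is a block or CAPTCHA page
// rather than a real product page. This is a heuristic, not a reliable detector.
function looks_blocked(?string $body, int $status): bool
{
    // The "blank page" symptom from the question.
    if ($body === null || trim($body) === '') {
        return true;
    }

    // Anything other than 200 OK is treated as "not the page we wanted".
    if ($status !== 200) {
        return true;
    }

    // Assumed marker strings; replace them with what you observe in practice.
    $markers = ['Robot Check', 'Enter the characters you see below'];
    foreach ($markers as $marker) {
        if (stripos($body, $marker) !== false) {
            return true;
        }
    }

    return false;
}
```

Pass in the raw response string and the HTTP status code, however your HTTP client exposes them, and only load the string into the parser when the function returns false.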