Posting data with curl and extracting the required data from the scraped page

Date: 2014-03-30 09:22:38

Tags: php curl web-scraping screen-scraping

The site is http://www.nokia.com/in-en/support/warranty-check. I post data to its text box and submit button, which look like this:

<input maxlength="15" id="imei_code" type="text"></input>
<input class="button submit" value="Submit" type="submit"></input>
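The two inputs quoted above have an `id` but no `name` attribute, and browsers include only named controls in a normal form submission. A minimal sketch using PHP's standard DOM extension to check this; `describe_inputs()` is a hypothetical helper written for illustration:

```php
<?php
// Minimal sketch: list the id and name attributes of every <input> in an
// HTML fragment. Browsers submit only controls that have a name attribute.
function describe_inputs(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate warnings about the fragment
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $result[] = [
            'id'   => $input->getAttribute('id'),   // '' if absent
            'name' => $input->getAttribute('name'), // '' if absent
        ];
    }
    return $result;
}

$form = '<input maxlength="15" id="imei_code" type="text">'
      . '<input class="button submit" value="Submit" type="submit">';
print_r(describe_inputs($form));
```

Both entries come back with an empty `name`, which suggests (though the question does not confirm it) that the page sends the IMEI with JavaScript rather than through a plain form post.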

When the submit button is pressed, the page displays the processed data and the URL changes to http://www.nokia.com/in-en/support/warranty-check#main
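The `#main` part of that URL is a fragment: it is handled by the browser and is not sent in the HTTP request, so it cannot change what the server returns to curl. A small sketch separating it out; `request_url_without_fragment()` is a hypothetical helper:

```php
<?php
// Return the part of a URL that is actually requested from the server,
// i.e. everything before the first '#'.
function request_url_without_fragment(string $url): string
{
    $pos = strpos($url, '#');
    return $pos === false ? $url : substr($url, 0, $pos);
}

$url = 'http://www.nokia.com/in-en/support/warranty-check#main';
echo parse_url($url, PHP_URL_FRAGMENT), "\n";  // main
echo request_url_without_fragment($url), "\n"; // http://www.nokia.com/in-en/support/warranty-check
```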

When I run the code below, it returns a blank page.

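<?php
// Send a POST request to $url with $data as form fields and return the body.
function post_to_url($url, $data) {
    $post = curl_init();

    curl_setopt($post, CURLOPT_URL, $url);
    curl_setopt($post, CURLOPT_POST, true);
    // http_build_query() URL-encodes keys and values and joins them with '&'
    // (the original rtrim() call had no effect because its result was discarded).
    curl_setopt($post, CURLOPT_POSTFIELDS, http_build_query($data));
    curl_setopt($post, CURLOPT_RETURNTRANSFER, true);

    $result = curl_exec($post);
    if ($result === false) {
        error_log(curl_error($post)); // the request itself failed
        $result = '';
    }

    curl_close($post);
    return $result;
}

// Return the text between the first $start and the following $end, or ''
// if either marker is missing. (The question did not include this helper;
// this is a minimal version.)
function scrape_between($data, $start, $end) {
    $from = stripos($data, $start);
    if ($from === false) {
        return '';
    }
    $from += strlen($start);
    $to = stripos($data, $end, $from);
    if ($to === false) {
        return '';
    }
    return substr($data, $from, $to - $from);
}

// The keys must match the name attributes of the form fields.
$data = array(
    "pin_code" => "359746040018553",
    "button submit" => "Submit"
);

$scraped_page = post_to_url("http://www.nokia.com/in-en/support/warranty-check", $data);

$scraped_data = scrape_between($scraped_page, "<p>", "</p>");

echo $scraped_data;

?>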
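For comparison, a sketch of what the question's hand-built POST body looks like next to `http_build_query()`. `manual_fields()` is a hypothetical function that reproduces the question's loop:

```php
<?php
// The question's approach: concatenate key=value pairs by hand.
// Nothing is URL-encoded, and the trailing '&' survives because the
// return value of rtrim() is thrown away.
function manual_fields(array $data): string
{
    $fields = '';
    foreach ($data as $key => $value) {
        $fields .= $key . '=' . $value . '&';
    }
    rtrim($fields, '&'); // no effect: the trimmed copy is discarded
    return $fields;
}

$data = [
    "pin_code"      => "359746040018553",
    "button submit" => "Submit",
];

echo manual_fields($data), "\n";    // pin_code=359746040018553&button submit=Submit&
echo http_build_query($data), "\n"; // pin_code=359746040018553&button+submit=Submit
```

`http_build_query()` encodes the space in `button submit` as `+` and does not leave a trailing separator.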

I can't get it to work. I want all the data inside the <p> element, i.e.:

Serial number (IMEI): 359746040018553 Warranty: Out of warranty
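`scrape_between()` returns only the first match, while the desired output spans more than one piece of text. A minimal sketch that collects the text of every `<p>` element with `DOMXPath` instead; the sample markup is hypothetical, since the real page structure is not shown, and `all_paragraph_text()` is an illustrative name:

```php
<?php
// Collect the trimmed text of every <p> element in an HTML string.
function all_paragraph_text(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML often has warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query('//p') as $p) {
        $texts[] = trim($p->textContent);
    }
    return $texts;
}

// Hypothetical markup for illustration only.
$sample = '<div><p>Serial number (IMEI): 359746040018553</p>'
        . '<p>Warranty: Out of warranty</p></div>';
print_r(all_paragraph_text($sample));
```

This only helps if the data is present in the HTML that curl receives; if the page fills it in with JavaScript, there is nothing to extract.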

2 answers:

Answer 0 (score: 0)

First, fix the double quotes by escaping them inside the string:

$scraped_data = scrape_between($scraped_page, "<span class=\"pin_placeholder\">", "</span>");
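A single-quoted PHP string is an equivalent way to write a needle that contains double quotes, without escaping. The sketch below shows that and extracts the span's text; `text_between()` and the sample markup are hypothetical:

```php
<?php
// A needle containing double quotes can be written two equivalent ways.
$escaped = "<span class=\"pin_placeholder\">";
$single  = '<span class="pin_placeholder">';
var_dump($escaped === $single); // bool(true)

// Hypothetical helper: text between the first $start and the next $end,
// or '' if either marker is missing.
function text_between(string $haystack, string $start, string $end): string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return '';
    }
    $from += strlen($start);
    $to = strpos($haystack, $end, $from);
    return $to === false ? '' : substr($haystack, $from, $to - $from);
}

// Hypothetical markup; the real page structure is not shown.
$html = '<p>IMEI: <span class="pin_placeholder">359746040018553</span></p>';
echo text_between($html, $single, '</span>'), "\n"; // 359746040018553
```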

Answer 1 (score: 0)

Try this:

$number = 'some-number'; // the IMEI to check
// urlencode() makes the number safe to append to the query string
$url = 'http://www.nokia.com/wal/care/warranty/?r=true&locale=en_IN&productId=&inst=&deviceId=' . urlencode($number);
$post = curl_init();
curl_setopt($post, CURLOPT_URL, $url);
curl_setopt($post, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$result = curl_exec($post);
curl_close($post);
print_r(json_decode($result));

It prints the decoded JSON, like this:

stdClass Object
(
    [warrantyStatus] => N    <-- indicates whether the device is under warranty
    [errorMessage] => 
    [token] => 4551c7686...
    [productCode] => xxxxx
    [pp1] => N
    [pp2] => N
    [pp3] => Y
    [pp4] => Y
    [typeDesignator] => RM-xxx
    [isValidIMEI] => Y
    [imei] => 
)
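A sketch that builds the answer's URL with `http_build_query()` and turns the `Y`/`N` flags into booleans. The parameter names are copied from the answer's URL; that `Y` means yes and `N` means no is an assumption, as are the helper names `warranty_url()` and `warranty_summary()`. Whether the endpoint still responds is not verified here.

```php
<?php
// Build the request URL; http_build_query() URL-encodes the device number.
function warranty_url(string $number): string
{
    $params = [
        'r'         => 'true',
        'locale'    => 'en_IN',
        'productId' => '',
        'inst'      => '',
        'deviceId'  => $number,
    ];
    return 'http://www.nokia.com/wal/care/warranty/?' . http_build_query($params);
}

// Turn the Y/N flags of a response into booleans (assumed: "Y" = yes).
function warranty_summary(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new InvalidArgumentException('Response is not a JSON object');
    }
    return [
        'validImei'     => ($data['isValidIMEI'] ?? 'N') === 'Y',
        'underWarranty' => ($data['warrantyStatus'] ?? 'N') === 'Y',
    ];
}

echo warranty_url('359746040018553'), "\n";

// Hypothetical response reusing field names and values shown above.
$sample = '{"warrantyStatus":"N","errorMessage":"","isValidIMEI":"Y"}';
var_dump(warranty_summary($sample));
```

Decoding with `json_decode($json, true)` gives an associative array, so a missing field can fall back to a default with `??` instead of raising a notice.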