Scraping data from a website with cURL after logging in?

Time: 2012-11-05 22:20:03

Tags: php curl

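I want to log in to a website and then pull data out of a table, because the site has no export feature. So far I have managed to log in, and the response shows the user's home page. What I still need is to navigate to a different page, or otherwise fetch that page, while staying logged in with cURL.

My code so far: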

$username="email"; 
$password="password"; 
$url="https://jiltapp.com/sessions"; 
$cookie="cookie.txt";
$url2 = "https://jiltapp.com/shops/shopname/orders";

$postdata = "email=".$username."&password=".$password; 

$ch = curl_init(); 
curl_setopt ($ch, CURLOPT_URL, $url); 
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE); 
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"); 
curl_setopt ($ch, CURLOPT_TIMEOUT, 60); 
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie); 
curl_setopt ($ch, CURLOPT_REFERER, $url); 

curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata); 
curl_setopt ($ch, CURLOPT_POST, 1); 
$result = curl_exec ($ch); 

echo $result;  
curl_close($ch);

As I mentioned, I can reach the main user page, but I need the contents of $url2 rather than $url. How can I do that?

Thanks!

1 answer:

Answer 0 (score: 7)

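Once you are logged in, make a second request for the page that contains the data you are after.

For subsequent requests you must set the option CURLOPT_COOKIEFILE, pointing it at the same file as CURLOPT_COOKIEJAR. cURL will read the cookies from this file and send them along with the request.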

$username="email"; 
$password="password"; 
$url="https://jiltapp.com/sessions"; 
$cookie="cookie.txt";
$url2 = "https://jiltapp.com/shops/shopname/orders";

$postdata = "email=".$username."&password=".$password; 

$ch = curl_init(); 
curl_setopt ($ch, CURLOPT_URL, $url); 
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE); 
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"); 
curl_setopt ($ch, CURLOPT_TIMEOUT, 60); 
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie); 
curl_setopt ($ch, CURLOPT_COOKIEFILE, $cookie);  // <-- add this line
curl_setopt ($ch, CURLOPT_REFERER, $url); 

curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata); 
curl_setopt ($ch, CURLOPT_POST, 1); 
$result = curl_exec ($ch); 

echo $result;  

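// make the second request on the same handle, so the session cookie is sent

curl_setopt($ch, CURLOPT_URL, $url2);     // the page you want to get data from
curl_setopt($ch, CURLOPT_HTTPGET, true);  // switch the handle back from POST to GET

$data = curl_exec($ch);

echo $data;
curl_close($ch);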
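
If you want something a bit more defensive, the same two-request flow could be wrapped up roughly like this. It is only a sketch: the function names build_login_body, exec_or_fail and fetch_after_login are made up for illustration, and it assumes the login form really takes the fields "email" and "password" as in the code above. The two things it adds are URL-encoding of the credentials (plain string concatenation breaks if the password contains "&" or "=") and a check on the return value of curl_exec.

```php
<?php
// Sketch only: these helper names are hypothetical, not part of any library.

// URL-encode the login fields so characters such as "&" or "@" in the
// credentials cannot corrupt the POST body.
function build_login_body(string $email, string $password): string
{
    return http_build_query(['email' => $email, 'password' => $password]);
}

// Run the request currently configured on $ch; return the body or throw.
function exec_or_fail($ch): string
{
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

// Log in, then fetch $dataUrl on the same handle so the session cookie is reused.
function fetch_after_login(
    string $loginUrl,
    string $dataUrl,
    string $email,
    string $password,
    string $cookieFile
): string {
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 60,
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved to this file
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from this file
    ]);

    // First request: POST the credentials.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_login_body($email, $password));
    exec_or_fail($ch);

    // Second request: switch back to GET and fetch the data page.
    curl_setopt($ch, CURLOPT_URL, $dataUrl);
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    return exec_or_fail($ch);
}
```

With the variables from the question, the call would look like fetch_after_login($url, $url2, $username, $password, $cookie), and the returned string is the HTML of the orders page, ready to be parsed for the table.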