How do I capture data from a third-party website?

Time: 2015-05-19 21:02:59

Tags: web-scraping capture

For example, I only want to capture the data for the 30 most recent events shown in the scrolling ticker at this URL:

http://hazmat.globalincidentmap.com/home.php#

Any idea how to capture it?

1 answer:

Answer 0: (score: 0)

Which language are you using? In Java, you can fetch the page's HTML content with something like this:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageFetcher {
    public static void main(String[] args) {
        try {
            URL url = new URL("http://hazmat.globalincidentmap.com/home.php");
            // try-with-resources closes the reader and the underlying stream.
            // UTF-8 is an assumption; adjust it if the page uses another charset.
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = br.readLine()) != null) {
                    // Here you need to parse the HTML lines until you find
                    // something you want, for example "eventdetail.php?ID",
                    // and then read the content of the <td> tag or whatever
                    // else you want to do.
                }
            }
        } catch (MalformedURLException mue) {
            mue.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}

An example in PHP:

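$c = curl_init('http://hazmat.globalincidentmap.com/home.php');
// Return the response body as a string instead of printing it
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);

$html = curl_exec($c);

// curl_exec() returns false when the request fails
if ($html === false)
    die(curl_error($c));

// HTTP status code of the response
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);

curl_close($c);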

Then parse the contents of the $html variable.
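
Both snippets stop at the parsing step, so here is a rough Java sketch of how that step might look. It assumes the events appear in the HTML as links of the form <a href="eventdetail.php?ID=123">...</a>; I have not verified the page's real markup, so treat the pattern as a guess to adjust. The class name EventExtractor and the method extractEvents are made up for this illustration. A regular expression is used only to stay within the standard library; a real HTML parser would be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EventExtractor {

    // Assumed markup: <a ... href="eventdetail.php?ID=123" ...>event text</a>
    // Group 1 is the href value, group 2 is everything between the tags.
    private static final Pattern EVENT_LINK = Pattern.compile(
            "<a\\s[^>]*?href=[\"']([^\"']*eventdetail\\.php\\?ID=\\d+[^\"']*)[\"'][^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Returns at most `limit` events in page order.
     * Each entry is a two-element array: { href, link text without tags }.
     */
    public static List<String[]> extractEvents(String html, int limit) {
        List<String[]> events = new ArrayList<>();
        Matcher m = EVENT_LINK.matcher(html);
        while (events.size() < limit && m.find()) {
            // Strip any nested tags such as <b> from the link text
            String text = m.group(2).replaceAll("<[^>]+>", "").trim();
            events.add(new String[] { m.group(1), text });
        }
        return events;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the real page
        String sample = "<td><a href=\"eventdetail.php?ID=101\">Chemical <b>spill</b></a></td>"
                + "<td><a href=\"eventdetail.php?ID=102\">Gas leak</a></td>";
        for (String[] event : extractEvents(sample, 30)) {
            System.out.println(event[0] + " -> " + event[1]);
        }
    }
}
```

To use it with the fetching code, append each line you read to a StringBuilder and pass the resulting string to extractEvents(html, 30). That returns the first 30 matches in page order, which are the latest events only if the page lists the newest ones first.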