Question

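I'm working on a project that involves collecting job offers from the web. As a first step, I want to extract data (job offer data) from specific web pages. Is there an API or existing code that could help me with this?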

Answer 1

The best project I've found is jsoup (http://jsoup.org/).
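To give a rough idea of what working with jsoup looks like, here is a minimal sketch that parses an HTML string and pulls out job offers with CSS selectors. The markup, the class names (`job`, `title`, `company`) and the `JobOfferParser` class are made up for illustration; a real job site will need its own selectors.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class JobOfferParser {

    // Returns one "title @ company" string per element matching div.job.
    // The selectors below are hypothetical and must be adapted to the target page.
    static List<String> parseOffers(String html) {
        Document doc = Jsoup.parse(html);
        List<String> offers = new ArrayList<>();
        for (Element job : doc.select("div.job")) {
            String title = job.select(".title").text();
            String company = job.select(".company").text();
            offers.add(title + " @ " + company);
        }
        return offers;
    }

    public static void main(String[] args) {
        // Inline sample markup so the sketch runs without network access
        String html = "<div class=\"job\"><h2 class=\"title\">Java Developer</h2>"
                + "<span class=\"company\">Acme</span></div>"
                + "<div class=\"job\"><h2 class=\"title\">Data Engineer</h2>"
                + "<span class=\"company\">Globex</span></div>";
        parseOffers(html).forEach(System.out::println);
    }
}
```

For a live page, `Jsoup.connect(url).get()` fetches and parses in one step and returns the same kind of `Document`, so the selector code would stay the same.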

Answer 2

For example, you can use a request like this:

    import java.io.IOException;

    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.HttpClientBuilder;
    import org.apache.http.util.EntityUtils;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class ... {

        // url and params are supplied by the caller (page address and query string)
        void extract(String url, String params) throws IOException {
            // Fetch the page with Apache HttpClient
            HttpClient client = HttpClientBuilder.create().build();
            HttpGet requestGet = new HttpGet(url + params);
            HttpResponse response = client.execute(requestGet);
            HttpEntity entity = response.getEntity();
            String responseString = EntityUtils.toString(entity, "UTF-8");

            /*
             * Here you can retrieve the information with the Jsoup library.
             * In this example, data is extracted from a table element.
             */
            Document doc = Jsoup.parse(responseString);
            // get(1) selects the second table on the page (indices start at 0)
            Element elementsByTag = doc.getElementsByTag("table").get(1);

            Elements rows = elementsByTag.getElementsByTag("tr");
            for (Element row : rows) {
                // TODO
            }
        }
    }
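The `TODO` in the row loop is where the actual extraction happens. As one possible way to fill it in, the sketch below collects the text of every `<td>` cell, row by row. The `TableRowExtractor` class, the `extractRows` method and the sample column layout are assumptions for illustration and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TableRowExtractor {

    // Returns the cell texts of the table at tableIndex, one inner list per <tr>.
    // Rows without any <td> (for example header rows made of <th>) are skipped.
    static List<List<String>> extractRows(String html, int tableIndex) {
        Document doc = Jsoup.parse(html);
        Element table = doc.getElementsByTag("table").get(tableIndex);
        List<List<String>> rows = new ArrayList<>();
        for (Element row : table.getElementsByTag("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : row.getElementsByTag("td")) {
                cells.add(cell.text());
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical table layout: job title and city
        String html = "<table>"
                + "<tr><th>Title</th><th>City</th></tr>"
                + "<tr><td>Java Developer</td><td>Paris</td></tr>"
                + "<tr><td>Data Engineer</td><td>Lyon</td></tr>"
                + "</table>";
        for (List<String> row : extractRows(html, 0)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Note that `getElementsByTag` also matches descendants of nested tables, so a page with tables inside tables may need a more specific selector.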

Extracting data from a web page using Java
