Java web crawler that searches for a keyword

Time: 2015-03-21 16:42:26

Tags: java web web-crawler

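I am trying to make a web crawler return true when it finds a given word on a web page. The `return true` statement never runs, so I cannot get it to work. Does anyone know a simple way to do this? Thanks.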

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public static boolean keywordSearch(String url, String keyword) {
        // try-with-resources closes the reader even when we return early
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new URL(url).openStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                // String.contains is case-sensitive
                if (line.contains(keyword)) {
                    return true;
                }
            }
        } catch (Exception ex) {
            System.out.println("Error: " + ex.getMessage());
        }
        return false;
    }

1 answer:

Answer 0 (score: -1)

First, read the content of the URL with the following method:

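import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public static String getText(String url) throws Exception {
    URL website = new URL(url);
    URLConnection connection = website.openConnection();

    StringBuilder response = new StringBuilder();
    // try-with-resources closes the reader even if reading throws
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(connection.getInputStream()))) {
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            // Keep the line break so words on adjacent lines are not glued together
            response.append(inputLine).append('\n');
        }
    }
    return response.toString();
}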

Then check whether the content of the URL contains the keyword, like this:

String content = URLConnectionReader.getText("http://www.someurl.com/page.html");
if(content.contains("someKeyword"))
{
    // content of url contains keyword
}
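
The question does not say why the match fails. One possible cause is a difference in letter case, since `String.contains` is case-sensitive. The sketch below combines the two steps above and compares in lower case. The class name `KeywordSearch`, the helper `containsIgnoreCase`, and the UTF-8 encoding are assumptions made for illustration; they do not come from the original post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class KeywordSearch {

    // Hypothetical helper: true if text contains keyword, ignoring letter case.
    static boolean containsIgnoreCase(String text, String keyword) {
        return text.toLowerCase(Locale.ROOT)
                   .contains(keyword.toLowerCase(Locale.ROOT));
    }

    // Streams the page line by line and stops at the first matching line.
    // Assumption: the page is served as UTF-8.
    static boolean keywordSearch(String url, String keyword) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                URI.create(url).toURL().openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (containsIgnoreCase(line, keyword)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A call such as `KeywordSearch.keywordSearch("http://www.someurl.com/page.html", "someKeyword")` would then match regardless of case. Text that the page inserts with JavaScript after loading is not in the raw HTML, so neither this sketch nor the code above would find it.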