使用Jsoup需要登录的Java scrap网站

时间:2016-04-07 23:13:57

标签: java login jsoup scrape

我想从streetinsider.com打印一些数据(div with class =“news_article”)。我创建了一个帐户,我需要登录才能访问这些数据。

任何人都可以解释为什么这段代码不起作用?我已经尝试了很多,但没有任何工作。

    public static final String SPLIT_INTERNET_URL = "http://www.streetinsider.com/Special+Dividends?offset=55";
public static final String SPLIT_LOGIN = "https://www.streetinsider.com/login.php";

/**
 * @param args the command line arguments
 * @throws java.io.FileNotFoundException
 * @throws java.io.UnsupportedEncodingException
 * @throws java.text.ParseException
 * @throws java.lang.ClassNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException, UnsupportedEncodingException, IOException, ParseException, ClassNotFoundException {
    // TODO code application logic here
    Response res = Jsoup.connect(SPLIT_LOGIN)
            .data("loginemail", "XXXXX", "password", "XXXX")
            .method(Method.POST)
            .execute();
    Document doc = res.parse();

    Map<String, String> cookies = res.cookies();

    Document pageWhenAlreadyLoggedIn = Jsoup.connect(SPLIT_INTERNET_URL).cookies(cookies).get();
    Elements elems = pageWhenAlreadyLoggedIn.select("div[class=news_article]");
    for (Element elem : elems) {
        System.out.println(elem);
    }
}

1 个答案:

答案 0 :(得分:2)

您的代码没有登录网站....请尝试以下代码登录网站。

登录网站:

Connection.Response res = Jsoup.connect(SPLIT_LOGIN)
            .data("action", "account", 
                "redirect", "account_home.php?",
                "radiobutton", "old", 
                "loginemail", "XXXXX",
                "password", "XXXXX", 
                "LoginChoice", "Sign In to Secure Area")
            .method(Connection.Method.POST)
            .followRedirects(true)
            .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36")
            .execute();

因此您现在已登录,但网站似乎检测您是否已登录其他浏览器或连接,请求您首先终止该连接。以下是终止连接的代码:

Connection.Response res2 = Jsoup.connect("http://www.streetinsider.com/login_duplicate.php")
            .data("ok", "End Prior Session")
            .method(Connection.Method.POST)
            .cookies(res.cookies())
            .followRedirects(true)
            .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36")
            .execute();

一切顺利,现在res2将包含您帐户的主页,然后您可以继续前往您想要的任何页面。有关如何使用Jsoup登录网站的更多信息,请查看以下教程:

How to login to a website with Jsoup

相关问题