Browsing a website with HtmlUnit after logging in with a WebRequest

Date: 2017-06-25 15:46:58

Tags: htmlunit

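I ran into problems logging in through the form with HtmlUnit's click functionality, so I decided to log in with a WebRequest instead.

The site's login works like this: the form's submit button triggers an ajax call to a separate URL. Once the response to that POST request arrives, the page reloads automatically and you are logged in.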

// Client configuration
    WebClient webClient = new WebClient(BrowserVersion.CHROME);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setCssEnabled(false);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());

// Get cookies (Not sure if necessary to log in)
    HtmlPage webPage = (HtmlPage)webClient.getPage("https://www.marinetraffic.com/");
    URL cookieURL = new URL("https://www.marinetraffic.com/");
    String cookies = webClient.getCookies(cookieURL).toString();

// Configure request headers
    URL url = new URL("https://www.marinetraffic.com/en/users/ajax_login");
    WebRequest requestSettings = new WebRequest(url, HttpMethod.POST);

    requestSettings.setAdditionalHeader(":authority", "www.marinetraffic.com");
    requestSettings.setAdditionalHeader(":method", "POST");
    requestSettings.setAdditionalHeader(":path", "/en/users/ajax_login");
    requestSettings.setAdditionalHeader(":scheme", "https");
    requestSettings.setAdditionalHeader("accept", "*/*");
    requestSettings.setAdditionalHeader("accept-encoding", "gzip,deflate,sdch");
    requestSettings.setAdditionalHeader("accept-language", "en-US,en;q=0.8");
    requestSettings.setAdditionalHeader("content-type", "application/x-www-form-urlencoded; charset=UTF-8");
    requestSettings.setAdditionalHeader("cookie", cookies);
    requestSettings.setAdditionalHeader("origin", "https://www.marinetraffic.com");
    requestSettings.setAdditionalHeader("referer", "https://www.marinetraffic.com/en/ais/home/centerx:-33.1/centery:21.4/zoom:4");
    requestSettings.setAdditionalHeader("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36");
    requestSettings.setAdditionalHeader("x-requested-with", "XMLHttpRequest");

// Request body with form information  
    requestSettings.setRequestBody("_method=POST&email=dummy%40gmail.com&password=fakepassword&is_ajax=true");


// redirectPage is of type UnexpectedPage
    Page redirectPage = webClient.getPage(requestSettings);
    webClient.waitForBackgroundJavaScript(10 * 1000);

// Console confirms login was a success
    System.out.println(redirectPage.getWebResponse().getContentAsString());
    System.out.println(webClient.getCookies(cookieURL).toString());

// When I try to navigate to the main page I am not logged in
    HtmlPage webPage2 = (HtmlPage)webClient.getPage("https://www.marinetraffic.com/");
    System.out.println(webPage2.asXml());

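I also tried a WebRequest GET call to the main site with the updated cookies, but that returned an UnexpectedPage as well. Now that I am logged in correctly, how do I get an HtmlPage so I can browse the site?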
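
For reference, the request body above is written out by hand, including the percent-encoded `@` in the email address. A minimal sketch of building the same string from name/value pairs with the standard library is shown here. `FormBody` and `encode` are hypothetical names introduced for illustration, not part of HtmlUnit, and nothing here is claimed to fix the login problem itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not an HtmlUnit class): builds an
// application/x-www-form-urlencoded body from name/value pairs.
final class FormBody {

    // Joins the encoded pairs with '&', keeping the map's iteration order.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(urlEncode(field.getKey()) + "=" + urlEncode(field.getValue()));
        }
        return body.toString();
    }

    // URLEncoder applies form encoding: '@' becomes %40, a space becomes '+'.
    private static String urlEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same dummy values as in the request body above.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("_method", "POST");
        fields.put("email", "dummy@gmail.com");
        fields.put("password", "fakepassword");
        fields.put("is_ajax", "true");
        System.out.println(encode(fields));
    }
}
```

The result could then be passed to `requestSettings.setRequestBody(...)` in place of the literal string, which avoids encoding mistakes when the real credentials contain characters such as `&`, `+` or spaces.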

1 answer:

Answer 0 (score: 0)

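Try interacting with the page the way a regular user would:

  • Get the page
  • Find the login element
  • Click the login element
  • Find the input fields
  • Type your user ID and password into the fields
  • Find and click the login button

You may have to add some waiting code somewhere.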
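
Such waiting code is often a small polling loop: repeat a check until it succeeds or a timeout expires. Below is a minimal sketch in plain Java, independent of HtmlUnit. `Waits` and `waitUntil` are hypothetical names, and the condition in `main` is only a stand-in, since the site's markup is not shown in the question.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not an HtmlUnit class): polls a condition with a timeout.
final class Waits {

    // Returns true as soon as the condition holds, or false if timeoutMillis
    // elapses first or the thread is interrupted while sleeping.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true after roughly 300 ms.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 300, 2_000, 50);
        System.out.println("condition met: " + ready);
    }
}
```

With HtmlUnit, the condition would be whatever signals that the step has finished, for example that the login form has appeared after the click or that a logged-in-only element is present. This can be combined with the `webClient.waitForBackgroundJavaScript(...)` call already used in the question.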