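I am trying to write a Java application whose main purpose is to log in to a website and parse some data. I chose to use HtmlUnit and jsoup, and I got stuck right at the start. I tried to find the form id on the https://github.com/login page so that I could use it in my HtmlUnit code and proceed with the login, but the page source looks like this: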
<form accept-charset="UTF-8" action="/session" data-form-nonce="39175dde4169cc3f2ad998cac114a63525a17f3f" method="post">
The form has no id, so how can HtmlUnit identify it?
A code example would be appreciated.
Thanks.
Answer 0 (score: 1)
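There is only one form on the GitHub login page, so identifying it is not really a problem. If you want to know how to select an element without using getElementById, you can use querySelector("...") instead:
Sample code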
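import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// Package names above are those of HtmlUnit 2.x; HtmlUnit 3.x uses org.htmlunit instead
WebClient webClient = new WebClient(BrowserVersion.CHROME);
String url = "https://github.com/login";
webClient.getOptions().setJavaScriptEnabled(true);
// Do not abort on script errors or non-2xx status codes
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
HtmlPage page = webClient.getPage(url);
// Select the first <form> element with a CSS selector; no id is needed
DomElement form = (DomElement) page.querySelector("form");
System.out.println(form.asXml());
webClient.close();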
Output
<form accept-charset="UTF-8" action="/session" data-form-nonce="0cd9f59e177729dbfe5a1b275514fdcc21be8c84" method="post">
<div style="margin:0;padding:0;display:inline">
<input name="utf8" type="hidden" value="✓"/>
<input name="authenticity_token" type="hidden" value="3rrjjZbyJ6n310XnDR9mXCi5pJ6OsA+HvLJ0oem8k/XHj37Sd26GXxG7IQk5tcbDnPQnE7WvIjNgU77428iajw=="/>
</div>
<div class="auth-form-header p-0">
<h1>
Sign in to GitHub
</h1>
</div>
<div id="js-flash-container">
</div>
<div class="auth-form-body mt-3">
<label for="login_field">
Username or email address
</label>
<input autocapitalize="off" autocorrect="off" autofocus="autofocus" class="form-control input-block" id="login_field" name="login" tabindex="1" type="text"/>
<label for="password">
Password
<a href="/password_reset" class="label-link">
Forgot password?
</a>
</label>
<input class="form-control form-control input-block" id="password" name="password" tabindex="2" type="password"/>
<input class="btn btn-primary btn-block" data-disable-with="Signing in…" name="commit" tabindex="3" type="submit" value="Sign in"/>
</div>
</form>
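Once the form has been located, continuing with the login might look roughly like the sketch below. It selects the form by its action attribute rather than by an id, then fills in the inputs by name. The field names login, password and commit are taken from the output above. The class name GitHubLoginSketch, the helper method names and the placeholder credentials are hypothetical. The imports assume HtmlUnit 2.x (com.gargoylesoftware.htmlunit); with HtmlUnit 3.x the packages start with org.htmlunit instead. The live page may differ from the markup shown here, so treat this as a starting point rather than a verified login flow.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class GitHubLoginSketch {

    // Locates the login form by its action attribute instead of an id.
    static HtmlForm findLoginForm(HtmlPage page) {
        HtmlForm form = page.querySelector("form[action='/session']");
        if (form == null) {
            throw new IllegalStateException("No form with action=\"/session\" found");
        }
        return form;
    }

    // Types the credentials into the "login" and "password" inputs,
    // clicks the "commit" submit input and returns the resulting page.
    static HtmlPage submitLogin(HtmlForm form, String user, String password) throws Exception {
        HtmlInput loginField = form.getInputByName("login");
        HtmlInput passwordField = form.getInputByName("password");
        HtmlInput submitButton = form.getInputByName("commit");
        loginField.type(user);
        passwordField.type(password);
        return submitButton.click();
    }

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the WebClient when done
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            HtmlPage page = webClient.getPage("https://github.com/login");
            HtmlForm form = findLoginForm(page);
            // Placeholder credentials (hypothetical); replace with real values
            HtmlPage result = submitLogin(form, "your-username", "your-password");
            System.out.println(result.getTitleText());
        }
    }
}
```

Selecting by form[action='/session'] keeps working if another form is later added to the page, which page.querySelector("form") alone would not guarantee. The hidden authenticity_token input is part of the form, so it should be sent along automatically when the submit input is clicked.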