How to get a paragraph element using HtmlUnit and XPath

Time: 2014-03-16 20:55:29

Tags: java html xpath htmlunit

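Hello, I am new to HtmlUnit. I have a project in which I want to extract some information from a page. Until now everything went smoothly when finding elements by name or ID, but I cannot get the following paragraph element.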

<iframe id="content_ifr" frameborder="0" src="javascript:""" allowtransparency="true" title=".." style="width: 100%; height: 307px; display: block;">
<!DOCTYPE >
<html>
<head> ... </head>
<body id="tinymce" class="mceContentBody content post-type-coupon wp-editor" contenteditable="true" onload="window.parent.tinyMCE.get('content').onLoad.dispatch();" dir="ltr">
<p>------ Text from the Element i want to get ------- </p>
</body>
</html>
</iframe>

I have already tried:

side.getByXPath("//html/body/p"); // zero elements
side.getByXPath("//p");           // 27 elements, but none is the right one
side.getByXPath("//body");        // 1 element, but the wrong one
side.getByXPath("//html");        // 1 element, but the wrong one
side.getByXPath("//html/body/div[3]/div[3]/div[2]/div/div[4]/form/div/div/div/div[2]/div/div[2]/span/table/tbody/tr[2]/td/iframe"); // zero elements found

I checked all the elements that were found with this code:

List<?> list = gPage.getByXPath("//p");
for (Object x : list) {
    HtmlElement y = (HtmlElement) x;
    // Print every paragraph whose markup or visible text contains the keyword
    if (y.asXml().contains("Keyword") || y.asText().contains("Keyword")) {
        System.out.println(y.asText());
    }
}

In short, I cannot find the paragraph element by its text. Can you help me locate the paragraph element so that I can read and write it?
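As an aside, the filtering loop above can also be expressed as a single XPath predicate with contains(). The following is only a minimal sketch: it uses the JDK's built-in javax.xml.xpath API as a stand-in for HtmlUnit, and the sample markup, the ParagraphTextSearch class and the findParagraphs helper are made up for illustration. It changes how matches are filtered, not which document is searched.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ParagraphTextSearch {
    // Hypothetical sample markup, not the real page from the question.
    static final String XML =
        "<html><body>"
        + "<p>first paragraph</p>"
        + "<p>contains the Keyword here</p>"
        + "<p>last paragraph</p>"
        + "</body></html>";

    /** Returns the text of every p element whose text content contains the given word. */
    static List<String> findParagraphs(String word) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            // The word is spliced into the expression, so it must not contain a single quote.
            String expr = "//p[contains(., '" + word + "')]";
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                texts.add(hits.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findParagraphs("Keyword"));
    }
}
```

The same expression string should be usable with HtmlUnit's getByXPath, since the predicate is plain XPath 1.0.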

        //Initialize WebClient
        final WebClient webClient= new WebClient(BrowserVersion.FIREFOX_24);
        webClient.getCookieManager().setCookiesEnabled(true);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setCssEnabled(false);
        webClient.getOptions().setUseInsecureSSL(true);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.waitForBackgroundJavaScript(10000);

        //Perform a login.
        final HtmlPage page = webClient.getPage("");        
        final HtmlForm form = page.getForms().get(1);
        final HtmlTextInput username = form.getInputByName("log");
        final HtmlPasswordInput pw = form.getInputByName("pwd");
        username.setValueAttribute("");
        pw.setValueAttribute("");
        @SuppressWarnings("unused")
        HtmlPage page2 =  (HtmlPage) form.getButtonByName("login").click();

        //Get gutscheinPage
        HtmlPage gutscheinPage= webClient.getPage("");

        //Change Content of Textfield
        HtmlPage pageFrame = (HtmlPage) gutscheinPage.getFrames().get(0).getEnclosedPage();
        HtmlElement body =pageFrame.getBody();
        HtmlParagraph p =(HtmlParagraph) body.getByXPath("//p").get(0);
        p.setTextContent(text);

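Done: I changed the WebClient's default browser and waited for JavaScript, used getFrames to reach the frame, took its body, and a now-simple XPath gave me my paragraph element.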
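A side note on the body.getByXPath("//p") call in the code above: in standard XPath, an expression that starts with // is evaluated from the document root even when it is called on an element, so the body context only matters for a relative expression such as .//p. Here that is harmless, because the frame's page holds a single paragraph. The sketch below illustrates the difference using the JDK's built-in javax.xml.xpath API as a stand-in for HtmlUnit; the markup, the XPathScopeDemo class and the countFromEditor helper are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathScopeDemo {
    // Hypothetical markup: two paragraphs outside the editor div, one inside it.
    static final String XML =
        "<html><body>"
        + "<p>outside 1</p>"
        + "<div id=\"editor\"><p>inside</p></div>"
        + "<p>outside 2</p>"
        + "</body></html>";

    /** Evaluates expr with the editor div as context node and returns the number of matches. */
    static int countFromEditor(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node editor = (Node) xpath.evaluate("//div[@id='editor']", doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(expr, editor, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countFromEditor("//p"));  // absolute: every p in the document
        System.out.println(countFromEditor(".//p")); // relative: only p elements below the div
    }
}
```

With this sample markup the absolute form matches all three paragraphs and the relative form matches only the one inside the div.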

I really hope someone finds this helpful for their own work.

Thanks for every answer.

1 answer:

Answer 0 (score: 2)

As you can see, the element is inside an iframe. I think you need to switch into the frame first.

Here is the documentation you should try.

// untested Java code, please debug and read documentation yourself

final List<FrameWindow> window = page.getFrames();
final HtmlPage pageTwo = (HtmlPage) window.get(0).getEnclosedPage();

// then find TinyMCE's body, which should be treated as a separate HTML page