Question

I have an HTML template that I want to read:

<html>
   <head>
      <title>TEST</title>
   </head>
   <body>
      <h1 id="hey">Hello, World!</h1>
   </body>
</html>

I want to find the tag whose id is hey and then insert new content (for example, a new tag). I am using a DOM parser for this, but my code returns null:

public static void main(String[] args) {

    try {
        File file = new File("C:\\Users\\<username>\\Desktop\\template.html");
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(file);
        doc.getDocumentElement().normalize();

        System.out.println(doc.getElementById("hey")); // returns null

    } catch (Exception e) {
        e.printStackTrace();
    }

}

What am I doing wrong?

Answer 1

You are parsing the template with the Java XML API, which follows the XML specification strictly and is not very helpful for casual developers.

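In XML, an attribute named id is not automatically of type ID, so the XML implementation does not consider it in .getElementById(). Either use another library (such as jsoup), tell the parser to treat id as an ID (via a DTD), or use custom code.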
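As a rough sketch of the "custom code" route, the following uses only the JDK's built-in XML classes. The class name IdLookupSketch and its helper methods are made up for illustration. It shows two options: looking the element up by its attribute value with XPath, and marking each id attribute as type ID with Element.setIdAttribute so that getElementById can find it. It assumes the template is well-formed XML, as the one in the question is.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class IdLookupSketch {

    // Hypothetical helper: parses a well-formed XML/XHTML string into a DOM Document.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Option A: find the first element whose "id" attribute equals the given value.
    // The id is spliced into the expression as-is, so it must not contain a quote character.
    public static Element findByIdAttribute(Document doc, String id) throws Exception {
        String expr = "//*[@id='" + id + "']";
        return (Element) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODE);
    }

    // Option B: declare every "id" attribute to be of type ID, so getElementById() sees it.
    public static void registerIdAttributes(Document doc) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("id")) {
                e.setIdAttribute("id", true);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><h1 id=\"hey\">Hello, World!</h1></body></html>");

        // Still null here: "id" is an ordinary attribute until it is declared as an ID.
        System.out.println(doc.getElementById("hey"));

        // Option A works without touching the document.
        System.out.println(findByIdAttribute(doc, "hey").getTextContent());

        // Option B makes the original getElementById() call work.
        registerIdAttributes(doc);
        System.out.println(doc.getElementById("hey").getTextContent());
    }
}
```

Option A is probably the smaller change if only one lookup is needed. Option B walks the whole tree once, but afterwards the existing getElementById calls can stay as they are.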

Answer 2

I modified your example to use jsoup:

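import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public static void main(String[] args) {
    try {
        File file = new File("C:\\Users\\<username>\\Desktop\\template.html");
        // jsoup parses the file as HTML, so the id attribute is usable for lookups
        Document doc = Jsoup.parse(file, "UTF-8");
        Element hey = doc.getElementById("hey");
        System.out.println("hey =" + hey.ownText()); // the element's own text
        System.out.println("hey =" + hey);           // the element's outer HTML
    } catch (Exception e) {
        e.printStackTrace();
    }
}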
