Question

I am using org.htmlparser. How can I get a list of nodes by matching part of the class attribute? For example:

<span class="selection-link normal coeff816128@Result.draw">....</span>
<span class="selection normal coefd816154@Result.draw">....</span>

I want to get all tags whose class includes "normal". Unfortunately,

new HasAttributeFilter("class", "normal")

does not work. Does HTMLParser allow something like new HasAttributeFilter("class", "*normal*")?

Answer 1

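If it is an option for you, try jsoup. It is a very powerful open-source HTML library.

Here is an example of how to get (and print) every element that has normal as a class:

Input HTML: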

<span class="selection-link normal coeff816128@Result.draw">....</span>
<span class="selection-link coeff816128@Result.draw">....</span>
<span class="selection coefd816154@Result.draw">....</span>
<span class="selection normal coefd816154@Result.draw">....</span>

(This is your HTML, plus two extra spans that do not have the normal class.)

Jsoup:

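import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/* Input file - containing the HTML listed above. */
final File f = new File("test.html");

/*
 * Parse the HTML into a jsoup document. In this example I read it from
 * a file, but it is also possible to parse a string or connect to a
 * website. Note that parse(File, String) can throw an IOException.
 */
Document doc = Jsoup.parse(f, null);


/* Iterate over each element that has the class "normal" */
for( Element element : doc.select("*.normal") )
{
    System.out.println(element);
}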

With *.normal you select every element that has the class normal, whatever its tag. If you only want span tags, use span.normal instead.
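To make the difference concrete, here is a minimal plain-Java sketch of the matching rule a class selector applies: the class attribute is treated as a whitespace-separated list of tokens, and an element matches if any one token equals normal. It also shows a comparison against the whole attribute value, which is presumably what HasAttributeFilter("class", "normal") does and would explain why it finds nothing in the question. The class and method names (ClassTokenMatch, equalsWholeValue, hasClassToken, hasClassTokenRegex) are made up for illustration and belong to neither library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ClassTokenMatch {

    /** Compares the whole attribute value (assumed behaviour of an exact-match filter). */
    public static boolean equalsWholeValue(String classAttr, String wanted) {
        return wanted.equals(classAttr);
    }

    /** True if the whitespace-separated class list contains the given token. */
    public static boolean hasClassToken(String classAttr, String wanted) {
        if (classAttr == null) {
            return false;
        }
        return Arrays.asList(classAttr.trim().split("\\s+")).contains(wanted);
    }

    /** The same token check written as a regular expression. */
    public static boolean hasClassTokenRegex(String classAttr, String wanted) {
        if (classAttr == null) {
            return false;
        }
        Pattern p = Pattern.compile("(^|\\s)" + Pattern.quote(wanted) + "(\\s|$)");
        return p.matcher(classAttr).find();
    }

    public static void main(String[] args) {
        // The four class values from the input HTML above.
        List<String> values = Arrays.asList(
                "selection-link normal coeff816128@Result.draw",
                "selection-link coeff816128@Result.draw",
                "selection coefd816154@Result.draw",
                "selection normal coefd816154@Result.draw");

        for (String value : values) {
            System.out.println(
                    "whole=" + equalsWholeValue(value, "normal")
                    + " token=" + hasClassToken(value, "normal")
                    + " regex=" + hasClassTokenRegex(value, "normal")
                    + "  " + value);
        }
    }
}
```

Matching on tokens rather than on a substring means a class such as abnormal is not picked up by accident, which a plain "*normal*" wildcard would do.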

For documentation of the jsoup selector API, see: http://jsoup.org/cookbook/extracting-data/selector-syntax

By the way, if you prefer the DOM-style method over select(): doc.getElementsByClass("normal")

Using a wildcard (or regexp) in the HTMLParser HasAttributeFilter parameter
