Parsing a table from HTML using jsoup

Time: 2016-11-10 14:31:41

Tags: java html jsoup

I've run into another problem scraping HTML text. This is a sample of what I'm trying to extract from:

<table class="scripture">
  <tbody>
   <tr>
   <td class="verse" valign="top">
    <a name="2:1"></a><a class="vers" href="javascript:getParallel('LUK', 2, 1);" title="Klik om grondtekst en SV te zien">&nbsp;1&nbsp;</a>
   </td>
   <td class="content">
    <span class="main">En het geschiedde in die dagen dat er een gebod uitging van keizer Augustus dat heel de wereld ingeschreven moest worden.</span>
   </td>
   </tr>
  </tbody>
</table>

<table class="scripture">
  <tbody>
   <tr>
   <td class="verse" valign="top">
    <a name="2:2"></a><a class="vers" href="javascript:getParallel('LUK', 2, 2);" title="Klik om grondtekst en SV te zien">&nbsp;2&nbsp;</a>
   </td>
   <td class="content">
    <span class="main">Deze eerste inschrijving vond plaats toen Cyrenius over Syrië stadhouder was.</span>
   </td>
   </tr>
  </tbody>
</table>

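This is similar to my earlier question (link), but this time I want to get both the verse reference and the scripture content. How can I achieve this?

This is what I have tried so far: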

Element table = doc.select("table[class=scripture]").first();
Log.e("BB", "passage1: " + table.ownText());

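But it doesn't show anything. Any help would be appreciated. Thanks.

1 Answer:

Answer 0: (score: 0)

Assuming you want the content of the span inside the table that contains verse 2:2, you can do it like this: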

String verse = "2:2";
// The span of class main located inside the table of class scripture
// that contains a td of class verse with a link whose attribute name is the value of verse
Element p = doc.select(
    String.format("table.scripture:has(td.verse a[name=%s]) span.main", verse)
).first();
System.out.println(p.text());

Output:

Deze eerste inschrijving vond plaats toen Cyrenius over Syrië stadhouder was.
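
As for why the attempt in the question prints nothing: `ownText()` returns only the text that belongs directly to the element itself, not to its descendants, and the `table` element has no text of its own. If you want every verse reference together with its content rather than one specific verse, a minimal sketch could loop over all the tables. The class name `VerseExtractor` and the method `extractVerses` are made up for this illustration, and the sketch assumes every table follows the structure shown in the question (an `a[name]` anchor in `td.verse` and a `span.main` in `td.content`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class VerseExtractor {

    // Hypothetical helper: maps each verse reference (e.g. "2:1") to its text,
    // preserving document order. Assumes the markup shown in the question.
    public static Map<String, String> extractVerses(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> verses = new LinkedHashMap<>();
        for (Element table : doc.select("table.scripture")) {
            // The anchor carrying the name attribute holds the verse reference.
            Element anchor = table.select("td.verse a[name]").first();
            // The span of class main holds the verse text.
            Element content = table.select("td.content span.main").first();
            if (anchor == null || content == null) {
                continue; // skip tables that do not match the assumed structure
            }
            verses.put(anchor.attr("name"), content.text());
        }
        return verses;
    }

    public static void main(String[] args) {
        // Shortened sample markup, for illustration only.
        String html =
            "<table class='scripture'><tbody><tr>"
            + "<td class='verse'><a name='2:1'></a><a class='vers'>&nbsp;1&nbsp;</a></td>"
            + "<td class='content'><span class='main'>First verse text.</span></td>"
            + "</tr></tbody></table>"
            + "<table class='scripture'><tbody><tr>"
            + "<td class='verse'><a name='2:2'></a><a class='vers'>&nbsp;2&nbsp;</a></td>"
            + "<td class='content'><span class='main'>Second verse text.</span></td>"
            + "</tr></tbody></table>";

        for (Map.Entry<String, String> entry : extractVerses(html).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Reading the reference from the `name` attribute avoids having to strip the non-breaking spaces around the visible verse number in the second link.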