Question

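In short, I am building an Ancient Greek indexing program for OS X, so I need to collect definitions from a lexicon.

On the page http://biblehub.com/greek/1.htm, I need to retrieve the text under "Strong's Exhaustive Concordance". The problem is that the div in the HTML file shares its class with other divs, which makes it hard to locate that specific div programmatically.

With jsoup, I searched for the text that follows the div containing "Strong's Exhaustive Concordance", but the output is "Strong's Exhaustive Concordance" itself rather than the word's definition.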

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.select.Elements;
import org.jsoup.nodes.Document;

public class Greek {

    public static void main(String[] args) throws IOException {

        Document doc = Jsoup.connect("http://biblehub.com/greek/1.htm").get();

        Elements n = doc.select("div.vheading2:containsOwn(Strong's Exhaustive Concordance) + p");

        System.out.println(n.text());
    }
}

Answer 1

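Did you know that Chrome DevTools has a very handy way to locate that element?

Right-click the element you want to find and choose Inspect; DevTools will show the element's HTML. Right-click the element in that view and choose Copy, where you will see a range of options, such as a CSS selector or an XPath, that you can use :)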

So in your case, it would be:

Pattern pattern = Pattern.compile("ben\\b.*'(.*?)'");
Matcher matcher = pattern.matcher(string);
if (matcher.matches()) {
    System.out.printf(matcher.group(1));
}

Answer 2

I have found a solution.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.select.Elements;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Greek {
    public static void main(String[] args) throws IOException {

        Document doc = Jsoup.connect("http://biblehub.com/greek/1.htm").get();


        // collect all elements that have our desired class
        Elements n = doc.select("div.vheading2");

        // cycle through the array until we find the member that contains the text above the word's definition
        for (Element e : n) {
            if (e.text().equalsIgnoreCase("Strong's Exhaustive Concordance")) {

                // finally, we print the next element, which is our definition
                System.out.println(e.nextElementSibling().text());
            }
        }
    }
}
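
The loop above can also be tried without a network connection. What follows is a minimal sketch, not the answerer's code: the class name GreekOffline, the helper definitionAfter, and the inline HTML are all invented for illustration, and the markup is not copied from biblehub.com. It assumes the definition sits in the element immediately after the heading div, and it adds a null check because nextElementSibling() returns null when the heading is the last element in its parent.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class GreekOffline {

    // Hypothetical helper: returns the text of the element that directly follows
    // the first div.vheading2 whose text equals the given heading (ignoring case),
    // or null when no such heading or no following sibling exists.
    static String definitionAfter(Document doc, String heading) {
        for (Element e : doc.select("div.vheading2")) {
            if (e.text().equalsIgnoreCase(heading)) {
                Element next = e.nextElementSibling();
                return next == null ? null : next.text();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Invented markup for illustration only; the real page structure may differ.
        String html =
              "<div class=\"vheading2\">Some Other Heading</div>"
            + "<p>other text</p>"
            + "<div class=\"vheading2\">Strong's Exhaustive Concordance</div>"
            + "<p>sample definition</p>";

        Document doc = Jsoup.parse(html);
        System.out.println(definitionAfter(doc, "Strong's Exhaustive Concordance"));
    }
}
```

With the sample markup, this prints the text of the paragraph that follows the matching heading. Returning null instead of throwing lets the caller decide what to do when a page has no such section.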

JSOUP: Get the text after a div with specific text
