Question

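I am using Apache Jena to fetch a large amount of data from DBpedia and write it to a CSV file. However, I only get about 10,000 triples instead of the full data set. I need to retrieve every triple that matches the query. I can't tell whether this is an endpoint timeout or something else. My code is below: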

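import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.TimeUnit;
import org.apache.jena.query.*;

public class FetchCountriesData {

    public void getCountriesInformation() throws IOException {
        ParameterizedSparqlString qs = new ParameterizedSparqlString("PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n "
                + "SELECT * {     ?Subject rdf:type <http://dbpedia.org/ontology/Country> .     ?Subject ?Predicate ?Object } ORDER BY ?Subject ");

        // Close both the remote query execution and the output file when done.
        try (QueryExecution exec = QueryExecutionFactory.sparqlService("https://dbpedia.org/sparql", qs.asQuery());
             OutputStream out = new FileOutputStream("C:/fakepath/CountryData.csv")) {
            // Client-side timeout only; it does not change any limit enforced by the server.
            exec.setTimeout(10, TimeUnit.MINUTES);
            // A plain ResultSet can be read only once, so copy it into a rewindable one
            // before writing it to the CSV file and then printing it.
            ResultSetRewindable results = ResultSetFactory.copyResults(exec.execSelect());
            ResultSetFormatter.outputAsCSV(out, results);
            results.reset();
            ResultSetFormatter.out(results);
        }
    }
}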

Answer 1

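You are almost certainly hitting one of DBpedia's limits. The public endpoint caps the number of rows it returns per query, which is consistent with the roughly 10,000 results you are seeing, so raising the client-side timeout will not help. For details, see http://wiki.dbpedia.org/OnlineAccess and http://lists.w3.org/Archives/Public/public-lod/2011Aug/0028.html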
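A common way to work within a per-query row cap is to page through the results with LIMIT and OFFSET. The following is only a minimal sketch of that idea, not something taken from the linked pages: the class `PagedSparql`, its methods, and the `fetchPage` callback are hypothetical names, and the callback stands in for whatever actually runs one query (for example a Jena `QueryExecution` against the endpoint).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, for illustration only.
class PagedSparql {

    // Appends LIMIT/OFFSET to a query that already ends with an ORDER BY clause.
    static String pageQuery(String baseQuery, int pageSize, long offset) {
        return baseQuery.trim() + " LIMIT " + pageSize + " OFFSET " + offset;
    }

    // Requests page after page until one comes back with fewer rows than pageSize.
    // fetchPage is an assumed callback: it runs one query string and returns its rows.
    static <T> List<T> fetchAll(String baseQuery, int pageSize,
                                Function<String, List<T>> fetchPage) {
        List<T> all = new ArrayList<>();
        long offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(pageQuery(baseQuery, pageSize, offset));
            all.addAll(page);
            if (page.size() < pageSize) {
                break; // short (or empty) page: nothing left to fetch
            }
            offset += pageSize;
        }
        return all;
    }
}
```

With Jena, `fetchPage` could wrap a call such as `QueryExecutionFactory.sparqlService(endpoint, query).execSelect()` and copy the rows out before closing the execution. Paging is only reliable when the ordering is stable; the query in the question orders by `?Subject` alone, so ordering by `?Subject ?Predicate ?Object` is probably safer. The page size should not exceed the endpoint's row cap.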
