Best practice for querying multiple properties of a resource in a single SPARQL query

Posted: 2013-12-07 02:27:29

Tags: sparql jena

In my database, I have triples like the following:

DocumentUri -> dc.title -> title 
DocumentUri -> dc.language -> language 
DocumentUri -> dc.description -> description 
DocumentUri -> dc.creator -> AuthorUri

I want to be able to search by document title and then get all the properties of every document that matches the title search.

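I'm trying to do this with Jena and SPARQL. I wrote a query that takes the title and gets the URIs of the documents with that title. This is the method; it returns the URIs and stores them in a list called webDocumentListInicial: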

public void searchUriByTitle() {
        RDFNode documentUriNode;

        String queryString = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
                "PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document WHERE { " +
                "?document dc:title ?title." +
                "FILTER (?title = \"" + this.getTitle() + "\" ). }";

        Query query = QueryFactory.create(queryString);

        QueryExecution qe = QueryExecutionFactory.create(query, databaseModel);
        ResultSet results =  qe.execSelect();

        while( results.hasNext() ) {

           QuerySolution querySolution = results.next();
           documentUriNode = querySolution.get("document");

           WebDocument document = new WebDocument(documentUriNode.toString());
           this.webDocumentListInicial.add(document);

        }

        qe.close();  
    }

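To get the creator of each document, I run another query, because in this case the object of the triple is another resource. Here I iterate over the list of document URIs populated by the method above.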

public void searchAuthorByTitle() {
    for ( WebDocument doc : this.webDocumentListInicial ) {
        RDFNode authorUriNode;

        String queryString = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
                "PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?author WHERE { " +
                "?document dc:creator ?author." +
                "FILTER (?document = <" + doc.getUri() + "> ). }";

        Query query = QueryFactory.create(queryString);

        QueryExecution qe = QueryExecutionFactory.create(query, databaseModel);
        ResultSet results = qe.execSelect();

        while ( results.hasNext() ) {
            QuerySolution querySolution = results.next();
            authorUriNode = querySolution.get("author");

            WebAuthor author = this.searchAuthorProperties(authorUriNode.toString(), new WebAuthor(authorUriNode.toString()));

            doc.addAuthor(author);
        }
        qe.close();
    }
}

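To get the other document properties, I do what the example below does, again iterating over the list populated by the first method shown above: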

public void searchDescription() {

        for( WebDocument doc : this.webDocumentListInicial ) {
            String description = "";

            Resource resource = ResourceFactory.createResource(doc.getUri());
            StmtIterator descriptionStmtIt = databaseModel.listStatements(resource, DC.description,(RDFNode) null);

            while( descriptionStmtIt.hasNext() ) {
                description = descriptionStmtIt.next().getObject().toString();
            }
            doc.setDescription(description);
        } 

    }

This isn't an efficient way to process the data, because I need a separate query for each property.

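Is it possible to make a single query that gets the document URIs and all the other document properties at the same time? I tried it once, like this: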

String queryString = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
                "PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document ?description " +
                "?language ?author  WHERE { " +
                "?document dc:title ?title . " +
                "?document dc:language ?language . " +
                "?document dc:description ?description . " +
                "?document dc:creator ?author . " +
                "FILTER (?title = \"" + this.getTitle() + "\" ). }";

But when several documents match the given title, it's hard to tell which of the returned properties belong to which document.

Thanks!

1 answer:

Answer 0 (score: 4)

Build a better query

It sounds like you're doing a lot more work than you need to. If you have data like this:

@prefix : <http://stackoverflow.com/q/20436820/1281433/> .

:doc1 :title "Title1" ; :author :author1 ; :date "date-1" .
:doc2 :title "Title2" ; :author :author2 ; :date "date-2" .
:doc3 :title "Title3" ; :author :author3 ; :date "date-3" .
:doc4 :title "Title4" ; :author :author4 ; :date "date-4" .
:doc5 :title "Title5" ; :author :author5 ; :date "date-5" .

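and a list of titles, say "Title1" "Title4" "Title5", and you want to retrieve the document resource for each title along with its author and date, you can use a query like this: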

prefix : <http://stackoverflow.com/q/20436820/1281433/>

select ?document ?author ?date where {
  values ?title { "Title1" "Title4" "Title5" }

  ?document :title ?title ;
            :author ?author ;
            :date ?date .
}

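You'll get results like these in a single ResultSet. Because every row includes ?document, each author and date is tied to the document it belongs to. There's no need for multiple queries.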

----------------------------------
| document | author   | date     |
==================================
| :doc1    | :author1 | "date-1" |
| :doc4    | :author4 | "date-4" |
| :doc5    | :author5 | "date-5" |
----------------------------------
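Applied to the Dublin Core properties from the question, the same single-query idea might look like the sketch below. It builds the VALUES list from Java strings and escapes each title, which also avoids the quoting problem of splicing a raw title into `"..."` as the question's FILTER does (a title containing `"` would break that query). The class and method names are hypothetical, and the two OPTIONAL blocks are an assumption not in the answer: they keep documents that lack a language or description instead of dropping them. As with the original FILTER, plain literals only match titles stored without a language tag. Newer Jena releases also provide ParameterizedSparqlString for safely inserting a single literal.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Jena.
public class TitleQueryBuilder {

    // Escapes a Java string as a SPARQL double-quoted string literal.
    // Backslash, double quote, LF and CR may not appear unescaped inside "...".
    static String stringLiteral(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Builds one query returning each matching document with its properties;
    // ?document appears in every row, so rows can be grouped per document.
    static String buildQuery(List<String> titles) {
        StringBuilder values = new StringBuilder();
        for (String t : titles) {
            values.append(' ').append(stringLiteral(t));
        }
        return "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
             + "SELECT ?document ?title ?language ?description ?author WHERE {\n"
             + "  VALUES ?title {" + values + " }\n"
             + "  ?document dc:title ?title ;\n"
             + "            dc:creator ?author .\n"
             // Assumption: language and description may be missing for some documents.
             + "  OPTIONAL { ?document dc:language ?language }\n"
             + "  OPTIONAL { ?document dc:description ?description }\n"
             + "}";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(Arrays.asList("Title1", "A \"quoted\" title")));
    }
}
```

The resulting string can be passed to QueryFactory.create(...) exactly like the queryString variables in the question.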

Build a map of the results

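Based on your comments, it sounds like you need to build some other kind of associative structure from the ResultSet. Here is one way to construct a Map<RDFNode,Map<String,RDFNode>> that maps each document IRI to another map, which in turn maps each variable name to its value in that row.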

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.RDFNode;

public class HashedResultsExample {

    final static String DATA =
            "@prefix : <http://stackoverflow.com/q/20436820/1281433/> .\n" +
            "\n" +
            ":doc1 :title 'Title1' ; :author :author1 ; :date 'date-1' .\n" +
            ":doc2 :title 'Title2' ; :author :author2 ; :date 'date-2' .\n" +
            ":doc3 :title 'Title3' ; :author :author3 ; :date 'date-3' .\n" +
            ":doc4 :title 'Title4' ; :author :author4 ; :date 'date-4' .\n" +
            ":doc5 :title 'Title5' ; :author :author5 ; :date 'date-5' .\n" ;

    final static String QUERY = 
            "prefix : <http://stackoverflow.com/q/20436820/1281433/>\n" +
            "select ?document ?author ?date where {\n" +
            "  values ?title { \"Title1\" \"Title4\" \"Title5\" }\n" +
            "  ?document :title ?title ; :author ?author ; :date ?date .\n" +
            "}" ;

    public static void main(String[] args) throws IOException {
        final Model model = ModelFactory.createDefaultModel();
        try ( final InputStream in = new ByteArrayInputStream( DATA.getBytes() )) {
            model.read( in, null, "TTL" );
        }

        final ResultSet rs = QueryExecutionFactory.create( QUERY, model ).execSelect();
        final Map<RDFNode,Map<String,RDFNode>> map = new HashMap<>();

        while ( rs.hasNext() ) {
            final QuerySolution qs = rs.next();
            final Map<String,RDFNode> rowMap = new HashMap<>();
            for ( final Iterator<String> varNames = qs.varNames(); varNames.hasNext(); ) {
                final String varName = varNames.next();
                rowMap.put( varName, qs.get( varName ));
            }
            map.put( qs.get( "document" ), rowMap );
        }

        System.out.println( map );
    }
}

The output (the map printed at the end), with some line breaks added for readability:

{http://stackoverflow.com/q/20436820/1281433/doc4=
 {author=http://stackoverflow.com/q/20436820/1281433/author4,
  document=http://stackoverflow.com/q/20436820/1281433/doc4,
  date=date-4},
 http://stackoverflow.com/q/20436820/1281433/doc1=
 {author=http://stackoverflow.com/q/20436820/1281433/author1,
  document=http://stackoverflow.com/q/20436820/1281433/doc1,
  date=date-1},
 http://stackoverflow.com/q/20436820/1281433/doc5=
 {author=http://stackoverflow.com/q/20436820/1281433/author5,
  document=http://stackoverflow.com/q/20436820/1281433/doc5,
  date=date-5}}
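One thing to watch: `map.put( qs.get( "document" ), rowMap )` keeps only the last row for each document. That is fine for the sample data, but with the question's data a document with several dc:creator values comes back as several rows (one per author), and all but one would be lost. A minimal sketch of accumulating the values instead, in plain Java so it runs without Jena: each row is modeled as a Map<String,String>, a hypothetical stand-in for a QuerySolution (with Jena you would fill it from qs.varNames() and qs.get(...) as in the loop above), and groupBy and row are illustrative names, not Jena API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GroupedResultsSketch {

    // Groups rows by the value of keyVar. Each other variable maps to the set of
    // all values seen for that key, so a second row for the same document adds
    // values instead of overwriting the first row.
    static Map<String, Map<String, Set<String>>> groupBy(
            String keyVar, List<Map<String, String>> rows) {
        Map<String, Map<String, Set<String>>> grouped = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            Map<String, Set<String>> props =
                    grouped.computeIfAbsent(row.get(keyVar), k -> new LinkedHashMap<>());
            for (Map.Entry<String, String> e : row.entrySet()) {
                // Skip the key variable itself and unbound (null) values, e.g. from OPTIONAL.
                if (e.getKey().equals(keyVar) || e.getValue() == null) continue;
                props.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    // Hypothetical stand-in for one QuerySolution.
    static Map<String, String> row(String doc, String author, String date) {
        Map<String, String> r = new HashMap<>();
        r.put("document", doc);
        r.put("author", author);
        r.put("date", date);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = Arrays.asList(
                row(":doc1", ":author1", "date-1"),
                row(":doc1", ":authorX", "date-1"),   // hypothetical second author of :doc1
                row(":doc4", ":author4", "date-4"));
        System.out.println(groupBy("document", rows));
    }
}
```

Here :doc1 ends up with both authors and a single date, rather than only the author from whichever row came last.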