Slow Berlin SPARQL Benchmark queries in Neo4j

Date: 2014-06-21 23:19:57

Tags: neo4j cypher graph-databases

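I am trying out the Berlin SPARQL Benchmark queries in Neo4j. I created the Neo4j graph from triples using http://michaelbloggs.blogspot.de/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html

To summarize the data loading, my graph has the following structure: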

Subject   => Node
Predicate => Relationship
Object    => Node 

If the predicate points to a primitive value (date, string, integer), a property is created and stored on the node instead of a relationship.
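A minimal sketch of what this mapping would produce for a handful of triples. The example.org URIs and the price value are invented for illustration; the labels, relationship types and the `__URI__` property are the ones used in the queries in this question.

```cypher
// Hypothetical data: URIs and the price are made up for illustration.
// A triple whose object is a resource becomes a relationship between two nodes.
// A triple whose object is a literal becomes a property on the subject node.
CREATE (offer:Offer {__URI__: 'http://example.org/Offer1', price: 123.45})
CREATE (product:ProductType294 {__URI__: 'http://example.org/Product1'})
CREATE (feature {__URI__: 'http://example.org/ProductFeature1'})
CREATE (offer)-[:product]->(product)
CREATE (product)-[:productFeature]->(feature)
```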

Now I am trying the following query, which is very slow in Neo4j:

Query 4: Feature with the highest ratio between price with that feature and price without that feature. 

The corresponding SPARQL query:

            prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
            prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
            prefix xsd: <http://www.w3.org/2001/XMLSchema#>

            Select ?feature ((?sumF*(?countTotal-?countF))/(?countF*(?sumTotal-?sumF)) As ?priceRatio)
            {
              { Select (count(?price) As ?countTotal) (sum(xsd:float(str(?price))) As ?sumTotal)
                {
                  ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType294> .
                  ?offer bsbm:product ?product ;
                         bsbm:price ?price .
                }
              }
              { Select ?feature (count(?price2) As ?countF) (sum(xsd:float(str(?price2))) As ?sumF)
                {
                  ?product2 a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType294> ;
                           bsbm:productFeature ?feature .
                  ?offer2 bsbm:product ?product2 ;
                         bsbm:price ?price2 .
                }
                Group By ?feature
              }
            }
           Order By desc(?priceRatio) ?feature
           Limit 100
The Cypher query I created for this:

    MATCH p1 = (offer1:Offer)-[r1:`product`]->(products1:ProductType294)
    MATCH p2 = (offer2:Offer)-[r2:`product`]->(products2:ProductType294)-[:`productFeature`]->features
    return (sum( DISTINCT offer2.price) * ( count( DISTINCT offer1.price) - count( DISTINCT offer2.price)) /(count(DISTINCT offer2.price)*(sum( DISTINCT offer1.price) - sum(DISTINCT offer2.price)))) AS cnt,features.__URI__ AS frui
    ORDER BY cnt DESC,frui 

This query is very slow. Please let me know if I have formulated the query the wrong way.

Another query is Query 5: Show the most popular products of a specific product type for each country, by review count:

      prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
      prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
      prefix rev: <http://purl.org/stuff/rev#>
      prefix xsd: <http://www.w3.org/2001/XMLSchema#>

      Select ?country ?product ?nrOfReviews ?avgPrice
      {
        { Select ?country (max(?nrOfReviews) As ?maxReviews)
          {
            { Select ?country ?product (count(?review) As ?nrOfReviews)
              {
                ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
                ?review bsbm:reviewFor ?product ;
                        rev:reviewer ?reviewer .
                ?reviewer bsbm:country ?country .
              }
              Group By ?country ?product
            }
          }
          Group By ?country
        }
        { Select ?product (avg(xsd:float(str(?price))) As ?avgPrice)
          {
            ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
            ?offer bsbm:product ?product .
            ?offer bsbm:price ?price .
          }
          Group By ?product
        }
        { Select ?country ?product (count(?review) As ?nrOfReviews)
          {
            ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
            ?review bsbm:reviewFor ?product .
            ?review rev:reviewer ?reviewer .
            ?reviewer bsbm:country ?country .
          }
          Group By ?country ?product
        }
        FILTER(?nrOfReviews=?maxReviews)
      }
      Order By desc(?nrOfReviews) ?country ?product

The Cypher query I created for this is the following:

    MATCH (products2:ProductType403)<-[:`reviewFor`]-(reviews:Review)-[:`reviewer`]->(rvrs)-[:`country`]->(countries)
    with count(reviews) AS reviewcount,products2.__URI__ AS pruis, countries.__URI__ AS cntrs
    MATCH (products1:ProductType403)<-[:`product`]-(offer:Offer)
    with AVG(offer.price) AS avgPrice, MAX(reviewcount) AS maxrevs, cntrs
    MATCH (products2:ProductType403)<-[:`reviewFor`]-(reviews:Review)-[:`reviewer`]->(rvrs)-[:`country`]->(countries)
    with avgPrice, maxrevs,countries, count(reviews) AS rvs, countries.__URI__ AS curis, products2.__URI__ AS puris
    where maxrevs=rvs
    RETURN curis,puris,rvs,avgPrice

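This query is slow as well. Am I formulating the queries the right way?

  • I have 10M triples (Berlin Benchmark dataset).
  • Every type predicate is converted to a label.
  • (For Query 4) What I am trying to get is the feature with the highest ratio between the price with that feature and the price without that feature. Is this the right way to formulate the query?
  • (For Query 4) I get the correct result for this query.
  • If I do not compute the sums and counts, the query executes quickly.

Thanks in advance :) The SPARQL queries and more information can be found at http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BusinessIntelligenceUseCase/index.html#queries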

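1 answer:

Answer 0 (score: 0)

These look like global graph queries to me. What is the size of the dataset?

Are you creating a Cartesian product between the two paths? Shouldn't the two paths be connected somehow?

Shouldn't there be a type property on a ProductType label, i.e. (:ProductType {type:"294"})? Then you could have an index on :ProductType(type), and probably on :Order(orderNo) as well.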
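As a rough sketch of that suggestion: this assumes the data were remodelled so that nodes carry a `ProductType` label with a `type` property, which is not how the graph in the question is loaded. The index statement uses the `CREATE INDEX ON` form from Neo4j 2.x; newer releases use `CREATE INDEX FOR (n:ProductType) ON (n.type)` instead.

```cypher
// Assumption: nodes are labelled :ProductType and carry a `type` property.
// Schema index in Neo4j 2.x syntax.
CREATE INDEX ON :ProductType(type);

// A lookup by property value can then be served by the index
// instead of scanning every node with the label.
MATCH (t:ProductType {type: "294"})
RETURN t;
```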

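I don't really understand the calculation.

Is it the difference between the distinct price counts, times the sum of the distinct prices of offer2, divided by the count of distinct prices of offer2 times the delta of the two offers' price sums?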

    MATCH (offer1:Offer)-[r1:`product`]->(products1:ProductType294)
    MATCH (offer2:Offer)-[r2:`product`]->(products2:ProductType294)-[:`productFeature`]->features

    RETURN (sum( DISTINCT offer2.price) *
           ( count( DISTINCT offer1.price) - count( DISTINCT offer2.price))
           / (count(DISTINCT offer2.price)*
           (sum( DISTINCT offer1.price) - sum(DISTINCT offer2.price))))
           AS cnt,features.__URI__ AS frui
    ORDER BY cnt DESC,frui
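One possible way to avoid the Cartesian product, offered as an untested sketch rather than a verified fix: the two MATCH clauses share no variable, so every offer1 row is paired with every offer2/feature row before aggregation. The SPARQL version instead computes the totals in one subquery and the per-feature values in another. In Cypher, the same shape can be approximated by aggregating the totals first and carrying them forward with WITH. Read this way, the expression reduces to (sumF / countF) / ((sumTotal - sumF) / (countTotal - countF)), i.e. the average price with the feature divided by the average price without it. The sketch assumes `price` is stored as a numeric (floating point) property; the names countTotal, sumTotal, countF, sumF and priceRatio are taken from the SPARQL query.

```cypher
// Sketch only: assumes `price` is a numeric property on :Offer nodes.

// Step 1: totals over all offers of the product type, collapsed to a single row.
MATCH (offer1:Offer)-[:product]->(:ProductType294)
WITH count(offer1.price) AS countTotal, sum(offer1.price) AS sumTotal

// Step 2: per-feature count and sum. Only the single totals row is carried
// along, so no offer1 x offer2 combinations are built.
MATCH (offer2:Offer)-[:product]->(:ProductType294)-[:productFeature]->(feature)
WITH feature, countTotal, sumTotal,
     count(offer2.price) AS countF, sum(offer2.price) AS sumF

// Step 3: same expression as ?priceRatio in the SPARQL query.
RETURN feature.__URI__ AS furi,
       (sumF * (countTotal - countF)) / (countF * (sumTotal - sumF)) AS priceRatio
ORDER BY priceRatio DESC, furi
LIMIT 100
```

DISTINCT is dropped here because rows are no longer duplicated. Note that DISTINCT on `price` would also merge different offers that happen to share the same price, which the SPARQL query does not do. A feature present on every product of the type makes the denominator zero; the sketch does not handle that case.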