COLLECT in a Cypher query is slow

Time: 2016-12-16 16:57:15

Tags: neo4j cypher

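I am working on a Cypher query that returns a combined, limited result built from two sets: one set is the direct neighbors, the other is the neighbors reached across an Event node, as follows: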

OPTIONAL MATCH (subject:Person {age:"38"})--(event:Event)--(targetViaEvent)   
OPTIONAL MATCH (subject)--(directTarget)  
  WHERE NOT directTarget:Event  
WITH subject, targetViaEvent, directTarget,  
  COUNT(event) AS eventCount 
  ORDER BY eventCount DESC  
WITH subject, COLLECT(directTarget) + COLLECT(targetViaEvent) as targetList  
UNWIND targetList AS target  
WITH DISTINCT subject, target 
SKIP 0 LIMIT 10
...

The main goals of this Cypher query are:

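  1. Find all neighbors.
  2. If a neighbor is labeled Event, find the other neighbors of that event.
  3. Sort the event-connected neighbors by the number of events.
  4. Return the neighbors found above, whether labeled Event or not, using SKIP and LIMIT for pagination.
     4.1. If possible, return the neighbors labeled Event rather than
  5. Other specifications:

    1. All relationship types and directions are considered, so nothing is filtered.
    2. When COLLECT() is used, execution becomes very slow and stalls the neo4j shell, because each subject may have ten thousand directTarget and targetViaEvent nodes. I suspect COLLECT() caches every matched node object in memory, which chokes Neo4j at this data size. My aim is only to combine the two sets and limit the combined result. Are there any tricks to improve my Cypher?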

Edit:

As @InverseFalcon pointed out a mistake in my Cypher above, here is an update with my whole Cypher:

      PROFILE MATCH (subject:Person {age:"38"})
      OPTIONAL MATCH (subject)--(directTarget)
        WHERE NOT directTarget:Event
      OPTIONAL MATCH (subject)--(event:Event)--(targetViaEvent)
      WITH subject, targetViaEvent, directTarget,
           COUNT(event) AS eventCount ORDER BY eventCount DESC
      WITH subject, COLLECT(directTarget) + COLLECT(targetViaEvent) as targetList
      UNWIND targetList AS target
      WITH DISTINCT subject, target SKIP 0 LIMIT 300 WHERE target IS NOT NULL
      OPTIONAL MATCH (subject)-[subject_target]-(target)
      OPTIONAL MATCH (subject)--(eventPrime)--(target)
      WITH subject, subject_target, target, COLLECT(eventPrime)[0..200] AS eventList
      UNWIND (CASE eventList WHEN [] THEN [null] else eventList end) as limitedEvents
      OPTIONAL MATCH (subject)-[subject_event]-(limitedEvents)-[event_target]-(target)
      RETURN subject, subject_target, target, subject_event, limitedEvents, event_target
      

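Note: after SKIP...LIMIT... I repeat the matching only to identify the relationships between the nodes, because a) I want to build the relationships into the JSON result; and b) after many attempts, I could not manage to obtain the relationships in the first three MATCH clauses. In particular, COUNT(event) stops working, because each event is paired with exactly one relationship, so the count is always 1.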

1 answer:

Answer 0: (score: 2)

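We can improve your query a bit. Right now you are building rows for every event + targetViaEvent combination in a Cartesian product with every directTarget, so you are doing a lot of work that you do not need to do. A good approach, especially for back-to-back MATCHes or OPTIONAL MATCHes where you want to aggregate both, is to aggregate each one separately, right after its own match, rather than trying to aggregate everything at once. This avoids the Cartesian product.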
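To get a feel for the size of that Cartesian product before running the full query, a rough sketch along these lines may help. It assumes the same labels and property as the query in the question; the aliases directCount, eventPathCount and approxCombinedRows are illustrative names only, and the product is an estimate rather than an exact row count.

```cypher
// Count each pattern on its own, aggregating right after each match
// so that only one row per subject is carried forward.
MATCH (subject:Person {age:"38"})
OPTIONAL MATCH (subject)--(directTarget)
  WHERE NOT directTarget:Event
WITH subject, count(directTarget) AS directCount
OPTIONAL MATCH (subject)--(event:Event)--(targetViaEvent)
WITH subject, directCount, count(targetViaEvent) AS eventPathCount
// Back-to-back OPTIONAL MATCHes would build roughly this many rows per subject.
RETURN subject, directCount, eventPathCount,
       directCount * eventPathCount AS approxCombinedRows
```

If both counts are around ten thousand, as described in the question, the estimate is on the order of a hundred million rows for a single subject, which would be consistent with the stall you are seeing.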

I would suggest this as an alternative query:

MATCH (subject:Person {age:"38"})
OPTIONAL MATCH (subject)--(event:Event)--(targetViaEvent)
WITH subject, COUNT(event) AS eventCount, targetViaEvent
ORDER BY eventCount DESC
WITH subject, COLLECT(targetViaEvent) as eventTargets
// Above WITH means we now have only one row per subject so far
OPTIONAL MATCH (subject)--(directTarget)
  WHERE NOT directTarget:Event
WITH subject, COLLECT(directTarget) + eventTargets as targetList
UNWIND targetList AS target
WITH DISTINCT subject, target SKIP 0 LIMIT 10
...

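Edit

I just noticed a problem in the original query. Both of your OPTIONAL MATCHes share the 'subject' variable. This makes your second OPTIONAL MATCH dependent on the subjects found by your first OPTIONAL MATCH. It will not look for that pattern on persons who did not match your first OPTIONAL MATCH.

Basically, that pair of OPTIONAL MATCHes behaves as if the first OPTIONAL MATCH were a MATCH.

If your intent is to run both OPTIONAL MATCHes on all persons, then you may need to change the first part of the query to: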

MATCH (subject:Person {age:"38"})
OPTIONAL MATCH (subject)--(event:Event)--(targetViaEvent)   
OPTIONAL MATCH (subject)--(directTarget) 
... 

This may affect the speed of the original query and the number of results it builds.
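One way to check whether this change matters for your data is to compare how many subjects each starting clause reaches. This is only a sanity-check sketch under the same assumptions as the queries above; the aliases subjectsWithEventPath and allSubjects are made up for illustration.

```cypher
// Subjects reached when the query starts with the OPTIONAL MATCH on the Event pattern
OPTIONAL MATCH (subject:Person {age:"38"})--(event:Event)--(targetViaEvent)
RETURN count(DISTINCT subject) AS subjectsWithEventPath;

// Subjects reached when the query starts with a plain MATCH on the person
MATCH (subject:Person {age:"38"})
RETURN count(subject) AS allSubjects;
```

If the two numbers differ, some persons have no path through an Event node, and the original query never looks up their direct targets.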

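In addition, the results of both of our queries (after your change) will also include rows for subjects with no targets, where neither OPTIONAL MATCH matched anything for the subject (in those cases, a single row with the subject and a null target). If these are not wanted in the returned results, we both need to add WHERE target IS NOT NULL after the final WITH.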
