Batching a large serial query with apoc.periodic.commit or another approach

Time: 2018-05-04 17:06:33

Tags: neo4j neo4j-apoc

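I was advised to use apoc.periodic.commit to batch a large query I am running in neo4j. My code does not appear to be batching and committing after each step. The server runs out of memory, which I assume should not happen if it committed after each item.

I am computing the Jaccard index for a set of nodes (I named the property paradig, for "paradigmatic relation", because it is derived from the next-word relationships in a set of texts).

Computing this for every node is a fairly big job. I am computing it for 53 nodes, but the total population is about 60k, and this is an n^2 operation. If I run it in a single transaction, I run out of memory. So I want to run it in batches, committing after each index is computed. I have flagged the nodes I need to process with the property toProcess, and I run the code below to compute the Jaccard index.

1) Am I just using apoc wrong?

2) Is there a better, more neo4j-centric way to do this? I have always worked with SQL.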
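For reference, here is the quantity the query below appears to be aiming for, written out. The notation is my own and not from any library: L(s) is the set of names of words with a NEXT_WORD relationship into s (left1 in the query), and R(s) is the set of names of words that s points to (right1). The second line is a rough count of the ordered pairs (s, o) that 53 flagged words against about 60k words imply.

```latex
\[
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad
\mathrm{sim}(s, o) = J\bigl(L(s), L(o)\bigr) + J\bigl(R(s), R(o)\bigr)
\]
\[
\text{pairs} \approx 53 \times 60{,}000 \approx 3.2 \times 10^{6}
\]
```

Since each Jaccard term lies between 0 and 1, sim as defined here lies between 0 and 2.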

call apoc.periodic.commit("
MATCH (s:Word{toProcess: True})
MATCH (w:Word)-[:NEXT_WORD]->(s)
WITH collect(DISTINCT w.name) as left1, s
MATCH (w:Word)<-[:NEXT_WORD]-(s)
WITH left1, s, collect(DISTINCT w.name) as right1
// Match every other word
MATCH (o:Word) WHERE NOT s = o
WITH left1, right1, s, o
// Get other right, other left1
MATCH (w:Word)-[:NEXT_WORD]->(o)
WITH collect(DISTINCT w.name) as left1_o, s, o, right1, left1
MATCH (w:Word)<-[:NEXT_WORD]-(o)
WITH left1_o, s, o, right1, left1, collect(DISTINCT w.name) as right1_o
// compute right1 union, intersect
WITH FILTER(x IN right1 WHERE x IN right1_o) as r1_intersect,
  (right1 + right1_o) AS r1_union, s, o, right1, left1, right1_o, left1_o
// compute left1 union, intersect
WITH FILTER(x IN left1 WHERE x IN left1_o) as l1_intersect,
  (left1 + left1_o) AS l1_union, r1_intersect, r1_union, s, o
WITH DISTINCT r1_union as r1_union, l1_union as l1_union, r1_intersect, l1_intersect, s, o
WITH 1.0*size(r1_intersect) / size(r1_union) as r1_jaccard,
  1.0*size(l1_intersect) / size(l1_union) as l1_jaccard,
  s, o
WITH s, o, r1_jaccard, l1_jaccard, r1_jaccard + l1_jaccard as sim
MERGE (s)-[r:RELATED_TO]->(o) SET r.paradig = sim
set s.toProcess = false
",{batchSize:1, parallel:false})

Rationale:

batchSize:1: I want to commit after each Jaccard index is set

parallel:false: I want the operations to run serially so that I don't run out of memory
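A note on the call above, as far as I understand the APOC documentation: apoc.periodic.commit takes a map of query parameters as its second argument, not a config map, so batchSize and parallel would just be passed to the statement as unused parameters. The procedure reruns the statement in a fresh transaction until the statement returns 0, which means the statement is expected to limit its own work with a LIMIT and to RETURN a count. The minimal sketch below only clears the flag in batches, to show that contract; it assumes a Neo4j version that accepts the $limit parameter syntax (older 3.x examples write {limit}).

```cypher
// Sketch of the contract apoc.periodic.commit expects:
// each run handles at most $limit flagged words and returns how many it handled.
// The procedure repeats the statement, one transaction per run, until it returns 0.
CALL apoc.periodic.commit("
  MATCH (s:Word {toProcess: true})
  WITH s LIMIT $limit
  SET s.toProcess = false
  RETURN count(*)
", {limit: 1})
```

If real per-word work is placed between the LIMIT and the RETURN, any MATCH that finds nothing for a word drops that row, so the count can reach 0 (and the loop can stop) while flagged words remain. That is one reason to keep the flag update and the count independent of the heavier matching.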

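1 answer:

Answer 0: (score: 0)

I got it working by using apoc.periodic.iterate instead of apoc.periodic.commit, as follows.

Although this solved my own problem, I have not marked it as the answer, because I think it would be useful for others to have a more definitive answer from someone who knows what they're doing. I found it hard to find best practice for batching updates like this in neo4j, and I don't know enough myself to tell whether this is best (or even halfway decent) practice.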

call apoc.periodic.iterate("
MATCH (s:Word) where s.toProcess=true
return s",
"MATCH (w:Word)-[:NEXT_WORD]->(s)
WITH collect(DISTINCT w.name) as left1, s
MATCH (w:Word)<-[:NEXT_WORD]-(s)
WITH left1, s, collect(DISTINCT w.name) as right1
// Match every other word
MATCH (o:Word) WHERE NOT s = o
WITH left1, right1, s, o
// Get other right, other left1
MATCH (w:Word)-[:NEXT_WORD]->(o)
WITH collect(DISTINCT w.name) as left1_o, s, o, right1, left1
MATCH (w:Word)<-[:NEXT_WORD]-(o)
WITH left1_o, s, o, right1, left1, collect(DISTINCT w.name) as right1_o
// compute right1 union, intersect
WITH FILTER(x IN right1 WHERE x IN right1_o) as r1_intersect,
  (right1 + right1_o) AS r1_union, s, o, right1, left1, right1_o, left1_o
// compute left1 union, intersect
WITH FILTER(x IN left1 WHERE x IN left1_o) as l1_intersect,
  (left1 + left1_o) AS l1_union, r1_intersect, r1_union, s, o
WITH DISTINCT r1_union as r1_union, l1_union as l1_union, r1_intersect, l1_intersect, s, o
WITH 1.0*size(r1_intersect) / size(r1_union) as r1_jaccard,
  1.0*size(l1_intersect) / size(l1_union) as l1_jaccard,
  s, o
WITH s, o, r1_jaccard, l1_jaccard, r1_jaccard + l1_jaccard as sim
MERGE (s)-[r:RELATED_TO]->(o) SET r.paradig = sim
set s.toProcess = false",
{batchSize:1})
yield batches, total
return batches, total
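One thing worth checking in the query above: right1 + right1_o is list concatenation, so names that occur in both lists are counted twice, and WITH DISTINCT removes duplicate rows rather than duplicate list elements. The resulting ratio is therefore smaller than a true Jaccard index whenever the lists overlap. Because each list comes from collect(DISTINCT ...) and has no internal duplicates, the union size can be obtained by inclusion-exclusion instead. The sketch below uses made-up literal lists in place of the collected names and a list comprehension in place of FILTER (which was removed in Neo4j 4.0).

```cypher
// Made-up sample lists standing in for right1 and right1_o.
WITH ['a', 'b', 'c'] AS right1, ['b', 'c', 'd'] AS right1_o
WITH right1, right1_o,
     [x IN right1 WHERE x IN right1_o] AS r1_intersect
RETURN
  // |A ∪ B| = |A| + |B| - |A ∩ B|, valid because neither list has duplicates
  1.0 * size(r1_intersect)
      / (size(right1) + size(right1_o) - size(r1_intersect)) AS r1_jaccard,
  // ratio against the concatenated list, as in the query above, for comparison
  1.0 * size(r1_intersect) / size(right1 + right1_o) AS r1_concat_ratio
```

With these sample lists the intersection has 2 elements and the union has 4, so r1_jaccard is 0.5, while the concatenation-based ratio is 2/6, about 0.33.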