How to improve the performance of a Neo4j-based database

Time: 2017-11-30 09:58:59

Tags: performance database-design graph neo4j

We created our Neo4j database using the following queries. Our CSV files contain 50k rows.

Query 1:
CREATE CONSTRAINT ON (p:PR) ASSERT p.prId IS UNIQUE;

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///2015_PRData.csv' AS line WITH line,
SPLIT(SPLIT(line.`Open Date`, ' ')[0], '/') AS opnDateList,
SPLIT(SPLIT(line.`Closed Date`, ' ')[0], '/') AS clsDateList
MERGE (prNode:PR{prId:TOINT(line.prId)})
MERGE (app:Application{appName:line.Application})
MERGE (func:Function{funName:line.Function})
MERGE (subfunc:SubFunction{subFunName:line.Subfunction})
MERGE (cat:Category{catName:line.Category})
MERGE (rel:Release{relName:line.Release})
MERGE (custNode:Customer{customerName:line.`Server Name`})
MERGE (prOpenDate:PROpenDate{openDate:SPLIT(line.`Open Date`, ' ')[0]})
SET prOpenDate.day = TOINT(opnDateList[1]),prOpenDate.month = TOINT(opnDateList[0]),prOpenDate.year = opnDateList[2]
MERGE (prClosedDate:PRClosedDate{closedDate:SPLIT(line.`Closed Date`, ' ')[0]})
SET prClosedDate.day = TOINT(clsDateList[1]),prClosedDate.month = TOINT(clsDateList[0]),prClosedDate.year = clsDateList[2]
MERGE (app)-[:PART_OF_APPLN]->(func)
MERGE (func)-[:PART_OF_FUNCTION]->(subfunc)
MERGE (subfunc)-[:PART_OF_SUBFUNCTION]->(cat)
MERGE (prNode)-[:CATEGORY]->(cat)
MERGE (prNode)-[:REPORTED_BY]->(custNode)
MERGE (prNode)-[:OPEN_ON]->(prOpenDate)
MERGE (prNode)-[:CLOSED_ON]->(prClosedDate)
MERGE (prNode)-[:REPORTED_IN]->(rel)

Query 2:
//change year for open date nodes
MERGE (q:PROpenDate) SET q.year=SPLIT(q.year,' ')[0] return q;

Query 3:
//change year for closed date nodes
MERGE (q:PRClosedDate) SET q.year=SPLIT(q.year,' ')[0] return q;

Query 4:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///2015_PR_WithCP.csv' AS line WITH line
MERGE (cpNode:CP{cpId:line.cpId})
MERGE (prnode:PR{prId:TOINT(SPLIT(line.prRefId, 'PR')[1])})
CREATE (prnode)-[:FIXED_BY]->(cpNode)

Query 5:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///2015_CPWithFilename.csv' AS line WITH line
MERGE (cpNode:CP{cpId:line.cpId})
MERGE (cpFile:FILE{fileName:line.fileName})
CREATE (cpNode)-[:CONTAINS]->(cpFile)

Query 6:   
USING PERIODIC COMMIT 100
LOAD CSV WITH HEADERS FROM
'file:///2015_CPcomments.csv' AS line
MERGE (cpNode:CP{cpId:line.cpId})
MERGE (fileNode:FILE{fileName:line.fileName})
MERGE (owner:DougUser{userId:line.cpOwner})
MERGE (reviewer:DougUser{userId:line.cpReviewer})
MERGE (cpNode)-[:SUBMITTED_BY]->(owner)
WITH line WHERE line.reviewComment IS NOT NULL
MERGE (comment:ReviewComment{commentText:line.reviewComment,contextCP:line.cpId})
MERGE (comment)-[:GIVEN_BY]->(reviewer)
MERGE (comment)-[:COMMENT_FOR]->(fileNode)

Loading the data into Neo4j takes a very long time. The first query alone took more than 7 hours.

Fetching data from the server is also slow.

MATCH (pr:PR)-[:FIXED_BY]-(cp) 
MATCH (cp)-[:CONTAINS]-(file)  
MATCH (pr)-[:CLOSED_ON]-(cls) 
MATCH (pr)-[:REPORTED_BY]-(custs) 
MATCH (pr)-[:CATEGORY]-(cats) 
WHERE  file.fileName STARTS WITH 'xyz'  AND NOT(cls.closedDate = '' )AND 
apoc.date.parse(cls.closedDate,'s', 'MM/dd/yyyy') >= apoc.date.parse('01/01/2014','s', 'MM/dd/yyyy') AND apoc.date.parse(cls.closedDate,'s', 'MM/dd/yyyy') <= apoc.date.parse('06/13/2017','s', 'MM/dd/yyyy') 
RETURN collect(DISTINCT custs.customerName) AS customers, collect(DISTINCT cats.catName) AS categories

The query above takes more than 5 minutes to return data. Please help me solve this problem; the performance is very poor.

1 Answer:

Answer 0: (score: 2)

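Your main problem is likely the lack of indexes/constraints for each of the MERGEs you perform. MERGE behaves like a MATCH or CREATE, and if you don't have an index on the label/property, the db must perform a label scan: it has to check every node in the db with that label and access all of their properties to find the ones with the values you want, which is expensive. Label scans (and therefore MERGEs) get slower and slower as you add nodes. Use indexes instead.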
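
As a rough sketch of what that could look like for the labels merged in your import queries: the statements below use the same Neo4j 3.x syntax as your existing :PR constraint (newer versions spell this differently). Using uniqueness constraints assumes each of these properties really identifies one node, which matches how you MERGE on them; a plain index is used for :ReviewComment because that node is merged on two properties.

```cypher
// Assumption: each property below identifies exactly one node of its label.
CREATE CONSTRAINT ON (a:Application)  ASSERT a.appName      IS UNIQUE;
CREATE CONSTRAINT ON (f:Function)     ASSERT f.funName      IS UNIQUE;
CREATE CONSTRAINT ON (s:SubFunction)  ASSERT s.subFunName   IS UNIQUE;
CREATE CONSTRAINT ON (c:Category)     ASSERT c.catName      IS UNIQUE;
CREATE CONSTRAINT ON (r:Release)      ASSERT r.relName      IS UNIQUE;
CREATE CONSTRAINT ON (c:Customer)     ASSERT c.customerName IS UNIQUE;
CREATE CONSTRAINT ON (d:PROpenDate)   ASSERT d.openDate     IS UNIQUE;
CREATE CONSTRAINT ON (d:PRClosedDate) ASSERT d.closedDate   IS UNIQUE;
CREATE CONSTRAINT ON (c:CP)           ASSERT c.cpId         IS UNIQUE;
CREATE CONSTRAINT ON (f:FILE)         ASSERT f.fileName     IS UNIQUE;
CREATE CONSTRAINT ON (u:DougUser)     ASSERT u.userId       IS UNIQUE;

// Plain (non-unique) index: several comments can share one contextCP.
CREATE INDEX ON :ReviewComment(contextCP);
```

Create these before running the LOAD CSV statements so that every MERGE can use them from the first row.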

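If you're using MERGE on nodes with multiple properties and one of them is unique (such as an id property), MERGE on just that property, then use ON CREATE SET to set the remaining non-unique properties.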
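
A minimal sketch of that pattern, applied to the :PROpenDate nodes from Query 1 and assuming openDate is the identifying property. It is reduced to the date part only, so it is not a drop-in replacement for the whole of Query 1.

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///2015_PRData.csv' AS line
WITH SPLIT(line.`Open Date`, ' ')[0] AS openDate
WITH openDate, SPLIT(openDate, '/') AS parts
// MERGE only on the identifying property ...
MERGE (d:PROpenDate {openDate: openDate})
// ... and set the derived properties once, when the node is first created.
ON CREATE SET d.month = TOINT(parts[0]),
              d.day   = TOINT(parts[1]),
              d.year  = parts[2];
```

With ON CREATE SET, rows that hit an existing date node no longer rewrite day, month and year every time, which the plain SET in Query 1 does.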

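You can check for inefficiencies by prefixing your queries with EXPLAIN, which generates the query plan without executing the query. You want to see NodeUniqueIndexSeek and NodeIndexSeek in the plan. If you see NodeByLabelScan, you will most likely need to optimize the query by adding an index on the relevant label/property.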
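
For example, you could inspect a single MERGE in isolation like this (the cpId value 'CP1' is a made-up placeholder, not something from your data):

```cypher
// EXPLAIN only plans the statement; nothing is read or written.
EXPLAIN
MERGE (cp:CP {cpId: 'CP1'})
RETURN cp;
```

The same prefix works in front of a full LOAD CSV statement, which is the easiest way to see the plan for each of your import queries.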

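Some of your queries use MERGE where they should use MATCH (queries 2 and 3, and probably some of the nodes in later queries where you know the nodes already exist). If you're trying to find existing nodes in the db rather than add nodes, use MATCH instead.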
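
Queries 2 and 3 could then look roughly like this. The RETURN q is dropped here on the assumption that you don't need every date node sent back to the client:

```cypher
// Query 2: only touch :PROpenDate nodes that already exist
MATCH (q:PROpenDate)
SET q.year = SPLIT(q.year, ' ')[0];

// Query 3: same for :PRClosedDate nodes
MATCH (q:PRClosedDate)
SET q.year = SPLIT(q.year, ' ')[0];
```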

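Query 6 has a flaw in its WITH clause: you also need to include reviewer and fileNode in the WITH, otherwise those variables go out of scope and will not be bound to the nodes you created earlier in the query.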
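
In other words, only the WITH line and what follows it need to change; the lines before it in Query 6 stay as they are:

```cypher
// Carry reviewer and fileNode through the WITH so the MERGEs below
// attach to the nodes merged earlier instead of creating blank ones.
WITH line, reviewer, fileNode
WHERE line.reviewComment IS NOT NULL
MERGE (comment:ReviewComment {commentText: line.reviewComment, contextCP: line.cpId})
MERGE (comment)-[:GIVEN_BY]->(reviewer)
MERGE (comment)-[:COMMENT_FOR]->(fileNode)
```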

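Query 6 also has an Eager operator in its query plan (caused by MERGing both the owner and reviewer nodes), which prevents periodic commit from working and makes the query run inefficiently. To fix this, import all :DougUser nodes first (using a single variable), then run query 6, but use MATCH for owner and reviewer, since they should already exist in the graph.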
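
One way this could be split up, as a sketch: two small passes that each MERGE a single :DougUser variable, followed by Query 6 with MATCH for the users. It is worth running EXPLAIN on the last statement to confirm the Eager operator is actually gone on your version.

```cypher
// Pass 1a: users that appear as CP owners
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///2015_CPcomments.csv' AS line
MERGE (u:DougUser {userId: line.cpOwner});

// Pass 1b: users that appear as reviewers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///2015_CPcomments.csv' AS line
MERGE (u:DougUser {userId: line.cpReviewer});

// Pass 2: Query 6, with the users looked up instead of merged
USING PERIODIC COMMIT 100
LOAD CSV WITH HEADERS FROM 'file:///2015_CPcomments.csv' AS line
MATCH (owner:DougUser {userId: line.cpOwner})
MATCH (reviewer:DougUser {userId: line.cpReviewer})
MERGE (cpNode:CP {cpId: line.cpId})
MERGE (fileNode:FILE {fileName: line.fileName})
MERGE (cpNode)-[:SUBMITTED_BY]->(owner)
WITH line, reviewer, fileNode
WHERE line.reviewComment IS NOT NULL
MERGE (comment:ReviewComment {commentText: line.reviewComment, contextCP: line.cpId})
MERGE (comment)-[:GIVEN_BY]->(reviewer)
MERGE (comment)-[:COMMENT_FOR]->(fileNode);
```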

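For query 7 (the final MATCH query), the EXPLAIN plan shows a NodeByLabelScan, so it will run over all :PR nodes looking for matching patterns. It would be better to add the :FILE label to the file node. That changes the plan to start with a NodeIndexSeekByRange, so your starting nodes will be the :FILE nodes whose fileName starts with 'xyz' (found through a fast index lookup), and the rest of the pattern is matched from there.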
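
A sketch of the rewritten read query, assuming an index or constraint exists on :FILE(fileName). The relationship directions and the extra labels are taken from how your import queries create the data, so adjust them if your graph differs.

```cypher
// Start from the indexed :FILE nodes, then expand towards the PRs.
MATCH (file:FILE)
WHERE file.fileName STARTS WITH 'xyz'
MATCH (file)<-[:CONTAINS]-(cp:CP)<-[:FIXED_BY]-(pr:PR)
MATCH (pr)-[:CLOSED_ON]->(cls:PRClosedDate)
WHERE cls.closedDate <> ''
  AND apoc.date.parse(cls.closedDate, 's', 'MM/dd/yyyy') >= apoc.date.parse('01/01/2014', 's', 'MM/dd/yyyy')
  AND apoc.date.parse(cls.closedDate, 's', 'MM/dd/yyyy') <= apoc.date.parse('06/13/2017', 's', 'MM/dd/yyyy')
MATCH (pr)-[:REPORTED_BY]->(custs:Customer)
MATCH (pr)-[:CATEGORY]->(cats:Category)
RETURN collect(DISTINCT custs.customerName) AS customers,
       collect(DISTINCT cats.catName) AS categories;
```

Prefix it with EXPLAIN to check that the plan now begins with NodeIndexSeekByRange rather than NodeByLabelScan.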