假设您是Twitter,并且:
(:User)
和(:Tweet)
个节点; 您可以为这些推文添加标签,例如:AwaitingModeration
,或添加isAwaitingModeration = true|false
。
一种选择本质上比另一种更好吗?
我知道最好的答案可能是尝试加载测试两者:),但Neo4j的实现POV有什么能使一个选项更健壮或适合这种查询吗?
在任何特定时刻,它是否取决于此状态下的推文数量?如果它在10s与1000s之间,那会有所作为吗?
我的印象是标签更适合大量节点,而索引属性更适合较小的卷(理想情况下,唯一节点),但我不确定这是否真的如此。
谢谢!
答案 0 :(得分:34)
更新:跟进blog post已发布。
当我们为客户建模数据集以及Active / NonActive实体的典型用例时,这是一个常见问题。
这是关于我对Neo4j2.1.6有效的一点反馈:
第1点。您在标签或索引属性上的匹配与返回节点之间的数据库访问没有区别
第2点。当此类节点位于模式的末尾时会遇到差异,例如
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:WRITTEN]->(post:Post)
WHERE post.published = true
RETURN n, collect(post) as posts;
-
PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost)
> WHERE post.active = true
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION153 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 3 | | (hasLabel(post:BlogPost(1)) AND Property(post,active(8)) == { AUTOBOOL1}) |
| SimplePatternMatcher | 1 | 12 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
Total database accesses: 17
在这种情况下,Cypher不会使用索引:Post(published)
。
因此,如果您拥有ActivePost标签,则标签的使用效果会更好。 :
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION130 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 1 | | hasLabel(post:ActivePost(2)) |
| SimplePatternMatcher | 1 | 4 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+
Total database accesses: 7
第3点。始终使用标签作为肯定,对于上述情况,具有草稿标签将强制您执行以下查询:
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post)
WHERE NOT post :Draft
RETURN n, collect(post) as posts;
意味着Cypher将打开每个节点标签标头并对其进行过滤。
第4点。避免在多个标签上匹配
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post:ActivePost)
RETURN n, collect(post) as posts;
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION139 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 2 | | (hasLabel(post:BlogPost(1)) AND hasLabel(post:ActivePost(2))) |
| SimplePatternMatcher | 1 | 8 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
Total database accesses: 12
这将导致Cypher在第3点执行相同的过程。
第5点。如果可能,请避免需要通过良好类型的命名关系匹配标签
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:PUBLISHED]->(p)
RETURN n, collect(p) as posts
-
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:DRAFTED]->(post)
RETURN n, collect(post) as posts;
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:DRAFTED]->(post)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 3 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION119 |
| EagerAggregation | 1 | 0 | | n |
| SimplePatternMatcher | 3 | 0 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+
Total database accesses: 2
性能更高,因为它将使用图的所有功能,只需遵循节点的关系,导致不再访问数据库,而不是匹配用户节点,因此不会对标签进行过滤。
这是我的0,02€