Neo4j:标签与索引属性?

时间:2015-01-15 03:28:27

标签: neo4j cypher

假设您是Twitter,并且:

  • 您有(:User)(:Tweet)个节点;
  • 推文可以被标记;和
  • 您希望查询当前正在等待审核的已标记推文的列表

您可以为这些推文添加标签,例如:AwaitingModeration,或添加属性并将其编入索引,例如isAwaitingModeration = true|false

一种选择本质上比另一种更好吗?

我知道最好的答案可能是尝试加载测试两者:),但Neo4j的实现POV有什么能使一个选项更健壮或适合这种查询吗?

在任何特定时刻,它是否取决于此状态下的推文数量?如果它在10s与1000s之间,那会有所作为吗?

我的印象是标签更适合大量节点,而索引属性更适合较小的卷(理想情况下,唯一节点),但我不确定这是否真的如此。

谢谢!

1 个答案:

答案 0 :(得分:34)

更新:跟进blog post已发布。

当我们为客户建模数据集以及Active / NonActive实体的典型用例时,这是一个常见问题。

这是关于我对Neo4j2.1.6有效的一点反馈:

第1点。您在标签或索引属性上的匹配与返回节点之间的数据库访问没有区别

第2点。当此类节点位于模式的末尾时会遇到差异,例如

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:WRITTEN]->(post:Post)
WHERE post.published = true
RETURN n, collect(post) as posts;

-

PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost)
> WHERE post.active = true
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                                      Other |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                                      keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                                      posts |
|      ColumnFilter(1) |    1 |      0 |                      |                                           keep columns n,   AGGREGATION153 |
|     EagerAggregation |    1 |      0 |                      |                                                                          n |
|               Filter |    1 |      3 |                      | (hasLabel(post:BlogPost(1)) AND Property(post,active(8)) == {  AUTOBOOL1}) |
| SimplePatternMatcher |    1 |     12 | n, post,   UNNAMED84 |                                                                            |
|          SchemaIndex |    1 |      2 |                 n, n |                                                {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+

Total database accesses: 17

在这种情况下,Cypher不会使用索引:Post(published)

因此,如果您拥有ActivePost标签,则标签的使用效果会更好。 :

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION130 |
|     EagerAggregation |    1 |      0 |                      |                                n |
|               Filter |    1 |      1 |                      |     hasLabel(post:ActivePost(2)) |
| SimplePatternMatcher |    1 |      4 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |      {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 7

第3点。始终使用标签作为肯定,对于上述情况,具有草稿标签将强制您执行以下查询:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post)
WHERE NOT post :Draft
RETURN n, collect(post) as posts;

意味着Cypher将打开每个节点标签标头并对其进行过滤。

第4点。避免在多个标签上匹配

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post:ActivePost)
RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                         Other |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                         keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                         posts |
|      ColumnFilter(1) |    1 |      0 |                      |                              keep columns n,   AGGREGATION139 |
|     EagerAggregation |    1 |      0 |                      |                                                             n |
|               Filter |    1 |      2 |                      | (hasLabel(post:BlogPost(1)) AND hasLabel(post:ActivePost(2))) |
| SimplePatternMatcher |    1 |      8 | n, post,   UNNAMED84 |                                                               |
|          SchemaIndex |    1 |      2 |                 n, n |                                   {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+

Total database accesses: 12

这将导致Cypher在第3点执行相同的过程。

第5点。如果可能,请避免需要通过良好类型的命名关系匹配标签

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:PUBLISHED]->(p)
RETURN n, collect(p) as posts

-

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:DRAFTED]->(post)
RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:DRAFTED]->(post)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 3     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +SimplePatternMatcher
          |
          +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION119 |
|     EagerAggregation |    1 |      0 |                      |                                n |
| SimplePatternMatcher |    3 |      0 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |      {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 2

性能更高,因为它将使用图的所有功能,只需遵循节点的关系,导致不再访问数据库,而不是匹配用户节点,因此不会对标签进行过滤。

这是我的0,02€