How to collect DISTINCT nodes which have DISTINCT property

Time: 2017-04-10 02:37:31

Tags: neo4j

I have this query which returns all of the user's posts that have been commented on:

  MATCH (author:User {user_id: { user_id }})

  MATCH (post:Post)<-[:AUTHOR]-(author)
  WHERE post.createdAt < { before }

  MATCH (post)-[:HAS_COMMENT]->(comment:Comment)<-[:AUTHOR]-(commentAuthor:User)
  WHERE NOT author.user_id = commentAuthor.user_id

  WITH
    author,
    post,
    comment,
    commentAuthor,
    count(DISTINCT commentAuthor) as participantsCount,
    count(comment) as commentsCount
  ORDER BY comment.createdAt DESC

  RETURN collect(DISTINCT post {
    .*,
    author,
    commentAuthor,
    commentCreatedAt: comment.createdAt,
    participantsCount,
    commentsCount
  })[0..{ LIMIT }] as posts

This works well, except that if the same user decides to troll and comments on the same post multiple times, that post gets returned once per comment for that same user. This makes for some spammy notifications:

user1 commented on your post "what's your favorite book?"
user2 commented on your post "what's your favorite movie?"
user3 commented on your post "what's your favorite show?"
user3 commented on your post "what's your favorite show?"
user3 commented on your post "what's your favorite show?"

^ all of user3's comments for that post get returned

If possible, I would like to collect only distinct combinations of post and comment author, ordered by most recent comment.

user1 commented on your post "what's your favorite book?"
user2 commented on your post "what's your favorite movie?"
user3 commented on your post "what's your favorite show?"

^ only returns user3's most recent comment

I'm basically trying to do something along the lines of:

collect(DISTINCT post { DISTINCT commentAuthor ... })

1 answer:

Answer 0 (score: 1)

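Your commentCreatedAt map property is likely the culprit here, since each comment will have a different timestamp. You probably want the latest comment, so using max(comment.createdAt) instead (provided it is a numeric timestamp) should allow those rows to collapse.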
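To make the effect of the timestamp concrete, here is a minimal sketch in plain Python (an emulation, not Cypher) of what DISTINCT compares and what grouping with max() does instead. The sample rows and the helper name latest_per_pair are invented for illustration.

```python
# Hypothetical rows: (post title, comment author, comment.createdAt)
rows = [
    ("favorite show", "user3", 30),
    ("favorite show", "user3", 20),
    ("favorite show", "user3", 10),
    ("favorite movie", "user2", 15),
]

# DISTINCT compares whole entries, so rows that differ only in their
# timestamp are all kept.
distinct_with_timestamp = set(rows)


def latest_per_pair(rows):
    """Emulate grouping by (post, author) and taking max(createdAt)."""
    latest = {}
    for post, author, created_at in rows:
        key = (post, author)
        latest[key] = max(latest.get(key, created_at), created_at)
    return latest


collapsed = latest_per_pair(rows)

print(len(distinct_with_timestamp))           # 4
print(len(collapsed))                         # 2
print(collapsed[("favorite show", "user3")])  # 30
```

The three user3 rows stay separate as long as the timestamp is part of the compared value, and collapse to one entry once the timestamp is aggregated.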

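Let's correct your counts too. Remember that aggregations only make sense with respect to the non-aggregation columns, which act as the grouping key. Since every row in your WITH includes comment and commentAuthor, your aggregations will produce 1 for both participantsCount and commentsCount (they aggregate over each row's single comment rather than over all comments).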
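The grouping-key behavior can be sketched in plain Python (again an emulation, not Cypher). The rows and the helper name count_by are hypothetical; the helper returns, per grouping key, the equivalents of count(comment) and count(DISTINCT commentAuthor).

```python
from collections import defaultdict

# Hypothetical rows: (post, comment id, comment author)
rows = [
    ("show", "c1", "user3"),
    ("show", "c2", "user3"),
    ("show", "c3", "user1"),
]


def count_by(rows, key_fn):
    """Return {key: (comment count, distinct author count)} per grouping key."""
    comments = defaultdict(int)
    authors = defaultdict(set)
    for post, comment, author in rows:
        key = key_fn(post, comment, author)
        comments[key] += 1
        authors[key].add(author)
    return {key: (comments[key], len(authors[key])) for key in comments}


# Grouping key includes comment and commentAuthor, as in the question's WITH.
per_comment = count_by(rows, lambda post, comment, author: (post, comment, author))

# Grouping key is the post only.
per_post = count_by(rows, lambda post, comment, author: post)

print(per_comment)  # every value is (1, 1)
print(per_post)     # {'show': (3, 2)}
```

With comment in the key, each group holds exactly one row, so both counts are always 1. Dropping it from the key yields the totals per post.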

You need some way to take comment out of the grouping key: either drop comment from the row, or collect or aggregate the comments.

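Here is one way to do it. We first aggregate the comment info we need per commenter, then collect that info per post, which also lets us total the number of comments and participants per post.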

Then, to match the output of the query in your description, we unwind the authors and collect the posts, with each commentAuthor in its own entry.

  MATCH (post)-[:HAS_COMMENT]->(comment:Comment)<-[:AUTHOR]-(commentAuthor:User)

  WITH
    post,
    commentAuthor,
    // since we don't have a comment per line, we can aggregate across all comments per post/commentAuthor
    max(comment.createdAt) as lastReplyAt,
    count(comment) as commentsPerCommenter
  ORDER BY lastReplyAt DESC

  WITH post, 
    // able to sum across all comments/commenters per post since we're collecting commentAuthor
    sum(commentsPerCommenter) as commentCount, 
    collect(commentAuthor {.*, lastReplyAt, 
      commentCount:commentsPerCommenter}) as commentAuthors

  WITH post,
    commentCount,
    size(commentAuthors) as participantsCount,
    commentAuthors

  UNWIND commentAuthors as author

  RETURN collect(post {
    .*,
    author,
    commentCount,
    participantsCount
  })[0..5] as posts

However, if you would rather have one post per row, with the commentAuthor info aggregated within each post, this query may suit you better:

  MATCH (post)-[:HAS_COMMENT]->(comment:Comment)<-[:AUTHOR]-(commentAuthor:User)

  WITH
    post,
    commentAuthor,
    max(comment.createdAt) as lastReplyAt,
    count(comment) as commentsPerCommenter
  ORDER BY lastReplyAt DESC

  WITH post, 
    sum(commentsPerCommenter) as commentCount, 
    collect(commentAuthor {.*, lastReplyAt, 
      commentCount:commentsPerCommenter}) as commentAuthors

  RETURN post {.*,
    commentCount,
    participantsCount:size(commentAuthors),
    commentAuthors}
  LIMIT 5

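Note, though, that this last query returns 5 distinct posts in every case (provided at least 5 match), since each post has its own row instead of being repeated once per commentAuthor.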
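The difference between slicing a collected list of post/commentAuthor entries and limiting per-post rows can be sketched in plain Python (an emulation, not Cypher). The data and variable names are hypothetical.

```python
# Hypothetical per-post aggregation: post -> commenters, newest reply first
comment_authors = {
    "show": ["user3", "user1", "user2"],
    "movie": ["user2", "user4"],
    "book": ["user1"],
}
LIMIT = 5

# First query: one entry per (post, commentAuthor), then slice the list.
entries = [
    (post, author)
    for post, authors in comment_authors.items()
    for author in authors
]
sliced = entries[:LIMIT]

# Second query: one row per post, then limit the rows.
limited_posts = list(comment_authors)[:LIMIT]

print(len({post for post, _ in sliced}))  # 2 distinct posts survive the slice
print(len(limited_posts))                 # 3 posts, one per row
```

In the first form a single busy post can use up most of the slice, so fewer distinct posts come back. In the second form the limit counts posts directly.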