Deduplicating the results returned by a Mongo aggregation query

Date: 2016-10-28 15:45:58

Tags: mongodb mongoose mongodb-query aggregation-framework mongodb-aggregation

Some background:

This involves 3 collections:

  1. posts
  2. postsubcategories
  3. postsupercategories

Example document in posts:

    {
        "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"),
        "__v" : 6,
        "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"),
        "postSubCategories" : [ 
            ObjectId("5806344baa0bbf284a2316e4")//reference to document in postsubcategories collection
        ],
        "postSuperCategories" : [ 
            ObjectId("580679958a5f5f448ba5aae9"), 
            ObjectId("580679958a5f5f448ba5aaf2")//references to documents in postsupercategories collection
        ],
        "publishedDate" : ISODate("2016-10-10T04:00:00.000Z"),
        "state" : "published",
        "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"),
        "title" : "My title",
        "topics" : []
    }
    

My query is:

    db.posts.aggregate([
    {'$unwind': 
        {'path':"$postSubCategories"}
    },
    {'$lookup': {
      'from':"postsubcategories",
      'localField': "postSubCategories",
      'foreignField': "_id",
      'as': "subObject"
    }},
    {'$unwind': 
        {'path':"$postSuperCategories"}
    },
    {'$lookup': {
      'from':"postsupercategories",
      'localField': "postSuperCategories",
      'foreignField': "_id",
      'as': "superObject"
    }},
    {'$match': {
        '$or':
            [{ "subObject.searchKeywords": "home monitor" }, 
            { "superObject.searchKeywords": "home monitor" }]
        }
    },
    {'$match': {
        "state": "published"
    }}
    ])
    


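Both the postsubcategories and postsupercategories collections contain a field named searchKeywords, which is an array of text strings in each document. I want to query these searchKeywords fields and return the matching post documents. I need the returned set of _ids to be deduplicated.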

The query returns four documents. Example:

    ObjectId("57fbf3ce7ccbc906ed87cef6")
    ObjectId("57fbf3ce7ccbc906ed87cef6")
    ObjectId("57fbf40b7ccbc906ed87cef7")
    ObjectId("57fbf40b7ccbc906ed87cef7") 
    


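I understand why it returns 4. One document contains the postSubCategories object 5806344baa0bbf284a2316e4 and the postSuperCategories id 580679958a5f5f448ba5aae9.

The second document contains the postSubCategories object 5806344baa0bbf284a2316e4 and the postSuperCategories id 580679958a5f5f448ba5aaf2. The same thing repeats for the second post.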
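The multiplication described above can be reproduced without a database. The following is a minimal plain-JavaScript sketch, not MongoDB code: `unwind` is a hypothetical stand-in for the `$unwind` stage, the ids are shortened for readability, and it assumes the second post references the same categories as the first.

```javascript
// Plain-JavaScript stand-in for $unwind: emits one row per array element
// and drops documents whose array is missing or empty.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((value) => ({ ...doc, [field]: value }))
  );
}

// Simplified posts modelled on the example document (ids shortened, assumed data).
const posts = [
  { _id: "cef6", postSubCategories: ["16e4"], postSuperCategories: ["aae9", "aaf2"] },
  { _id: "cef7", postSubCategories: ["16e4"], postSuperCategories: ["aae9", "aaf2"] },
];

// 1 sub-category x 2 super-categories = 2 rows per post.
const rows = unwind(unwind(posts, "postSubCategories"), "postSuperCategories");
console.log(rows.map((r) => r._id)); // [ 'cef6', 'cef6', 'cef7', 'cef7' ]
```

Each post appears once per combination of its unwound array elements, which is why two posts come back as four rows.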

Is there a way to "dedupe" based on the returned _id?

So that my end result is:

    ObjectId("57fbf3ce7ccbc906ed87cef6")
    ObjectId("57fbf40b7ccbc906ed87cef7")
    

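I know that, technically, the 2 matching _ids in the list of 4 are not exactly the same documents, since each contains a different postSuperCategories object. At this point, though, I no longer care about that field; I just need a single post document per _id, because I need access to its other fields.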

Any help is greatly appreciated. I have tried looking into $group, $addToSet, and $setUnion, so far without success.

1 answer:

Answer 0 (score: 1)

You can add a $group stage that retrieves the distinct _ids, taking the first value found for each property you want to extract per _id.

A $group stage such as:

{
    '$group': {
        _id: '$_id',
        items: { $first: "$$ROOT" } 
    }
}

will give you the first root document for each _id in the items field:

{ "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "items" : { "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "__v" : 6, "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "postSubCategories" : ObjectId("5806344baa0bbf284a2316e4"), "postSuperCategories" : ObjectId("580679958a5f5f448ba5aae9"), "publishedDate" : ISODate("2016-12-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef4"), "title" : "My title2", "topics" : [ "a", "b" ], "subObject" : [ { "_id" : ObjectId("5806344baa0bbf284a2316e4"), "searchKeywords" : "home monitor" } ], "superObject" : [ { "_id" : ObjectId("580679958a5f5f448ba5aae9"), "searchKeywords" : "home monitor2" } ] } }
{ "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "items" : { "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "__v" : 6, "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "postSubCategories" : ObjectId("5806344baa0bbf284a2316e4"), "postSuperCategories" : ObjectId("580679958a5f5f448ba5aae9"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ], "subObject" : [ { "_id" : ObjectId("5806344baa0bbf284a2316e4"), "searchKeywords" : "home monitor" } ], "superObject" : [ { "_id" : ObjectId("580679958a5f5f448ba5aae9"), "searchKeywords" : "home monitor2" } ] } }
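
If plain post documents are wanted rather than a wrapper around an items field, one possible extension is to follow the $group with a $replaceRoot stage. This is only a sketch: it assumes MongoDB 3.4 or later (where $replaceRoot is available), it uses the field name items as in the output above, and `buildDedupedPipeline` is a hypothetical helper name, not part of any driver.

```javascript
// Hypothetical helper: the question's pipeline plus two deduplication stages.
function buildDedupedPipeline(keyword) {
  return [
    { $unwind: { path: "$postSubCategories" } },
    { $lookup: {
        from: "postsubcategories",
        localField: "postSubCategories",
        foreignField: "_id",
        as: "subObject",
    } },
    { $unwind: { path: "$postSuperCategories" } },
    { $lookup: {
        from: "postsupercategories",
        localField: "postSuperCategories",
        foreignField: "_id",
        as: "superObject",
    } },
    { $match: { $or: [
        { "subObject.searchKeywords": keyword },
        { "superObject.searchKeywords": keyword },
    ] } },
    { $match: { state: "published" } },
    // Keep the first unwound row seen for each post.
    { $group: { _id: "$_id", items: { $first: "$$ROOT" } } },
    // Promote that row back to the top level (assumes MongoDB 3.4+).
    { $replaceRoot: { newRoot: "$items" } },
  ];
}

const pipeline = buildDedupedPipeline("home monitor");
// In the mongo shell this would be run as: db.posts.aggregate(pipeline)
console.log(JSON.stringify(pipeline.slice(-2), null, 2));
```

Note that the promoted document still carries the unwound, single-valued postSubCategories and postSuperCategories of whichever row came first, along with the subObject and superObject lookup results.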

Otherwise, to select individual fields in the response:

{
    '$group': {
        _id: '$_id',
        author: {
            $first: '$author'
        },
        publishedDate: {
            $first: '$publishedDate'
        },
        state: {
            $first: '$state'
        },
        templateName: {
            $first: '$templateName'
        },
        title: {
            $first: '$title'
        },
        topics: {
            $first: '$topics'
        }
    }
}

You would get something like:

{ "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "publishedDate" : ISODate("2016-12-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef4"), "title" : "My title2", "topics" : [ "a", "b" ] }
{ "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ] }
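
To make the effect of $first inside $group concrete, here is a rough plain-JavaScript analogue. It is an illustration only: `groupFirst` is a hypothetical helper, the rows are simplified assumed data with shortened ids, and unlike $group (which does not guarantee output order) a Map preserves insertion order.

```javascript
// Keep the first row seen for each _id and copy only the listed fields,
// mirroring { $group: { _id: "$_id", field: { $first: "$field" }, ... } }.
function groupFirst(rows, fields) {
  const byId = new Map();
  for (const row of rows) {
    if (byId.has(row._id)) continue; // later duplicates are ignored
    const picked = { _id: row._id };
    for (const f of fields) picked[f] = row[f];
    byId.set(row._id, picked);
  }
  return [...byId.values()];
}

// Four unwound rows for two posts (assumed data, ids shortened).
const unwoundRows = [
  { _id: "cef6", title: "My title", state: "published", postSuperCategories: "aae9" },
  { _id: "cef6", title: "My title", state: "published", postSuperCategories: "aaf2" },
  { _id: "cef7", title: "My title2", state: "published", postSuperCategories: "aae9" },
  { _id: "cef7", title: "My title2", state: "published", postSuperCategories: "aaf2" },
];

const deduped = groupFirst(unwoundRows, ["title", "state"]);
console.log(deduped);
```

Because fields such as title and state are identical across the unwound copies of a post, it does not matter which copy happens to be first; only the unwound category fields differ between copies.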