MongoDB: query and count occurrences of a sub-document field

Date: 2016-04-11 18:12:21

Tags: mongodb aggregation-framework

A sample of my MongoDB schema is shown below. A Doc is inserted every hour, and the time of insertion is stored as ServerTime.

{
    "_id" : ObjectId("5709fd69c1aa400008ff66da"),
    Doc: {
        total: 245,
        sub-docs: [
            {
                accessedURL: "www.example.com",
                User:{
                    name: "John"
                }
                Time:{
                    ServerTime: "2016-03-30T15:45:41.296+0000",
                    FirstAccessTime: "2016-03-30T12:43:41.296+0800"
                    LastAccessTime: "2016-03-30T15:33:41.296+0800"
                }

            },
            {
                accessedURL: "www.123.com",
                User:{
                    name: "John"
                }
                Time:{
                    ServerTime: "2016-03-30T15:45:41.296+0000",
                    FirstAccessTime: "2016-03-30T12:40:41.296+0800"
                    LastAccessTime: "2016-03-30T15:23:41.296+0800"
                }

            },
            {
                accessedURL: "www.example.com",
                User:{
                    name: "Eric"
                }
                Time:{
                    ServerTime: "2016-03-30T15:45:41.296+0000",
                    FirstAccessTime: "2016-03-30T12:43:41.296+0800"
                    LastAccessTime: "2016-03-30T15:33:41.296+0800"
                }

            },
            ... # 245 sub-docs in the array
        ]
    }
}
... # more Docs
...

Note: ServerTime is the same for all sub-docs within a Doc.

For each Doc, I want to count how many URLs each User.name has accessed, selecting the Docs by a time range on Doc.sub-docs.Time.ServerTime. The final output would be:

{
    ServerTime: "2016-03-30T15:45:41.296+0000",
    sub-docs: {
        John: 3,
        Eric: 4,
        ...
    }
}
{
    ServerTime: "2016-03-30T16:45:41.296+0000",
    sub-docs: {
        John: 1,
        Eric: 2,
        ...
    }
}
...
...

How can I achieve this?

1 answer:

Answer 0: (score: 0)

I used the following test data, because your sample has some problems with commas (I also renamed sub-docs to sub_docs):

db.test.insert({
    Doc: {
        total: 245,
        sub_docs: [
            {
                accessedURL: "www.example.com",
                User:{
                    name: "John"
                },
                Time:{
                    ServerTime: "2016-03-30T15:45:41.296+0000",
                    FirstAccessTime: "2016-03-30T12:43:41.296+0800",
                    LastAccessTime: "2016-03-30T15:33:41.296+0800"
                }
            },
            {
                accessedURL: "www.123.com",
                User:{
                    name: "John"
                },
                Time:{
                    ServerTime: "2016-04-30T15:45:41.296+0000",
                    FirstAccessTime: "2016-03-30T12:40:41.296+0800",
                    LastAccessTime: "2016-03-30T15:23:41.296+0800"
                }
            },
            {
                accessedURL: "www.example.com",
                User:{
                    name: "Eric"
                },
                Time:{
                    ServerTime: "2016-03-30T15:45:41.296+0000",
                    FirstAccessTime: "2016-03-30T12:43:41.296+0800",
                    LastAccessTime: "2016-03-30T15:33:41.296+0800"
                }
            }
        ]
    }
});

I created this aggregation pipeline:

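db.test.aggregate([
    // Emit one document per element of Doc.sub_docs
    { $unwind : "$Doc.sub_docs" },
    // Collect the user names seen for each ServerTime
    { $group : { "_id" : "$Doc.sub_docs.Time.ServerTime" , sub_doc : { $push : "$Doc.sub_docs.User.name" } }  },
    // Emit one document per (ServerTime, user name) occurrence
    { $unwind : "$sub_doc" },
    // Count the occurrences of each (ServerTime, user) pair
    { $group : { "_id" : { "time" : "$_id" ,  "user" : "$sub_doc"}  , sum : {$sum : 1}} },
    // Flatten the compound _id into ServerTime plus a { user, visits } pair
    { $project : { "ServerTime":  "$_id.time",  sub_docs : { user : "$_id.user" ,  visits : "$sum" }, _id : 0 }},
    // Gather the per-user counts back under their ServerTime
    { $group : { "_id" : "$ServerTime" , sub_doc : { $push : "$sub_docs" } } }
]);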

The result is not exactly in the shape you asked for, but it contains the same information:

{ "_id" : "2016-04-30T15:45:41.296+0000", "sub_doc" : [ { "user" : "John", "visits" : 1 } ] }
{ "_id" : "2016-03-30T15:45:41.296+0000", "sub_doc" : [ { "user" : "Eric", "visits" : 1 }, { "user" : "John", "visits" : 1 } ] } 
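
If the exact shape from the question is needed, the pipeline output could be reshaped on the client. Below is a minimal sketch in plain JavaScript, assuming the results have already been read into an array. The function name toRequestedShape is hypothetical, and the output key is written sub_docs to match the test data rather than the sub-docs of the question.

```javascript
// Hypothetical helper: turns each pipeline result
//   { _id: <ServerTime>, sub_doc: [ { user, visits }, ... ] }
// into the shape requested in the question
//   { ServerTime: <ServerTime>, sub_docs: { <user>: <visits>, ... } }
function toRequestedShape(results) {
    return results.map(function (row) {
        var counts = {};
        row.sub_doc.forEach(function (entry) {
            // One key per user name, holding that user's visit count
            counts[entry.user] = entry.visits;
        });
        return { ServerTime: row._id, sub_docs: counts };
    });
}

// The two result documents shown above, copied as plain objects
var sample = [
    { _id: "2016-04-30T15:45:41.296+0000", sub_doc: [ { user: "John", visits: 1 } ] },
    { _id: "2016-03-30T15:45:41.296+0000", sub_doc: [ { user: "Eric", visits: 1 }, { user: "John", visits: 1 } ] }
];
var reshaped = toRequestedShape(sample);
```

In the mongo shell the same helper could be applied to the cursor contents, for example by calling .toArray() on the value returned by db.test.aggregate and passing that array to toRequestedShape.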

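To run this only on the records that fall within your date range, add a $match stage before the first $unwind. You can also append the following stage to set the ServerTime label: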

{ $project : { "ServerTime": "$_id" ,  sub_doc : 1, "_id" : 0}}
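
The two suggestions can be combined into one pipeline, sketched below. It assumes ServerTime is stored as a string with the same fixed format and UTC offset as in the test data, so that plain string comparison follows chronological order; if the field were a real date type, Date values would be passed instead. The function name buildPipeline and its from/to parameters are hypothetical. A second $match is placed after the $unwind because the test data mixes different ServerTime values inside one Doc.

```javascript
// Hypothetical helper: builds the pipeline for the half-open range [from, to)
function buildPipeline(from, to) {
    var inRange = { $gte : from, $lt : to };
    return [
        // Keep only Docs with at least one sub-doc whose ServerTime is in range
        { $match : { "Doc.sub_docs" : { $elemMatch : { "Time.ServerTime" : inRange } } } },
        { $unwind : "$Doc.sub_docs" },
        // Drop individual sub-docs that fall outside the range
        { $match : { "Doc.sub_docs.Time.ServerTime" : inRange } },
        // The remaining stages are the ones from the pipeline above
        { $group : { "_id" : "$Doc.sub_docs.Time.ServerTime", sub_doc : { $push : "$Doc.sub_docs.User.name" } } },
        { $unwind : "$sub_doc" },
        { $group : { "_id" : { "time" : "$_id", "user" : "$sub_doc" }, sum : { $sum : 1 } } },
        { $project : { "ServerTime" : "$_id.time", sub_docs : { user : "$_id.user", visits : "$sum" }, _id : 0 } },
        { $group : { "_id" : "$ServerTime", sub_doc : { $push : "$sub_docs" } } },
        // Rename _id to ServerTime in the final output
        { $project : { "ServerTime" : "$_id", sub_doc : 1, "_id" : 0 } }
    ];
}

// Example range covering 30 March 2016 (UTC)
var pipeline = buildPipeline("2016-03-30T00:00:00.000+0000", "2016-03-31T00:00:00.000+0000");
```

In the mongo shell this would be run as db.test.aggregate(pipeline). With the test data above, only the sub-docs stamped 2016-03-30 should pass the second $match.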

It may be a naive pipeline, but it is what I have for now. Hope it helps.
