Select a group-by count and a distinct count in the same MongoDB query

Posted: 2014-07-15 14:46:22

Tags: mongodb mongodb-query aggregation-framework

I'm trying to do something like this:
select campaign_id, campaign_name, count(subscriber_id), count(distinct subscriber_id)
from campaigns group by campaign_id, campaign_name;

This query gives me everything except count(distinct subscriber_id):
db.campaigns.aggregate([
    {$match: {subscriber_id: {$ne: null}}},
    {$group: { 
        _id: {campaign_id: "$campaign_id",campaign_name: "$campaign_name"},
        count: {$sum: 1}
    }}
])

The following query gives me everything except count(subscriber_id):
db.campaigns_logs.aggregate([
    {$match : {subscriber_id: {$ne: null}}},
    {$group : { _id: {campaign_id: "$campaign_id",campaign_name: "$campaign_name",subscriber_id: "$subscriber_id"}}},
    {$group : { _id: {campaign_id: "$_id.campaign_id",campaign_name: "$_id.campaign_name"}, 
                count: {$sum: 1}
              }}
])

But I want both count(subscriber_id) and count(distinct subscriber_id) in the same result.
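To pin down exactly what "both counts in one result" means, here is a plain JavaScript sketch (Node.js, no database involved) of the SQL semantics. It assumes documents shaped like the ones above; the function name `countsByCampaign` is hypothetical. Per (campaign_id, campaign_name) pair it keeps a row counter for count(subscriber_id) and a Set for count(distinct subscriber_id):

```javascript
// Hypothetical helper: emulates the SQL above in memory.
// count(subscriber_id) ignores NULLs, so documents whose subscriber_id
// is null or missing are skipped, mirroring the {$ne: null} $match.
function countsByCampaign(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (doc.subscriber_id == null) continue; // null or undefined
    // Composite group key; JSON keeps the two fields unambiguous.
    const key = JSON.stringify([doc.campaign_id, doc.campaign_name]);
    if (!groups.has(key)) {
      groups.set(key, {
        campaign_id: doc.campaign_id,
        campaign_name: doc.campaign_name,
        count: 0,
        subscribers: new Set(),
      });
    }
    const g = groups.get(key);
    g.count += 1;                         // count(subscriber_id)
    g.subscribers.add(doc.subscriber_id); // feeds count(distinct subscriber_id)
  }
  return [...groups.values()].map(g => ({
    campaign_id: g.campaign_id,
    campaign_name: g.campaign_name,
    count: g.count,
    distinctCount: g.subscribers.size,
  }));
}
```

Any MongoDB pipeline that answers the question should produce the same two numbers per campaign as this reference.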

2 Answers:

Answer 0 (score: 56)

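You are already starting to think in the right direction here. Change your mindset: "distinct" is really just another way of writing a $group operation, in either SQL or MongoDB. That means you have two group operations happening here and, in aggregation pipeline terms, two pipeline stages.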

To visualize this, take some simplified documents:

{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "123"
},
{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "123"
},
{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "456"
}

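For the given "campaign", the total count and the "distinct" count are 3 and 2 respectively. So the logical thing to do is to "group" on all of those "subscriber_id" values first and keep the number of occurrences of each one. Then, thinking in terms of a "pipeline", "total" those counts per "campaign" and simply count the "distinct" occurrences as a separate number: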

db.campaigns.aggregate([
    { "$match": { "subscriber_id": { "$ne": null }}},

    // Count all occurrences
    { "$group": {
        "_id": {
            "campaign_id": "$campaign_id",
            "campaign_name": "$campaign_name",
            "subscriber_id": "$subscriber_id"
        },
        "count": { "$sum": 1 }
    }},

    // Sum all occurrences and count distinct
    { "$group": {
        "_id": {
            "campaign_id": "$_id.campaign_id",
            "campaign_name": "$_id.campaign_name"
        },
        "totalCount": { "$sum": "$count" },
        "distinctCount": { "$sum": 1 }
    }}
])
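The two stages can be replayed outside MongoDB to check the arithmetic. This is a minimal in-memory sketch in plain JavaScript (Node.js); `groupBySubscriber` and `groupByCampaign` are hypothetical names standing in for the first and second $group stages, and the Map-based grouping is only an illustration of what the server does, not how it is implemented:

```javascript
// Stage 1 ($match + first $group): group by (campaign, subscriber) and count rows.
function groupBySubscriber(docs) {
  const out = new Map();
  for (const d of docs) {
    if (d.subscriber_id == null) continue; // the {$ne: null} $match
    const id = { campaign_id: d.campaign_id, campaign_name: d.campaign_name, subscriber_id: d.subscriber_id };
    const key = JSON.stringify(id);
    if (!out.has(key)) out.set(key, { _id: id, count: 0 });
    out.get(key).count += 1; // "count": { "$sum": 1 }
  }
  return [...out.values()];
}

// Stage 2 (second $group): regroup the stage-1 rows by campaign only.
function groupByCampaign(rows) {
  const out = new Map();
  for (const r of rows) {
    const id = { campaign_id: r._id.campaign_id, campaign_name: r._id.campaign_name };
    const key = JSON.stringify(id);
    if (!out.has(key)) out.set(key, { _id: id, totalCount: 0, distinctCount: 0 });
    const g = out.get(key);
    g.totalCount += r.count; // "totalCount": { "$sum": "$count" }
    g.distinctCount += 1;    // "distinctCount": { "$sum": 1 }, one row per distinct subscriber
  }
  return [...out.values()];
}

const sample = [
  { campaign_id: "A", campaign_name: "A", subscriber_id: "123" },
  { campaign_id: "A", campaign_name: "A", subscriber_id: "123" },
  { campaign_id: "A", campaign_name: "A", subscriber_id: "456" },
];
const result = groupByCampaign(groupBySubscriber(sample));
console.log(JSON.stringify(result));
// → [{"_id":{"campaign_id":"A","campaign_name":"A"},"totalCount":3,"distinctCount":2}]
```

The key point is that after stage 1 each row stands for exactly one distinct subscriber, so summing 1 per row yields the distinct count while summing the stored counts recovers the total.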

After the first "group" stage, the output documents can be visualized like this:

{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A", 
        "subscriber_id" : "456"
    }, 
    "count" : 1 
}
{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A", 
        "subscriber_id" : "123"
    }, 
    "count" : 2
}

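So of the "three" documents in the sample, 2 belong to one distinct value and 1 to the other. These counts can still be totaled with $sum to get the total number of matching documents, which is what the next stage does, giving the final result: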

{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A"
    },
    "totalCount" : 3,
    "distinctCount" : 2
}

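A really good analogy for the aggregation pipeline is the Unix pipe "|" operator, which allows operations to be "chained" so that the output of one command is passed to the input of the next, and so on. Starting to think about your processing requirements in this way will help you understand how the aggregation pipeline operates.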
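The pipe analogy maps naturally onto function composition. The sketch below, in plain JavaScript (Node.js), treats each stage as a function from an array of documents to a new array and chains them with `reduce`; the names `match`, `group`, and `pipeline` are hypothetical helpers for illustration, not MongoDB APIs:

```javascript
// A stage takes an array of documents and returns a new array,
// much like a Unix filter reading stdin and writing stdout.
const match = pred => docs => docs.filter(pred);

// Generic grouping stage: keyOf builds the _id, init creates the
// accumulator row, and step folds each input document into it.
const group = (keyOf, init, step) => docs => {
  const out = new Map();
  for (const d of docs) {
    const id = keyOf(d);
    const k = JSON.stringify(id);
    if (!out.has(k)) out.set(k, init(id));
    step(out.get(k), d);
  }
  return [...out.values()];
};

// Feed each stage's output into the next, like `a | b | c` in a shell.
const pipeline = (...stages) => docs => stages.reduce((acc, stage) => stage(acc), docs);

const run = pipeline(
  match(d => d.subscriber_id != null),
  group(
    d => ({ campaign_id: d.campaign_id, campaign_name: d.campaign_name, subscriber_id: d.subscriber_id }),
    id => ({ _id: id, count: 0 }),
    (acc) => { acc.count += 1; }
  ),
  group(
    r => ({ campaign_id: r._id.campaign_id, campaign_name: r._id.campaign_name }),
    id => ({ _id: id, totalCount: 0, distinctCount: 0 }),
    (acc, r) => { acc.totalCount += r.count; acc.distinctCount += 1; }
  )
);
```

Adding another step, for example a sort, is just one more function in the argument list, in the same way you would append `| sort` to a shell command.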

Answer 1 (score: 6)

SQL query (group by with a distinct count):

select city,count(distinct(emailId)) from TransactionDetails group by city;

The equivalent MongoDB query looks like this:

db.TransactionDetails.aggregate([ 
{$group:{_id:{"CITY" : "$cityName"},uniqueCount: {$addToSet: "$emailId"}}},
// after $group the city lives under _id, so pull it out explicitly
{$project:{_id: 0, "CITY": "$_id.CITY", uniqueCustomerCount:{$size:"$uniqueCount"}} } 
]);
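The $addToSet / $size combination behaves like collecting values into a set and then taking the set's size. Here is a small in-memory sketch in plain JavaScript (Node.js), assuming documents with `cityName` and `emailId` fields as in the query above; the function name `uniqueCustomersByCity` is hypothetical:

```javascript
// Emulates the two stages above in memory:
// $group + $addToSet collects the unique emailId values per city,
// then $project + $size turns each set into a count.
function uniqueCustomersByCity(transactions) {
  const byCity = new Map(); // cityName -> Set of emailId (the $addToSet array)
  for (const t of transactions) {
    if (!byCity.has(t.cityName)) byCity.set(t.cityName, new Set());
    byCity.get(t.cityName).add(t.emailId); // repeated values are ignored
  }
  // $project stage: keep the city, replace the set by its size.
  return [...byCity].map(([city, emails]) => ({
    CITY: city,
    uniqueCustomerCount: emails.size,
  }));
}
```

One trade-off to keep in mind: in MongoDB each group's $addToSet array is held inside a single result document, so it is subject to the 16 MB BSON document limit. The two-stage $group approach in Answer 0 never materializes such an array, which makes it the safer choice when a group can contain a very large number of distinct values.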