spark-scala-mongo-aggregate: query multiple fields, group by 2 fields

Time: 2017-01-05 23:09:09

Tags: mongodb scala apache-spark intellij-idea aggregation-framework

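I am looking for a Mongo aggregation code example that queries a collection on multiple fields and groups the results by a couple of fields. My collection: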

events:
{
_id
prodId:
location:
status:
user:
date:
}

The collection above is completely flat. I am looking for a result like the following:

For status "Completed" (This is a $match condition)

    {Product: abc
         {Location: US
            {user, date}
            {user, date}
            {user, date}
             .......}
         {Location: APAC
            {user, date}
            {user, date}
            {user, date}
             .......}}
    {Product: XYZ
         {Location: US
            {user, date}
            {user, date}
            {user, date}
             .......}
         {Location: APAC
            {user, date}
            {user, date}
            {user, date}
             .......}}
  ........

How can we write this in the aggregation framework using nested $group, $match, or any other aggregation stages?

Any suggestions or help would be much appreciated. Thanks.

2 answers:

Answer 0: (score: 1)

Use a $group whose _id contains multiple fields:

db.collection.aggregate([{$group: {_id: {attr1: '$attr1', attr2: '$attr2'}}}])
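
As a slightly fuller sketch of the same idea, applied to the events collection from the question: $group expects its grouping key under _id, and a compound key is written there as a sub-document. The field names (prodId, location, status) are assumed from the schema shown in the question, and the variable name groupByTwoFields is only illustrative.

```javascript
// Sketch only: field names are assumed from the schema in the question.
const groupByTwoFields = [
  // Keep only completed events before grouping.
  { $match: { status: "Completed" } },
  // A compound grouping key goes inside _id as a sub-document.
  { $group: {
      _id: { product: "$prodId", location: "$location" },
      count: { $sum: 1 }
  } }
];

// In the mongo shell this would be run as:
//   db.events.aggregate(groupByTwoFields)
```

Placing $match first means only the matching documents reach the $group stage.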

Answer 1: (score: 1)

After a lot of trial and error, I was able to solve this to some extent. It is not exactly what I wanted, but it is closer. This is what I got:

{
        "_id" : {
                "Product" : "ABC",
                "location" : "ERU"
        },
        "details" : [
                {   // Each of these is a unique combination
                        "user" : "XXXX",
                        "date" : ISODate("2015-08-01T09:08:15Z")
                },
                {
                        "user" : "xxxx",
                        "date" : ISODate("2015-08-01T09:03:08Z")
                },
                {
                        "user" : "xxxx",
                        "date" : ISODate("2015-07-20T19:33:57Z")
                },
                {
                        "user" : "xxxx",
                        "date" : ISODate("2015-07-20T19:28:50Z")
                }
        ],
        "count" : 4
}
{
        "_id" : {
                "Product" : "AAA",
                "location" : "US"
        },
        "details" : [
                {
                        "user" : "XXXX",
                        "date" : ISODate("2015-08-01T09:08:15Z")
                },
                {
                        "user" : "xxxx",
                        "date" : ISODate("2015-08-01T09:03:08Z")
                },
                {
                        "user" : "xxxx",
                        "date" : ISODate("2015-07-20T19:33:57Z")
                },
                {
                        "user" : "xxxx",
                        "date" : ISODate("2015-07-20T19:28:50Z")
                }
        ],
        "count" : 4
}

My aggregation code:

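db.events.aggregate([
    // Keep only the fields needed by the later stage.
    {$project: {
        prodId: 1,      // field names are case-sensitive; must match "$prodId" below
        location: 1,
        username: 1,
        status: 1,
        dateTime: 1
    }},
    // Group by the (product, location) pair and collect the unique user/date combinations.
    {$group: {
        _id: {Product: "$prodId", location: "$location"},
        details: {$addToSet: {user: "$username", date: "$dateTime"}},
        count: {$sum: 1}
    }}
], {allowDiskUse: true})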
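
To get closer to the nested Product > Location > {user, date} shape asked for in the question, one possible extension (a sketch, not something tested against the original data) is to filter on status with $match first and then add a second $group that folds the per-location groups into one document per product. The field names username and dateTime follow the code in this answer (the question's schema calls them user and date), and the names nestedByProductAndLocation and locations are illustrative.

```javascript
// Sketch only: assumes the field names used in this answer's pipeline.
const nestedByProductAndLocation = [
  // The question only wants events whose status is "Completed".
  { $match: { status: "Completed" } },
  // First level: one group per (product, location) pair.
  { $group: {
      _id: { product: "$prodId", location: "$location" },
      details: { $addToSet: { user: "$username", date: "$dateTime" } }
  } },
  // Second level: fold the location groups into one document per product.
  { $group: {
      _id: "$_id.product",
      locations: { $push: { location: "$_id.location", details: "$details" } }
  } }
];

// In the mongo shell this would be run as:
//   db.events.aggregate(nestedByProductAndLocation, { allowDiskUse: true })
```

Each resulting document would then hold one product in _id and an array of locations, each with its own details array.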

Hope this helps. Thanks.