Aggregation over different subtypes in a collection's documents

Time: 2013-02-20 13:09:43

Tags: mongodb mapreduce aggregation-framework

Given abstract documents of the following shape in the collection md:

{
    vals : [{
        uid : string,
        val : string|array
    }]
}

the following aggregation, which is admittedly only partially correct:

db.md.aggregate(
    { $unwind : "$vals" },
    { $match : { "vals.uid" : { $in : ["x", "y"] } } },
    {
        $group : { 
            _id : { uid : "$vals.uid" },
            vals : { $addToSet : "$vals.val" }

        }
    }
);

may produce the following result:

"result" : [
    {
        "_id" : {
            "uid" : "x"
        },
        "vals" : [
            [
                "24ad52bc-c414-4349-8f3a-24fd5520428e",
                "e29dec2f-57d2-43dc-818a-1a6a9ec1cc64"
            ],
            [
                "5879b7a4-b564-433e-9a3e-49998dd60b67",
                "24ad52bc-c414-4349-8f3a-24fd5520428e"
            ]
        ]
    },
    {
        "_id" : {
            "uid" : "y"
        },
        "vals" : [
            "0da5fcaa-8d7e-428b-8a84-77c375acea2b",
            "1721cc92-c4ee-4a19-9b2f-8247aa53cfe1",
            "5ac71a9e-70bd-49d7-a596-d317b17e4491"
        ]
    }
]

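Because x is aggregated from documents whose val is an array rather than a string, vals in the result is an array of arrays. What I am looking for in this case is a flat array (like the result for y).

It seems to me that what I want to achieve with a single aggregation call is currently not supported by any of the available operations. For example, there is no type conversion, and $unwind expects an array as its input in every case.

Is map-reduce my only option? If not, any hints?

Thanks!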
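To make the nesting concrete, here is a minimal plain-JavaScript sketch (no MongoDB involved) of what a $addToSet-style grouping does with mixed values. The helper name groupAddToSet and the short sample values are hypothetical, and the sketch keeps insertion order, which $addToSet itself does not promise.

```javascript
// Hypothetical in-memory stand-in for the documents after $unwind and $match.
const unwound = [
  { uid: "x", val: ["a", "b"] },
  { uid: "x", val: ["c", "a"] },
  { uid: "y", val: "d" },
  { uid: "y", val: "e" },
  { uid: "y", val: "d" },
];

// Mimics $group with $addToSet: each distinct val is added as ONE element,
// so an array-valued val ends up as a nested array instead of being spread out.
function groupAddToSet(docs) {
  const groups = new Map();
  for (const { uid, val } of docs) {
    if (!groups.has(uid)) groups.set(uid, []);
    const vals = groups.get(uid);
    const key = JSON.stringify(val);
    // Skip values that are already present (set semantics).
    if (!vals.some((v) => JSON.stringify(v) === key)) vals.push(val);
  }
  return Object.fromEntries(groups);
}

const grouped = groupAddToSet(unwound);
console.log(JSON.stringify(grouped));
// prints {"x":[["a","b"],["c","a"]],"y":["d","e"]}
```

The group for x holds two arrays because each array counts as a single set member, while the group for y holds plain strings.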

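2 answers:

Answer 0: (score: 3)

You can use aggregation to perform the computation you want without changing your schema (though you might consider changing the schema anyway, simply to make queries and aggregations on this field easier to write).

I broke the pipeline into multiple steps for readability. I also simplified your documents slightly, again for readability.

Sample input: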

> db.md.find().pretty()
{
    "_id" : ObjectId("512f65c6a31a92aae2a214a3"),
    "uid" : "x",
    "val" : "string"
}
{
    "_id" : ObjectId("512f65c6a31a92aae2a214a4"),
    "uid" : "x",
    "val" : "string"
}
{
    "_id" : ObjectId("512f65c6a31a92aae2a214a5"),
    "uid" : "y",
    "val" : "string2"
}
{
    "_id" : ObjectId("512f65e8a31a92aae2a214a6"),
    "uid" : "y",
    "val" : [
        "string3",
        "string4"
    ]
}
{
    "_id" : ObjectId("512f65e8a31a92aae2a214a7"),
    "uid" : "z",
    "val" : [
        "string"
    ]
}
{
    "_id" : ObjectId("512f65e8a31a92aae2a214a8"),
    "uid" : "y",
    "val" : [
        "string1",
        "string2"
    ]
}

Pipeline stages:

> project1 = {
    "$project" : {
        "uid" : 1,
        "val" : 1,
        "isArray" : {
            "$cond" : [
                {
                    "$eq" : [
                        "$val.0",
                        [ ]
                    ]
                },
                true,
                false
            ]
        }
    }
}
> project2 = {
    "$project" : {
        "uid" : 1,
        "valA" : {
            "$cond" : [
                "$isArray",
                "$val",
                [
                    null
                ]
            ]
        },
        "valS" : {
            "$cond" : [
                "$isArray",
                null,
                "$val"
            ]
        },
        "isArray" : 1
    }
}
> unwind = { "$unwind" : "$valA" }
> project3 = {
    "$project" : {
        "_id" : 0,
        "uid" : 1,
        "val" : {
            "$cond" : [
                "$isArray",
                "$valA",
                "$valS"
            ]
        }
    }
}
> group = {
    "$group" : {
        "_id" : "$uid",
        "vals" : {
            "$addToSet" : "$val"
        }
    }
}

Final aggregation:

> db.md.aggregate(project1, project2, unwind, project3, group)
{
    "result" : [
        {
            "_id" : "z",
            "vals" : [
                "string"
            ]
        },
        {
            "_id" : "y",
            "vals" : [
                "string1",
                "string4",
                "string3",
                "string2"
            ]
        },
        {
            "_id" : "x",
            "vals" : [
                "string"
            ]
        }
    ],
    "ok" : 1
}
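For readers who want to check the logic without a running server, the same idea can be sketched in plain JavaScript: treat a scalar val as a one-element array, spread the elements, and collect them into one set per uid. The helper flattenByUid is hypothetical and is not a MongoDB API. The sort is there only to make the printed output deterministic, since $addToSet guarantees no particular order.

```javascript
// Simplified sample documents, mirroring the example input above.
const docs = [
  { uid: "x", val: "string" },
  { uid: "x", val: "string" },
  { uid: "y", val: "string2" },
  { uid: "y", val: ["string3", "string4"] },
  { uid: "z", val: ["string"] },
  { uid: "y", val: ["string1", "string2"] },
];

// Hypothetical helper that emulates the pipeline in memory.
function flattenByUid(input) {
  const sets = new Map();
  for (const { uid, val } of input) {
    // Like the two $project stages: a scalar is wrapped so that every
    // value can be handled as an array.
    const values = Array.isArray(val) ? val : [val];
    if (!sets.has(uid)) sets.set(uid, new Set());
    // Like $unwind followed by $group/$addToSet: add elements one by one.
    for (const v of values) sets.get(uid).add(v);
  }
  // Sorting is only for a stable printout, not part of the technique.
  return [...sets].map(([uid, set]) => ({ _id: uid, vals: [...set].sort() }));
}

const flattened = flattenByUid(docs);
console.log(JSON.stringify(flattened));
// prints [{"_id":"x","vals":["string"]},{"_id":"y","vals":["string1","string2","string3","string4"]},{"_id":"z","vals":["string"]}]
```

Every group now holds a flat list of strings, regardless of whether the source documents stored val as a string or as an array.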

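Answer 1: (score: 0)

If you modify your schema so that "vals.val" is always an array field (even when a record contains only one element), you can do this easily as follows: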

db.test_col.insert({
    vals : [
        {
            uid : "uuid1",
            val : ["value1"]
        },
        {
            uid : "uuid2",
            val : ["value2", "value3"]
        }]
    });
db.test_col.insert(
    {
        vals : [{
            uid : "uuid2",
            val : ["value4", "value5"]
        }]
    });

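With this approach you only need two $unwind operations: the first unwinds the "parent" array and the second unwinds each "vals.val" value. So, running the query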

db.test_col.aggregate(
    { $unwind : "$vals" },
    { $unwind : "$vals.val" },
    {
        $group : { 
            _id : { uid : "$vals.uid" },
            vals : { $addToSet : "$vals.val" }
        }
    }
);

gives you the expected values:

{
    "result" : [
        {
            "_id" : {
                "uid" : "uuid2"
            },
            "vals" : [
                "value5",
                "value4",
                "value3",
                "value2"
            ]
        },
        {
            "_id" : {
                "uid" : "uuid1"
            },
            "vals" : [
                "value1"
            ]
        }
    ],
    "ok" : 1
}
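The effect of the two unwinds can be traced with a small in-memory sketch in plain JavaScript. The unwind helper below is a hypothetical stand-in that handles only one- or two-segment paths and assumes the target is always an array. It is meant to show the document counts at each step, not to reproduce $unwind in full.

```javascript
// Documents in the always-array schema from this answer.
const col = [
  { vals: [
      { uid: "uuid1", val: ["value1"] },
      { uid: "uuid2", val: ["value2", "value3"] },
  ] },
  { vals: [{ uid: "uuid2", val: ["value4", "value5"] }] },
];

// Hypothetical stand-in for $unwind: emits one copy of the document
// per element of the array found at the given path.
function unwind(docs, path) {
  const [head, tail] = path.split(".");
  return docs.flatMap((doc) => {
    if (tail === undefined) {
      return doc[head].map((item) => ({ ...doc, [head]: item }));
    }
    return doc[head][tail].map((item) => ({
      ...doc,
      [head]: { ...doc[head], [tail]: item },
    }));
  });
}

const step1 = unwind(col, "vals");       // 3 documents, one per vals entry
const step2 = unwind(step1, "vals.val"); // 5 documents, one per single value

// Group by uid, collecting distinct values (insertion order kept here).
const sets = {};
for (const { vals } of step2) {
  if (!sets[vals.uid]) sets[vals.uid] = new Set();
  sets[vals.uid].add(vals.val);
}
const result = Object.fromEntries(
  Object.entries(sets).map(([uid, set]) => [uid, [...set]])
);
console.log(JSON.stringify(result));
// prints {"uuid1":["value1"],"uuid2":["value2","value3","value4","value5"]}
```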

And no, you cannot run this query against your current schema, because $unwind fails when the field is not an array field.