Question

I'm new to MongoDB and I need to do an aggregation that seems quite difficult to me. A document looks like this:

{ 
 "_id" : ObjectId("568192aef8bd6b0cd0f649c6"), 
 "conference" : "IEEE International Conference on Acoustics, Speech and Signal Processing", 
 "prism:aggregationType" : "Conference Proceeding", 
 "children-id" : [
    "SCOPUS_ID:84948148564", 
    "SCOPUS_ID:84927603733", 
    "SCOPUS_ID:84943521758", 
    "SCOPUS_ID:84905234683", 
    "SCOPUS_ID:84876113709"
 ], 
 "dc:identifier" : "SCOPUS_ID:84867598678"
}

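The sample contains only the fields needed for the aggregation. prism:aggregationType can have 5 different values (Conference Proceeding, Book, Journal, etc.). children-id says that this document is cited by a number of other documents (SCOPUS_ID is a unique ID for each document). What I want to do is group by conference first and then, for each conference, find out how many citing documents there are for each prism:aggregationType (where the count is $gt 0).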

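For example, let's say there are 100 documents from the conference above, and these 100 documents are cited by 250 documents. Of those 250 citing documents, I want to know how many have "prism:aggregationType" : "Conference Proceeding", how many have "prism:aggregationType" : "Journal", and so on. The output could look like this: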

{  
 "conference" : "IEEE International Conference on Acoustics, Speech and Signal Processing", 
 "aggregationTypes" : [{"Conference Proceeding" : 50} , {"Journal" : 200}]
}

It doesn't matter whether this is done with the aggregation pipeline or with map-reduce.

Edit

Is there any way to combine these two into one aggregation?

db.articles.aggregate([
 { $match:{
    conference : {$ne : null}
 }},
 {$unwind:'$children-id'},
 {$group: {
   _id: {conference: '$conference'},
  'cited-by':{$push:{'dc:identifier':"$children-id"}}
 }}
 ]);
db.articles.find( { 'dc:identifier': { $in: [ 'SCOPUS_ID:84943302953', 'SCOPUS_ID:84927603733'] } }, {'prism:aggregationType':1} );

In the find query, I want to replace the array in $in with the array created by $push.

Answer 1

Please try this with aggregation:

> db.collections
    .aggregate([
       // 1. get the size of `children-id` array through $project
       {$project: {
             conference: 1, 
             IEEE1: 1, 
             'prism:aggregationType': 1, 
             'children-id': {$size: '$children-id'}
        }},
        // 2. group by `conference` and `prism:aggregationType` and sum the size of `children-id` 
        {$group: {
                 _id: {
                    conference:'$conference', 
                    aggregationType: '$prism:aggregationType'
                    }, 
                 ids: {$sum: '$children-id'}
         }}, 
         // 3. group by `conference` and collect one {aggregationType, count} pair per type 
         //    ($push must be the top-level accumulator in $group; it cannot be nested inside $cond)
         {$group: {
               _id: '$_id.conference', 
               aggregationTypes: { 
                           $push: {
                                   aggregationType: '$_id.aggregationType', 
                                   count: '$ids'
                           }}
         }}
]);

Answer 2

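As we discussed in chat, $lookup in the aggregation pipeline unfortunately requires MongoDB 3.2. That is not an option here, because the R driver works with MongoDB 2.6 and the source documents are spread across more than one collection.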
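
For readers who are on MongoDB 3.2 or later, the single-pipeline version could look roughly like the sketch below. It is only an illustration under two assumptions that do not hold for the asker: the server supports $lookup, and all documents live in one collection called articles. The output shape ({type, count} pairs) is also my choice, not something from the question.

```javascript
// Sketch only: assumes MongoDB 3.2+ and a single collection named `articles`.
var pipeline = [
  { $match: { conference: { $ne: null } } },
  // One document per (cited article, citing id) pair.
  { $unwind: '$children-id' },
  // Fetch the citing article whose dc:identifier equals that id.
  { $lookup: {
      from: 'articles',
      localField: 'children-id',
      foreignField: 'dc:identifier',
      as: 'citing'
  } },
  // Turn the one-element `citing` array into a sub-document;
  // ids without a matching article are dropped here.
  { $unwind: '$citing' },
  // Count citations per conference and per type of the citing article.
  { $group: {
      _id: { conference: '$conference', type: '$citing.prism:aggregationType' },
      count: { $sum: 1 }
  } },
  // Collect the per-type counts into one document per conference.
  { $group: {
      _id: '$_id.conference',
      aggregationTypes: { $push: { type: '$_id.type', count: '$count' } }
  } }
];

// In the mongo shell: db.articles.aggregate(pipeline)
```

Note that this counts citations, not distinct citing documents: an article that cites two papers of the same conference contributes two to its type's count.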

Answer 3

The code I wrote in the Edit section is also the final solution I came up with (with a small modification):

db.articles.aggregate([
{ $match:{
  conference : {$ne : null}
}},
{$unwind:'$children-id'},
{$group: {
  _id: '$conference',
 'cited-by':{$push:"$children-id"}
}}
]);
db.articles.find( { 'dc:identifier': { $in: [ 'SCOPUS_ID:84943302953', 'SCOPUS_ID:84927603733'] } }, {'prism:aggregationType':1} );

The result for each conference looks like this:

{ 
"_id" : "Annual Conference on Privacy, Security and Trust", 
"cited-by" : [
    "SCOPUS_ID:84942789431", 
    "SCOPUS_ID:84928151617", 
    "SCOPUS_ID:84939229259", 
    "SCOPUS_ID:84946407175", 
    "SCOPUS_ID:84933039513", 
    "SCOPUS_ID:84942789431", 
    "SCOPUS_ID:84942607254", 
    "SCOPUS_ID:84948165954", 
    "SCOPUS_ID:84926379258", 
    "SCOPUS_ID:84946771354", 
    "SCOPUS_ID:84944223683", 
    "SCOPUS_ID:84942789431", 
    "SCOPUS_ID:84939169499", 
    "SCOPUS_ID:84947104346", 
    "SCOPUS_ID:84948764343", 
    "SCOPUS_ID:84938075139", 
    "SCOPUS_ID:84946196118", 
    "SCOPUS_ID:84930820238", 
    "SCOPUS_ID:84947785321", 
    "SCOPUS_ID:84933496680", 
    "SCOPUS_ID:84942789431"
]
}

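I iterated over all the documents I got back (about 250) and used each cited-by array in the $in of the find query. I have an index on dc:identifier, so it runs almost instantly. $lookup would be an alternative for doing the whole task inside the aggregation pipeline, but the R package does not support versions above 2.6. Anyway, thanks for your time :)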
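
The two steps described above can be glued together in the shell along these lines. This is a minimal sketch, not the exact script that was run: the helper name countCitingTypes is hypothetical, and it assumes the citing documents are in the same collection that is passed in.

```javascript
// Sketch for the mongo shell; `coll` is a collection such as db.articles.
// countCitingTypes is a hypothetical helper name, not part of MongoDB.
function countCitingTypes(coll) {
  var results = [];
  // Step 1: one document per conference with the ids of all citing articles.
  coll.aggregate([
    { $match: { conference: { $ne: null } } },
    { $unwind: '$children-id' },
    { $group: { _id: '$conference', 'cited-by': { $push: '$children-id' } } }
  ]).forEach(function (conf) {
    var counts = {};
    // Step 2: look up the citing articles and tally their types.
    // $in returns each matching article once, even if its id is
    // repeated in cited-by.
    coll.find(
      { 'dc:identifier': { $in: conf['cited-by'] } },
      { 'prism:aggregationType': 1 }
    ).forEach(function (doc) {
      var type = doc['prism:aggregationType'];
      counts[type] = (counts[type] || 0) + 1;
    });
    results.push({ conference: conf._id, aggregationTypes: counts });
  });
  return results;
}

// In the mongo shell: printjson(countCitingTypes(db.articles));
```

Because find with $in yields every matching article only once, the tallies are numbers of distinct citing documents per conference. If repeated ids in cited-by (such as SCOPUS_ID:84942789431 in the result above) should count several times, the occurrences would have to be counted separately.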

MongoDB aggregation / map-reduce
