Question

I have a collection with a structure similar to this.

{
    "_id" : ObjectId("59d7cd63dc2c91e740afcdb"),
    "dateJoined": ISODate("2014-12-28T16:37:17.984Z"),
    "activatedMonth": 5,
    "enrollments" : [
        { "month":-10, "enrolled":'00'},
        { "month":-9, "enrolled":'00'},
        { "month":-8, "enrolled":'01'},
        //other months
        { "month":8, "enrolled":'11'},
        { "month":9, "enrolled":'11'},
        { "month":10, "enrolled":'00'}
    ]
}

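The month in each enrollments subdocument is relative to dateJoined.

activatedMonth is the month of activation, also relative to dateJoined. So it is different for each document.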

I am using the MongoDB aggregation framework to process queries such as "find all documents that are enrolled from 10 months before activation to 25 months after activation".

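The "enrolled" values 01, 10 and 11 are considered enrolled, and 00 is considered not enrolled. For a document to count as enrolled, it must be enrolled in every month of the range.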

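I apply every filter I can in the match stage, but in most cases this may be empty. In the project stage, I count the non-enrolled months that fall inside the range for each document. If that count (the size of the filtered array) is zero, the document is enrolled.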
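The rule described above can be sketched in plain JavaScript, outside MongoDB. This is only an illustration that assumes the document shape shown earlier; the helper name isEnrolled and the sample data are hypothetical and not part of the original query.

```javascript
// Hypothetical helper: mirrors the $filter/$size logic on a single document.
// Returns true when no month inside the window is marked '00'.
function isEnrolled(doc, startMonth, endMonth) {
  const from = doc.activatedMonth + startMonth; // window start, relative to dateJoined
  const to = doc.activatedMonth + endMonth;     // window end, relative to dateJoined
  const notEnrolled = doc.enrollments.filter(
    (e) => e.month >= from && e.month <= to && e.enrolled === '00'
  );
  return notEnrolled.length === 0;
}

// Made-up sample document, trimmed to the fields the rule needs.
const sample = {
  activatedMonth: 5,
  enrollments: [
    { month: 3, enrolled: '00' },
    { month: 4, enrolled: '01' },
    { month: 5, enrolled: '11' },
    { month: 6, enrolled: '10' },
  ],
};

console.log(isEnrolled(sample, -1, 1)); // window 4..6 -> true
console.log(isEnrolled(sample, -2, 1)); // window 3..6 -> false
```

Like the aggregation, this sketch does not treat a month that is missing from the array as non-enrolled; it only looks for explicit '00' entries inside the window.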

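Below is the query I am using. It takes 3 to 4 seconds to complete, and it takes more or less the same time with or without the group stage. My data is relatively small (0.9GB): there are 41K documents in total and approximately 13 million subdocuments.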

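I need to reduce the processing time. I tried creating an index on enrollments.month and enrollments.enrolled, but it did not help. I think that is because the project stage cannot use indexes. Am I right?

Is there anything I can do to the query or to the collection structure to improve performance?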

let startMonth = -10;
let endMonth = 25;

mongoose.connection.db.collection("collection").aggregate([
  {
    // apply every filter that can be expressed up front (often empty)
    $match: filters
  },
  {
    $project: {
      _id: 0,
      // number of non-enrolled months inside the window around activatedMonth
      enrollments: {
        $size: {
          $filter: {
            input: "$enrollments",
            as: "enrollment",
            cond: {
              $and: [
                {
                  $gte: [
                    '$$enrollment.month',
                    { $add: [startMonth, "$activatedMonth"] }
                  ]
                },
                {
                  $lte: [
                    '$$enrollment.month',
                    { $add: [endMonth, "$activatedMonth"] }
                  ]
                },
                {
                  $eq: ['$$enrollment.enrolled', '00']
                }
              ]
            }
          }
        }
      }
    }
  },
  {
    // keep only documents with no non-enrolled month in the window
    $match: {
      enrollments: {
        $eq: 0
      }
    }
  },
  {
    $group: {
      _id: null,
      enrolled: {
        $sum: 1
      }
    }
  }
]).toArray(function (err, result) {
  // some calculations
});

Also, I definitely need the group stage, because I group the counts by different fields. I left that out for simplicity.
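For illustration, grouping the count by other fields might look like the sketch below. The helper buildGroupStage and the field name region are placeholders invented here, since the post does not say which fields are used; it also assumes top-level field names without dots.

```javascript
// Hypothetical helper: build a $group stage that counts documents per
// combination of the given top-level fields. With no fields it falls back
// to a single overall count (_id: null), as in the query above.
function buildGroupStage(fields) {
  const id = {};
  for (const f of fields) {
    id[f] = '$' + f; // e.g. { region: '$region' }
  }
  return {
    $group: {
      _id: fields.length > 0 ? id : null,
      enrolled: { $sum: 1 },
    },
  };
}

// 'region' is a made-up field name used only for this example.
console.log(JSON.stringify(buildGroupStage(['region'])));
console.log(JSON.stringify(buildGroupStage([])));
```

Any field used for grouping would also have to be passed through the $project stage, because that stage keeps only the fields it names.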

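Edit:

I missed a key detail in the initial post. I have updated the question with the actual use case and the reason I need the computed projection.

Edit 2: I converted this to a count query to see how it performs (per Niel Lunn's comment on this question).

My query: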

mongoose.connection.db.collection("collection")
  .find({
    "enrollments": {
      "$not": {
        "$elemMatch": {
          "month": { "$gte": startMonth, "$lte": endMonth },
          "enrolled": "00"
        }
      }
    }
  })
  .count(function(e, count) {
    console.log(count);
  });

This query takes 1.6 seconds. I tried each of the following indexes separately:

1. { 'enrollments.month': 1 }
2. { 'enrollments.month': 1 }, { 'enrollments.enrolled': 1 } -- two separate indexes
3. { 'enrollments.month': 1, 'enrollments.enrolled': 1 } -- just one index on both fields

The winning query plan does not use an index in any of these cases; it always performs a COLLSCAN. What am I missing here?
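One way to confirm which plan wins is to walk the output of explain() and list its stages. The sketch below assumes the classic explain shape, where queryPlanner.winningPlan is a tree of nodes that each carry a stage name and an optional inputStage or inputStages; the helper listStages and the mock plans are made up for illustration, and the real output may differ by server version.

```javascript
// Hypothetical helper: collect stage names from an explain() winning plan.
// Assumes each node has `stage` and optionally `inputStage` / `inputStages`.
function listStages(plan) {
  if (!plan) return [];
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return [plan.stage].concat(...children.map(listStages));
}

// Mock explain fragments (not real output from the collection in question).
const collScanPlan = { stage: 'COLLSCAN', direction: 'forward' };
const indexedPlan = {
  stage: 'FETCH',
  inputStage: { stage: 'IXSCAN', indexName: 'enrollments.month_1' },
};

console.log(listStages(collScanPlan)); // [ 'COLLSCAN' ]
console.log(listStages(indexedPlan));  // [ 'FETCH', 'IXSCAN' ]
```

In the mongo shell, a plan of this kind can be obtained with db.collection.find(query).explain("executionStats") and read from the queryPlanner.winningPlan field of the result.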

Aggregation is very slow

0 answers: