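I have a dataset that I want to aggregate by several fields, but I'm having a hard time doing it with a single pipeline. The dataset looks like this: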
[
{
"filename": "file1.js",
"editor": "vscode",
"lines": 45,
// Just for illustration.
// It's a real datetime in Mongo.
"date": "2019-02-21"
},
{
"filename": "file1.js",
"editor": "vscode",
"lines": 32,
"date": "2019-02-21"
},
{
"filename": "file2.js",
"editor": "vim",
"lines": 57,
"date": "2019-02-22"
},
{
"filename": "file2.js",
"editor": "vim",
"lines": 18,
"date": "2019-02-22"
}
]
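I want to provide stats for each field based on the date, filename and editor. I want to group by date first, and then sum the lines for each file and each editor. Right now I have this $group stage, which groups by date and filename: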
{
_id: {
date: {
day: { $dayOfMonth: "$date" },
month: { $month: "$date" },
year: { $year: "$date"}
},
filename: '$filename'
},
lines: { $sum: '$lines' }
}
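I also have to run the exact same aggregation for the editor (and for other fields omitted from this example), and since there is a lot of data (thousands upon thousands of documents), I'm not sure this is the best option performance-wise. What I'm trying to achieve is a single aggregation that yields a result similar to the following. The numbers are completely made up, just to illustrate the point.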
{
dates: {
"2019-02-22": {
// total lines for each file
files: [
{ name: 'file1.js', lines: 100 }
],
// total lines for each editor
editors: [
{ name: 'vscode', lines: 87 }
],
totals: { lines: 170 } // total lines for the day
},
"2019-02-21": {
files: [
{ name: 'file2.js', lines: 100 }
],
editors: [
{ name: 'vim', lines: 140 }
],
totals: { lines: 220 }
}
},
totals: { lines: 390 } // total lines for all days combined
}
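To make the target shape concrete, here is a minimal in-memory sketch in plain JavaScript of the result I'm after, run against the sample documents above. It is not a MongoDB pipeline; `summarize` and `addLines` are hypothetical helper names, and it assumes `date` is a Date or an ISO date string and that days are keyed in UTC.

```javascript
// Illustrative only: builds the desired result shape in memory, not in MongoDB.
// `addLines` and `summarize` are hypothetical helper names.

// Find the entry for `name` in `list` (creating it if missing) and add `lines`.
function addLines(list, name, lines) {
  let entry = list.find((e) => e.name === name);
  if (!entry) {
    entry = { name: name, lines: 0 };
    list.push(entry);
  }
  entry.lines += lines;
}

function summarize(docs) {
  const result = { dates: {}, totals: { lines: 0 } };
  for (const doc of docs) {
    // Assumption: `date` is a Date or ISO string; the day key is YYYY-MM-DD in UTC.
    const key = new Date(doc.date).toISOString().slice(0, 10);
    if (!result.dates[key]) {
      result.dates[key] = { files: [], editors: [], totals: { lines: 0 } };
    }
    const day = result.dates[key];
    addLines(day.files, doc.filename, doc.lines);  // total lines for each file
    addLines(day.editors, doc.editor, doc.lines);  // total lines for each editor
    day.totals.lines += doc.lines;                 // total lines for the day
    result.totals.lines += doc.lines;              // total lines for all days
  }
  return result;
}

// The sample documents from the top of the question.
const sample = [
  { filename: "file1.js", editor: "vscode", lines: 45, date: "2019-02-21" },
  { filename: "file1.js", editor: "vscode", lines: 32, date: "2019-02-21" },
  { filename: "file2.js", editor: "vim", lines: 57, date: "2019-02-22" },
  { filename: "file2.js", editor: "vim", lines: 18, date: "2019-02-22" }
];

console.log(JSON.stringify(summarize(sample), null, 2));
```

For the four sample documents this gives 77 lines on 2019-02-21, 75 on 2019-02-22, and 152 overall. Ideally a single pipeline would return the same structure.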