I run several aggregations on an ES 1.7.2 installation to summarize some values.
I found out the hard way that, in some seemingly random cases, the doc_count of a bucket does not match the SUM of the doc_counts at the nested level.
"key": 503,
"doc_count": 383778,
"regionid": {...}
So doc_count = 383778.
But if I sum the doc_count of every element of the regionid list below, I get doc_count = 383718:
"key": 503,
"doc_count": 383778,
"regionid": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 303821,
"ProviderId": {...}
},
{
"key": 27,
"doc_count": 23834,
"ProviderId": {...}
},
{
"key": 25,
"doc_count": 9565,
"ProviderId": {...}
},
{
"key": 36,
"doc_count": 8857,
"ProviderId": {...}
},
{
"key": 14,
"doc_count": 8222,
"ProviderId": {...}
},
{
"key": 68,
"doc_count": 6746,
"ProviderId": {...}
},
{
"key": 19,
"doc_count": 4574,
"ProviderId": {...}
},
{
"key": 28,
"doc_count": 4164,
"ProviderId": {...}
},
{
"key": 10,
"doc_count": 3006,
"ProviderId": {...}
},
{
"key": 31,
"doc_count": 2020,
"ProviderId": {...}
},
{
"key": 21,
"doc_count": 1410,
"ProviderId": {...}
},
{
"key": 32,
"doc_count": 1368,
"ProviderId": {...}
},
{
"key": 22,
"doc_count": 1367,
"ProviderId": {...}
},
{
"key": 8,
"doc_count": 1010,
"ProviderId": {...}
},
{
"key": 16,
"doc_count": 825,
"ProviderId": {...}
},
{
"key": 35,
"doc_count": 559,
"ProviderId": {...}
},
{
"key": 34,
"doc_count": 517,
"ProviderId": {...}
},
{
"key": 26,
"doc_count": 414,
"ProviderId": {...}
},
{
"key": 18,
"doc_count": 371,
"ProviderId": {...}
},
{
"key": 15,
"doc_count": 362,
"ProviderId": {...}
},
{
"key": 33,
"doc_count": 185,
"ProviderId": {...}
},
{
"key": 9,
"doc_count": 143,
"ProviderId": {...}
},
{
"key": 29,
"doc_count": 102,
"ProviderId": {...}
},
{
"key": 17,
"doc_count": 100,
"ProviderId": {...}
},
{
"key": 30,
"doc_count": 96,
"ProviderId": {...}
},
{
"key": 20,
"doc_count": 80,
"ProviderId": {...}
}
]
}
},
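For reference, summing the bucket doc_counts above reproduces the gap; a minimal check in Python (the counts are copied verbatim from the response):

```python
# doc_count of each regionid bucket, copied from the response above
bucket_counts = [
    303821, 23834, 9565, 8857, 8222, 6746, 4574, 4164, 3006,
    2020, 1410, 1368, 1367, 1010, 825, 559, 517, 414, 371,
    362, 185, 143, 102, 100, 96, 80,
]

parent_doc_count = 383778  # doc_count of the parent bucket (key 503)
bucket_sum = sum(bucket_counts)

print(bucket_sum)                     # 383718
print(parent_doc_count - bucket_sum)  # 60 documents unaccounted for
```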
Do you know why this happens?
Is it perhaps a bug?
Part of my aggregation:
{
"aggs": {
"Provider": {
"terms": {
"field": "Provider"
},
"aggs": {
"Gateway": {
"terms": {
"field": "Gateway"
},
"aggs": {
"CustomerId": {
"terms": {
"field": "CustomerId"
},
"aggs": {
"regionid": {
"terms": {
"field": "regionid"
Thanks for any help.
Answer 0 (score: 1)
Aggregations in ES are not exact; they are estimates based on a sampled number of records per shard. If the sample size is large enough the numbers can be accurate, but that comes with a significant performance impact.
You can read more about "shard size" in the ES documentation on shard_size for the terms aggregation.
The flatter your index (meaning the more buckets an aggregation returns), the more you need to increase the shard size. We found that, for flat indices in our system, a 20x multiplier is a good rule of thumb: if we return the top 10 buckets of an aggregation, we use a shard_size of 200.
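Applying that rule of thumb to the question's query, a minimal sketch of the fix is to add shard_size next to each terms aggregation (shard_size is a standard parameter of the terms aggregation; the values 10 and 200 below are just the default top-10 with the 20x multiplier, not taken from the original query):

```json
{
  "aggs": {
    "Provider": {
      "terms": {
        "field": "Provider",
        "size": 10,
        "shard_size": 200
      }
    }
  }
}
```

The same shard_size would need to be added to the nested Gateway, CustomerId, and regionid terms aggregations as well, since each level performs its own per-shard truncation.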