post_filter has no effect on date_histogram aggregation bucket results

Time: 2018-12-07 03:13:27

Tags: elasticsearch logstash

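I have an aggregation query in which I am trying to compute the maximum standard deviation of the number of destination IPs per IP address over a given time range. A known quirk of the moving_fn standard deviation function is that the std dev values for the first 2 days are always null and 0 respectively, because no earlier data is available to feed the window.

Here is my aggregation query: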

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "aggregations.range.buckets.by ip.buckets.by date.buckets.max_dest_ips.value"
          }
        }
      ]
    }
  },
  "aggs": {
    "range": {
      "date_range": {
        "field": "Source Time",
        "ranges": [
          {
            "from": "2018-04-25",
            "to": "2018-05-02"
          }
        ]
      },
      "aggs": {
        "by ip": {
          "terms": {
            "field": "IP Address.keyword",
            "size": 500
          },
          "aggs": {
            "datehisto": {
              "date_histogram": {
                "field": "Source Time",
                "interval": "day"
              },
              "aggs": {
                "max_dest_ips": {
                  "sum": {
                    "field": "aggregations.range.buckets.by ip.buckets.by date.buckets.max_dest_ips.value"
                  }
                },
                "max_dest_ips_std_dev": {
                  "moving_fn": {
                    "buckets_path": "max_dest_ips",
                    "window": 3,
                    "script": "MovingFunctions.stdDev(values, MovingFunctions.unweightedAvg(values))"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "post_filter": {
    "range": {
      "Source Time": {
        "gte": "2018-05-01"
      }
    }
  }
}

Here is an excerpt of the response:

{
  "key": "192.168.0.1",
  "doc_count": 6,
  "datehisto": {
    "buckets": [
      {
        "key_as_string": "2018-04-25T00:00:00.000Z",
        "key": 1524614400000,
        "doc_count": 1,
        "max_dest_ips": {
          "value": 309
        },
        "max_dest_ips_std_dev": {
          "value": null
        }
      },
      {
        "key_as_string": "2018-04-26T00:00:00.000Z",
        "key": 1524700800000,
        "doc_count": 1,
        "max_dest_ips": {
          "value": 529
        },
        "max_dest_ips_std_dev": {
          "value": 0
        }
      },
      {
        "key_as_string": "2018-04-27T00:00:00.000Z",
        "key": 1524787200000,
        "doc_count": 1,
        "max_dest_ips": {
          "value": 408
        },
        "max_dest_ips_std_dev": {
          "value": 110
        }
      },
      {
        "key_as_string": "2018-04-28T00:00:00.000Z",
        "key": 1524873600000,
        "doc_count": 1,
        "max_dest_ips": {
          "value": 187
        },
        "max_dest_ips_std_dev": {
          "value": 89.96419040682551
        }
      }
    ]
  }
}
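
To make the null and 0 values concrete, here is a small Python sketch that recomputes the moving standard deviation from the four daily values in the excerpt. It is not Elasticsearch code. The helper name `moving_std_dev` is hypothetical, and it assumes that the window holds only the buckets before the current one and that a population standard deviation is used, which is what the numbers in the excerpt are consistent with.

```python
import math

def moving_std_dev(values, window):
    """Population std dev over up to `window` buckets preceding each bucket."""
    out = []
    for i in range(len(values)):
        prev = values[max(0, i - window):i]  # assumption: current bucket is excluded
        if not prev:
            out.append(None)  # empty window, reported as null in the response
            continue
        mean = sum(prev) / len(prev)
        variance = sum((v - mean) ** 2 for v in prev) / len(prev)
        out.append(math.sqrt(variance))
    return out

# max_dest_ips values for 2018-04-25 .. 2018-04-28, copied from the excerpt
daily = [309, 529, 408, 187]
result = moving_std_dev(daily, window=3)
print(result)
```

The first bucket has an empty window and the second has a single value in its window, so any query whose buckets start later simply moves those two warm-up results to the new first two days.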

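What I want is to filter out the first 2 days of bucket data (the 25th and 26th) so that they are removed from the bucket results above. I have tried the post_filter shown in the query above, and also the regular query filter below: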

  "filter": {
    "range": {
      "Source Time": {
        "gte": "2018-04-27"
      }
    }
  }

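The post_filter has no effect at all. The range filter above does make the buckets start on the 27th, but the standard deviation calculation then also starts on the 27th (so the 27th is null and the 28th is 0), whereas I want the calculation to start on the 25th.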
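
For context, post_filter only narrows the returned hits after the aggregations have been computed, and with "size": 0 there are no hits to narrow. One fallback that stays outside Elasticsearch is to keep the full date range in the query, so the window is warmed up from the 25th, and drop the warm-up buckets on the client. The Python sketch below illustrates that idea on a trimmed copy of the excerpt. The helper name `drop_buckets_before` is hypothetical, and the sketch assumes bucket keys are epoch milliseconds in UTC, as in the response shown earlier.

```python
from datetime import datetime, timezone

def drop_buckets_before(ip_bucket, cutoff_date):
    """Return a copy of one "by ip" bucket, keeping date buckets at or after cutoff_date."""
    cutoff = datetime.fromisoformat(cutoff_date).replace(tzinfo=timezone.utc)
    cutoff_ms = int(cutoff.timestamp() * 1000)  # assumption: keys are UTC epoch millis
    kept = [b for b in ip_bucket["datehisto"]["buckets"] if b["key"] >= cutoff_ms]
    return {**ip_bucket, "datehisto": {"buckets": kept}}

# Trimmed copy of the response excerpt (doc_count fields omitted for brevity)
ip_bucket = {
    "key": "192.168.0.1",
    "datehisto": {"buckets": [
        {"key_as_string": "2018-04-25T00:00:00.000Z", "key": 1524614400000,
         "max_dest_ips": {"value": 309}, "max_dest_ips_std_dev": {"value": None}},
        {"key_as_string": "2018-04-26T00:00:00.000Z", "key": 1524700800000,
         "max_dest_ips": {"value": 529}, "max_dest_ips_std_dev": {"value": 0}},
        {"key_as_string": "2018-04-27T00:00:00.000Z", "key": 1524787200000,
         "max_dest_ips": {"value": 408}, "max_dest_ips_std_dev": {"value": 110}},
        {"key_as_string": "2018-04-28T00:00:00.000Z", "key": 1524873600000,
         "max_dest_ips": {"value": 187},
         "max_dest_ips_std_dev": {"value": 89.96419040682551}},
    ]},
}

trimmed = drop_buckets_before(ip_bucket, "2018-04-27")
for bucket in trimmed["datehisto"]["buckets"]:
    print(bucket["key_as_string"], bucket["max_dest_ips_std_dev"]["value"])
```

This keeps the standard deviation values computed from the 25th onward and only hides the warm-up days, at the cost of transferring buckets that are then discarded.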

Are there any other alternative solutions? Any help is much appreciated!

0 answers:

No answers yet