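Different results between date histogram and date range in Elasticsearch

Time: 2015-09-07 14:19:16

Tags: kibana date-range elasticsearch date-histogram

I want to analyze my log data with Elasticsearch / Kibana and count unique customers per month. I get different results depending on whether I use a date histogram aggregation or a date range aggregation.

Here is the date histogram query: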

"query": {
    "query_string": {
      "query": "_type:logs AND created_at:[2015-04-01 TO now]",
      "analyze_wildcard": true
    }
  },
  "size": 0,
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "created_at",
        "interval": "1M",
        "min_doc_count": 1
      },
      "aggs": {
        "1": {
          "cardinality": {
            "field": "customer.id"
          }
        }
      }
    }
  }

Result:

"aggregations": {
    "2": {
      "buckets": [
        {
          "1": {
            "value": 595805
          },
          "key_as_string": "2015-04-01T00:00:00.000Z",
          "key": 1427839200000,
          "doc_count": 6410438
        },
        {
          "1": {
            "value": 647788
          },
          "key_as_string": "2015-05-01T00:00:00.000Z",
          "key": 1430431200000,
          "doc_count": 6669555
        },...

Here is the date range query:

"query": {
    "query_string": {
      "query": "_type:logs AND created_at:[2015-04-01 TO now]",
      "analyze_wildcard": true
    }
  },
  "size": 0,
  "aggs": {
    "2": {
      "date_range": {
        "field": "created_at",
        "ranges": [
          {
            "from": "2015-04-01",
            "to": "2015-05-01"
          },
          {
            "from": "2015-05-01",
            "to": "2015-06-01"
          }
        ]
      },
      "aggs": {
        "1": {
          "cardinality": {
            "field": "customer.id"
          }
        }
      }
    }
  }

Response:

"aggregations": {
    "2": {
      "buckets": [
        {
          "1": {
            "value": 592179
          },
          "key": "2015-04-01T00:00:00.000Z-2015-05-01T00:00:00.000Z",
          "from": 1427846400000,
          "from_as_string": "2015-04-01T00:00:00.000Z",
          "to": 1430438400000,
          "to_as_string": "2015-05-01T00:00:00.000Z",
          "doc_count": 6411884
        },
        {
          "1": {
            "value": 616995
          },
          "key": "2015-05-01T00:00:00.000Z-2015-06-01T00:00:00.000Z",
          "from": 1430438400000,
          "from_as_string": "2015-05-01T00:00:00.000Z",
          "to": 1433116800000,
          "to_as_string": "2015-06-01T00:00:00.000Z",
          "doc_count": 6668060
        }
      ]
    }
  }

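In the first case I get 595,805 for April and 647,788 for May. In the second case I get 592,179 for April and 616,995 for May.

Can someone explain why I get these differences between the two use cases?

Thanks

I updated my first post to add another example.

The new example uses less data (one day) but shows the same problem. Here is the first request, using a date histogram: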

{
  "size": 0,
  "query": {
    "query_string": {
      "query": "_type:logs AND logs.created_at:[2015-04-01 TO 2015-04-01]",
      "analyze_wildcard": true
    }
  },
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "created_at",
        "interval": "1h",
        "pre_zone": "00:00",
        "pre_zone_adjust_large_interval": true,
        "min_doc_count": 1
      },
      "aggs": {
        "1": {
          "cardinality": {
            "field": "customer.id"
          }
        }
      }
    }
  }
}

We can see a unique count of 660 and a doc count of 1717 for the first hour:

{  
   "hits":{  
      "total":203961,
      "max_score":0,
      "hits":[  

      ]
   },
   "aggregations":{  
      "2":{  
         "buckets":[  
            {  
               "1":{  
                  "value":660
               },
               "key_as_string":"2015-04-01T00:00:00.000Z",
               "key":1427846400000,
               "doc_count":1717
            },
            {  
               "1":{  
                  "value":324
               },
               "key_as_string":"2015-04-01T01:00:00.000Z",
               "key":1427850000000,
               "doc_count":776
            },
            {  
               "1":{  
                  "value":190
               },
               "key_as_string":"2015-04-01T02:00:00.000Z",
               "key":1427853600000,
               "doc_count":481
            }
         ]
      }
   }
}

But for the second request, using a date range:

{
  "size": 0,
  "query": {
    "query_string": {
      "query": "_type:logs AND logs.created_at:[2015-04-01 TO 2015-04-01]",
      "analyze_wildcard": true
    }
  },
  "aggs": {
    "2": {
      "date_range": {
        "field": "created_at",
        "ranges": [
          {
            "from": "2015-04-01T00:00:00",
            "to": "2015-04-01T01:00:00"
          },
          {
            "from": "2015-04-01T01:00:00",
            "to": "2015-04-01T02:00:00"
          }
        ]
      },
      "aggs": {
        "1": {
          "cardinality": {
            "field": "customer.id"
          }
        }
      }
    }
  }
}

We only see a unique count of 633, with a doc count of 1717:

{  
   "hits":{  
      "total":203961,
      "max_score":0,
      "hits":[  

      ]
   },
   "aggregations":{  
      "2":{  
         "buckets":[  
            {  
               "1":{  
                  "value":633
               },
               "key":"2015-04-01T00:00:00.000Z-2015-04-01T01:00:00.000Z",
               "from":1427846400000,
               "from_as_string":"2015-04-01T00:00:00.000Z",
               "to":1427850000000,
               "to_as_string":"2015-04-01T01:00:00.000Z",
               "doc_count":1717
            },
            {  
               "1":{  
                  "value":328
               },
               "key":"2015-04-01T01:00:00.000Z-2015-04-01T02:00:00.000Z",
               "from":1427850000000,
               "from_as_string":"2015-04-01T01:00:00.000Z",
               "to":1427853600000,
               "to_as_string":"2015-04-01T02:00:00.000Z",
               "doc_count":776
            }
         ]
      }
   }
}

Can someone please tell me why? Thanks

1 answer:

Answer 0: (score: 1)

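When using the date_histogram aggregation, you need to take the time zone into account, which date_range does not, since it always uses the GMT time zone.

If you look at the long millisecond values in your results, you will see the following:

In your date histogram, key: 1427839200000 is actually equal to 2015-03-31T22:00:00.000Z, which differs from the key_as_string value (i.e. 2015-04-01T00:00:00.000Z) that is formatted according to the GMT time zone.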
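To check the millisecond values by hand, a small Python sketch like the one below can decode them. The helper name `millis_to_utc` is made up for illustration; the two numbers are copied from the monthly responses in the question.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def millis_to_utc(ms: int) -> str:
    """Format an epoch-millisecond value as an ISO-8601 UTC timestamp."""
    return (EPOCH + timedelta(milliseconds=ms)).strftime("%Y-%m-%dT%H:%M:%SZ")

histogram_key = 1427839200000  # "key" of the first date_histogram bucket
range_from = 1427846400000     # "from" of the first date_range bucket

# How far apart the two bucket starts are, in hours.
offset_hours = (range_from - histogram_key) / 3_600_000

print(millis_to_utc(histogram_key))
print(millis_to_utc(range_from))
print(offset_hours)
```

The two bucket starts decode to different instants, so the "April" buckets of the two aggregations do not cover exactly the same documents.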

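In your first aggregation, try explicitly specifying the time_zone parameter as your current time zone (apparently GMT+2), and you should get the same results:

"date_histogram": {
  "field": "created_at",
  "interval": "1M",
  "min_doc_count": 1,
  "time_zone": -2
},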
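The GMT+2 guess can be sanity-checked against the bucket keys in the question. The sketch below assumes a fixed +02:00 offset (an assumption taken from the answer, not from the cluster configuration) and uses a hypothetical helper, `month_start_millis`, to compute where monthly buckets would start under that offset and under UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offset; the actual server or client time zone is not known here.
GMT_PLUS_2 = timezone(timedelta(hours=2))

def month_start_millis(year: int, month: int, tz: timezone) -> int:
    """Epoch milliseconds of midnight on the first day of the month in tz."""
    return int(datetime(year, month, 1, tzinfo=tz).timestamp() * 1000)

# Month starts as seen from GMT+2: compare with the date_histogram "key" values.
april_local = month_start_millis(2015, 4, GMT_PLUS_2)
may_local = month_start_millis(2015, 5, GMT_PLUS_2)

# Month starts in UTC: compare with the date_range "from" values.
april_utc = month_start_millis(2015, 4, timezone.utc)
may_utc = month_start_millis(2015, 5, timezone.utc)

print(april_local, may_local)
print(april_utc, may_utc)
```

If the first pair matches the date_histogram keys and the second pair matches the date_range boundaries, the two aggregations are bucketing on month boundaries that sit two hours apart, which is consistent with the explanation above.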