Retrieving aggregations/buckets with Spark

Time: 2017-10-10 08:52:47

Tags: scala apache-spark elasticsearch

GET test_data/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"company": "foo"}}
      ]
    }
  },
  "size": 0,
  "aggs": {
    "filenames": {
      "terms": {
        "field": "filename.keyword"
      },
      "aggs": {
        "maxDate": {"max": {"field": "timestamp"}},
        "minDate": {"min": {"field": "timestamp"}}
      }
    }
  }
}

Sample output:

{
  "took": 1052,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 52120825,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "filenames": {
      "doc_count_error_upper_bound": 97326,
      "sum_other_doc_count": 51389890,
      "buckets": [
        {
          "key": "Messages_20170711_080003.mes",
          "doc_count": 187131,
          "minDate": {
            "value": 1499724098000,
            "value_as_string": "2017-07-10T22:01:38.000Z"
          },
          "maxDate": {
            "value": 1499760002000,
            "value_as_string": "2017-07-11T08:00:02.000Z"
          }
        },
        {
          "key": "Messages_20170213_043108.mes",
          "doc_count": 115243,
          "minDate": {
            "value": 1486735453000,
            "value_as_string": "2017-02-10T14:04:13.000Z"
          },
          "maxDate": {
            "value": 1486960265000,
            "value_as_string": "2017-02-13T04:31:05.000Z"
          }
        },

This query returns the desired results when entered in the Kibana Dev Tools console.

When I try to retrieve the result buckets using Spark with Elasticsearch:
val df = spark.sqlContext.esDF(esInputIndexName, query = queryString)
df.show(10, false)

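The DataFrame shows all the hits rather than the buckets produced by the aggregation. How can I store the results of the aggregations/buckets in a DataFrame?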

1 answer:

Answer 0 (score: 0)

One workaround is to perform the aggregation in Spark itself, on the hits that the connector returns.
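A minimal sketch of that workaround, not a definitive implementation. It assumes the DataFrame of hits exposes a `filename` column (the `.keyword` sub-field in the query is an Elasticsearch multi-field, so it is assumed not to appear as a separate column) and a `timestamp` column. The object name `FilenameAggregation`, the helper `aggregateByFilename`, and the sample rows are hypothetical and only serve as illustration; the in-memory rows stand in for the result of `esDF`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, lit, max, min}

object FilenameAggregation {

  // Spark-side counterpart of the "terms" aggregation with nested min/max:
  // one row per filename, with its document count and timestamp range.
  // Column names "filename" and "timestamp" are assumptions about the index mapping.
  def aggregateByFilename(hits: DataFrame): DataFrame =
    hits
      .groupBy(col("filename"))
      .agg(
        count(lit(1)).as("doc_count"),
        min(col("timestamp")).as("minDate"),
        max(col("timestamp")).as("maxDate")
      )
      .orderBy(col("doc_count").desc)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("filename-aggregation")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the hits read from Elasticsearch.
    val hits = Seq(
      ("Messages_20170711_080003.mes", 1499724098000L),
      ("Messages_20170711_080003.mes", 1499760002000L),
      ("Messages_20170213_043108.mes", 1486735453000L)
    ).toDF("filename", "timestamp")

    aggregateByFilename(hits).show(10, false)
    spark.stop()
  }
}
```

With the connector, the same helper would be applied to the DataFrame from the question, for example `aggregateByFilename(spark.sqlContext.esDF(esInputIndexName, query = queryString))`, where `queryString` carries only the `query` part of the request. Two differences from the Kibana result are worth keeping in mind: the `terms` aggregation returns only the top buckets by default, while `groupBy` returns every filename (add `.limit(n)` if only the top ones are needed), and this approach transfers all matching hits to Spark before aggregating, which can be expensive for an index of this size.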
