Using Elasticsearch geo features to find the most common locations, sorted by time

Date: 2016-05-23 01:11:47

Tags: elasticsearch geolocation

I currently have an ES query that uses geohash_grid and date_histogram to give me a list of "geo buckets":

{
  "aggregations": {
    "zoomedInView": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": "-37, 140",
            "bottom_right": "-38, 146"
          }
        }
      },
      "aggregations": {
        "zoom1": {
          "geohash_grid": {
            "field": "location",
            "precision": 6
          },
          "aggs": {
            "ts": {
              "date_histogram": {
                "min_doc_count": 1,
                "field": "dateTime",
                "interval": "1m",
                "format": "DDD HH:mm"
              }
            },
            "map_zoom": {
              "geo_bounds": {
                "field": "location"
              }
            }
          }
        }
      }
    }
  }
}

This gives me results that look like:

{
  "key": "r1r0fu",
  "map_zoom": {
    "bounds": {
      "top_left": {
        "lat": -38.81073913909495,
        "lon": 124.96536672115326
      },
      "bottom_right": {
        "lat": -38.81329075805843,
        "lon": 124.96823584660888
      }
    }
  },
  "ts": {
    "buckets": [
      {
        "key_as_string": "136 20:15",
        "key": 1463354100000
      },
      {
        "key_as_string": "137 04:30",
        "key": 1463365800000,
        "doc_count": 1
      },
....

{
  "key": "r1r0gx",
  "map_zoom": {
    "bounds": {
      "top_left": {
        "lat": -38.798130828887224,
        "lon": 124.99871227890253
      },
      "bottom_right": {
        "lat": -38.79820383526385,
        "lon": 124.99872468411922
      }
    }
  },
  "ts": {
    "buckets": [
      {
        "key_as_string": "136 23:21",
        "key": 1463354460000
      },
      {
        "key_as_string": "137 02:30",
        "key": 1463365800000
      },
      {
        "key_as_string": "137 03:31",
        "key": 1463369460000
      }
    ]
  }
},

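In the example above, the results are ordered by geo bucket (r1r0fu, then r1r0gx), and within each bucket the events are listed in time order (formatted as day-of-year and time, DDD HH:mm) together with their counts.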
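To make the two kinds of bucket keys concrete: a geohash_grid key such as r1r0fu is the location's geohash truncated to `precision` characters (6 here), and a date_histogram key is an epoch-millisecond timestamp that "format" renders as day-of-year plus time. The Python sketch below shows both conversions; `geohash_encode` and `format_bucket_key` are hypothetical helpers, not part of any Elasticsearch client, and the formatter assumes UTC (which, as far as I know, is what date_histogram uses unless a time_zone is set on the aggregation).

```python
from datetime import datetime, timezone

# Geohash base-32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a point as a geohash of `precision` characters.

    Bits alternate between longitude and latitude (longitude first);
    each bit halves the current interval, and every 5 bits become one
    base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, n_bits = 0, 0
    refine_lon = True  # the first bit refines longitude
    while len(chars) < precision:
        if refine_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        refine_lon = not refine_lon
        n_bits += 1
        if n_bits == 5:  # one full character collected
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def format_bucket_key(epoch_ms: int) -> str:
    """Render an epoch-millisecond key as "DDD HH:mm" (day of year, UTC)."""
    dt = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return dt.strftime("%j %H:%M")
```

For example, `geohash_encode(57.64911, 10.40744)` should return "u4pruy", the first six characters of the commonly cited example hash "u4pruydqqvj"; every point inside that cell lands in the same geohash_grid bucket.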

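What I would really like is:

1) Results sorted by time, which may mean the same bucket appears more than once.

2) Only the earliest and latest time shown for each bucket (if possible).

So, ideally, the results above would look like this: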

{
  "key": "r1r0fu",
  "map_zoom": {
    "bounds": {
      "top_left": {
        "lat": -38.81073913909495,
        "lon": 124.96536672115326
      },
      "bottom_right": {
        "lat": -38.81329075805843,
        "lon": 124.96823584660888
      }
    }
  },
  "ts": {
    "buckets": [
      {
        "key_as_string": "136 20:15",
        "key": 1463354100000
      }
    ]
  }
},
{
  "key": "r1r0gx",
  "map_zoom": {
    "bounds": {
      "top_left": {
        "lat": -38.798130828887224,
        "lon": 124.99871227890253
      },
      "bottom_right": {
        "lat": -38.79820383526385,
        "lon": 124.99872468411922
      }
    }
  },
  "ts": {
    "buckets": [
      {
        "key_as_string": "136 23:21",
        "key": 1463354460000
      },
      {
        "key_as_string": "137 03:31",
        "key": 1463369460000
      }
    ]
  }
},
{
  "key": "r1r0fu",
  "map_zoom": {
    "bounds": {
      "top_left": {
        "lat": -38.81073913909495,
        "lon": 124.96536672115326
      },
      "bottom_right": {
        "lat": -38.81329075805843,
        "lon": 124.96823584660888
      }
    }
  },
  "ts": {
    "buckets": [
      {
        "key_as_string": "137 04:30",
        "key": 1463365800000
      }
    ]
  }
},
...

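The results are sorted by time, so in this case bucket r1r0fu appears twice. The event "key_as_string": "137 02:30" has been dropped because it is neither the earliest nor the latest time in its bucket.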
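If Elasticsearch cannot return this shape directly, one option is to post-process the original (geohash-first) response on the client: flatten every (time, cell) event, sort by time, group consecutive events that fall in the same cell, and keep only the first and last time of each run. The Python sketch below assumes the response layout shown above (aggregations.zoomedInView.zoom1.buckets, each with ts.buckets and an optional map_zoom); `runs_by_time` is a hypothetical helper, not an Elasticsearch API.

```python
from itertools import groupby


def runs_by_time(cell_buckets):
    """Turn geohash-first buckets into time-ordered runs of the same cell.

    Assumes `cell_buckets` is the list at
    aggregations.zoomedInView.zoom1.buckets in the query's response.
    """
    # 1) Flatten into (time_key, cell_key, key_as_string, map_zoom) events.
    events = [
        (ts["key"], cell["key"], ts.get("key_as_string"), cell.get("map_zoom"))
        for cell in cell_buckets
        for ts in cell["ts"]["buckets"]
    ]
    # 2) Sort by time; ties are broken by cell key so the output is stable.
    events.sort(key=lambda e: (e[0], e[1]))
    # 3) Group consecutive events in the same cell; keep first and last only.
    result = []
    for cell_key, group in groupby(events, key=lambda e: e[1]):
        group = list(group)
        kept = group if len(group) == 1 else [group[0], group[-1]]
        result.append({
            "key": cell_key,
            "map_zoom": group[0][3],
            "ts": {"buckets": [
                {"key_as_string": e[2], "key": e[0]} for e in kept
            ]},
        })
    return result
```

Because the events are sorted before grouping, the first and last entries of each run are also its minimum and maximum times, which matches requirement 2) on a per-run basis.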

Is this possible?

Many thanks!

1 answer

Answer (score: 1)

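If you want the results sorted by time, it is probably best to swap the nesting of the two aggregations, so that date_histogram is the outer aggregation and geohash_grid is nested inside it, like this: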

{
  "aggregations": {
    "zoomedInView": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": "-37, 140",
            "bottom_right": "-38, 146"
          }
        }
      },
      "aggregations": {
        "ts": {
          "date_histogram": {
            "min_doc_count": 1,
            "field": "dateTime",
            "interval": "1m",
            "format": "DDD HH:mm"
          },
          "aggs": {
            "zoom1": {
              "geohash_grid": {
                "field": "location",
                "precision": 6
              },
              "aggs": {
                "map_zoom": {
                  "geo_bounds": {
                    "field": "location"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

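That takes care of point 1). However, since each top-level bucket is now a time bucket, you can no longer get the minimum and maximum time per location directly. Give it a try and see whether it suits your needs.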
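With this time-first nesting, the earliest and latest time of each stay in a cell can still be recovered on the client. By default date_histogram buckets come back in ascending key order, so a single pass that extends the current run while the cell stays the same is enough. The Python sketch below assumes the response layout of this query (aggregations.zoomedInView.ts.buckets, each holding its cells under zoom1.buckets); `merge_time_first` is a hypothetical helper. When one minute contains several cells, their relative order is ambiguous; the sketch simply orders them by cell key, which is an arbitrary choice.

```python
def merge_time_first(ts_buckets):
    """Collapse time-first buckets into runs of the same geohash cell.

    Assumes `ts_buckets` is aggregations.zoomedInView.ts.buckets, already in
    ascending time order. Returns dicts of the form
    {"key": cell_key, "first": time_key, "last": time_key}.
    """
    runs = []
    for ts in ts_buckets:
        # Arbitrary but deterministic order for cells sharing one minute.
        for cell in sorted(ts["zoom1"]["buckets"], key=lambda c: c["key"]):
            if runs and runs[-1]["key"] == cell["key"]:
                runs[-1]["last"] = ts["key"]  # same cell: extend the run
            else:
                runs.append({"key": cell["key"],
                             "first": ts["key"], "last": ts["key"]})
    return runs
```

Each run's "first" and "last" play the role of the per-bucket minimum and maximum times from requirement 2); intermediate minutes are simply not kept.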