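Elasticsearch: group by field

Date: 2021-02-25 04:41:07

Tags: java elasticsearch elasticsearch-aggregation

I want to group search results by a field. For example, I have data in which one userId corresponds to multiple usernames, so in the search results I want to group every userId together with its corresponding usernames.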

Currently I can group by userId with an aggregation, but I cannot retrieve the corresponding list of usernames. This is what I get:

"aggregations" : {
  "by_user_id" : {
    "after_key" : {
      "group_by_search" : 2335
    },
    "buckets" : [
      {
        "key" : {
          "group_by_search" : 2
        },
        "doc_count" : 2
      },
      {
        "key" : {
          "group_by_search" : 1000
        },
        "doc_count" : 4
      },
      {
        "key" : {
          "group_by_search" : 2335
        },
        "doc_count" : 2
      }
    ]
  }
}

What I want is:

"aggregations" : {
  "by_user_id" : {
    "after_key" : {
      "group_by_search" : 2335
    },
    "buckets" : [
      {
        "key" : {
          "group_by_search" : 2,
          "usernames" : [1111, 222]  // usernames that share this userId
        },
        "doc_count" : 2
      },
      {
        "key" : {
          "group_by_search" : 1000,
          "usernames" : [11, 0101, 1199, 222]  // usernames that share this userId
        },
        "doc_count" : 4
      },
      {
        "key" : {
          "group_by_search" : 2335,
          "usernames" : [1111, 222]  // usernames that share this userId
        },
        "doc_count" : 2
      }
    ]
  }
}

Is there a way to achieve this directly in Elasticsearch with aggregations?

Update: I am using the following aggregation:

"aggregations": {
    "by_user_id": {
        "composite": {
            "size": 1000,
            "sources": [
                {
                    "group_by_search": {
                        "terms": {
                            "field": "user_id",
                            "missing_bucket": false,
                            "order": "asc"
                        }
                    }
                }
            ]
        }
    }
}

Thanks.

2 answers:

Answer 0 (score: 1)

All you need to do is add a terms sub-aggregation on the username field, so that each bucket gets the list of all unique usernames:

"aggregations": {
    "by_user_id": {
        "composite": {
            "size": 1000,
            "sources": [
                {
                    "group_by_search": {
                        "terms": {
                            "field": "user_id",
                            "missing_bucket": false,
                            "order": "asc"
                        }
                    }
                }
            ]
        },
        "aggs": {
            "username": {
                "terms": {
                    "field": "username",
                    "size": 1000
                }
            }
        }
    }
}
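You can consume this response from Java with a sketch like the following, which flattens each composite bucket into `userId -> usernames`. It is library-free. The records are a hypothetical, simplified model of the JSON, not types from any Elasticsearch client, and usernames are treated as strings. The truncation check assumes you read the terms sub-aggregation's `sum_other_doc_count` field, which is non-zero when the sub-aggregation's `size` cut off some usernames.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the response parts used here; these
// records are illustrative and NOT types from any Elasticsearch client.
class TermsSubAggExample {
    // One bucket of the "username" terms sub-aggregation.
    record TermBucket(String key, long docCount) {}

    // The "username" sub-aggregation of one composite bucket;
    // sumOtherDocCount mirrors the response's "sum_other_doc_count".
    record UsernameTerms(List<TermBucket> buckets, long sumOtherDocCount) {}

    // One bucket of the "by_user_id" composite aggregation.
    record CompositeBucket(long userId, long docCount, UsernameTerms username) {}

    // userId -> usernames, in the order the buckets were returned.
    static Map<Long, List<String>> usernamesByUserId(List<CompositeBucket> buckets) {
        Map<Long, List<String>> result = new LinkedHashMap<>();
        for (CompositeBucket bucket : buckets) {
            List<String> names = new ArrayList<>();
            for (TermBucket term : bucket.username().buckets()) {
                names.add(term.key());
            }
            result.put(bucket.userId(), names);
        }
        return result;
    }

    // userIds whose username list was cut off by the sub-aggregation "size".
    static List<Long> truncatedUserIds(List<CompositeBucket> buckets) {
        List<Long> truncated = new ArrayList<>();
        for (CompositeBucket bucket : buckets) {
            if (bucket.username().sumOtherDocCount() > 0) {
                truncated.add(bucket.userId());
            }
        }
        return truncated;
    }
}
```

If `truncatedUserIds` returns anything, raise the sub-aggregation's `size` or use the composite-only approach.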

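A top_hits sub-aggregation would also work, but you would get many duplicates, and you would need to return a large number of hits to be sure you have captured every distinct username.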
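If you do use top_hits, the deduplication happens on the client. The following minimal Java sketch assumes you have already extracted the `usernames` value from each hit's `_source` into a `List<String>`. The class and method names are hypothetical. By default top_hits returns only the top 3 hits per bucket unless you set its `size`, so comparing the number of hits returned with the bucket's `doc_count` tells you whether the list may be missing usernames.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper; not part of any Elasticsearch client.
class TopHitsDedupExample {
    // Removes repeated usernames taken from one bucket's top_hits hits,
    // keeping the order in which they first appeared.
    static List<String> distinctUsernames(List<String> usernamesFromHits) {
        return new ArrayList<>(new LinkedHashSet<>(usernamesFromHits));
    }

    // True when the bucket holds more documents than hits came back,
    // i.e. some usernames may be missing from the deduplicated list.
    static boolean possiblyIncomplete(int hitsReturned, long bucketDocCount) {
        return hitsReturned < bucketDocCount;
    }
}
```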

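If your username field has high cardinality (more than 1000 values), it is better to move the terms aggregation on username into the composite sources array and page through all the buckets yourself, like this: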

"aggregations": {
    "by_user_id": {
        "composite": {
            "size": 1000,
            "sources": [
                {
                    "group_by_search": {
                        "terms": {
                            "field": "user_id",
                            "missing_bucket": false,
                            "order": "asc"
                        }
                    }
                },
                {
                    "group_by_username": {
                        "terms": {
                            "field": "username",
                            "missing_bucket": false,
                            "order": "asc"
                        }
                    }
                }
            ]
        }
    }
}
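You can sketch the resulting paging loop in Java as follows. This is a minimal, library-free sketch. `Key`, `Page`, `groupAll` and `fakeFetcher` are hypothetical names, usernames are treated as strings, and `fakeFetcher` is an in-memory stand-in for the real search call that serves pre-sorted keys `size` at a time. In a real client, you would send the previous response's `after_key` as the composite `after` parameter and stop once a page comes back with no buckets.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

class CompositePagingExample {
    // One composite bucket key: (group_by_search, group_by_username).
    record Key(long userId, String username) {}

    // One page of composite results: its buckets plus after_key (null if absent).
    record Page(List<Key> buckets, Key afterKey) {}

    // Requests pages until an empty one comes back, feeding each after_key
    // into the next request, and groups usernames by user_id on the client.
    static Map<Long, Set<String>> groupAll(Function<Key, Page> fetchPage) {
        Map<Long, Set<String>> grouped = new TreeMap<>();
        Key after = null; // the first request carries no "after"
        while (true) {
            Page page = fetchPage.apply(after);
            if (page.buckets().isEmpty()) {
                break;
            }
            for (Key k : page.buckets()) {
                grouped.computeIfAbsent(k.userId(), id -> new LinkedHashSet<>()).add(k.username());
            }
            if (page.afterKey() == null) {
                break;
            }
            after = page.afterKey();
        }
        return grouped;
    }

    // Hypothetical stand-in for the search call: serves keys already sorted
    // by (userId, username), `size` at a time, resuming after `after`.
    static Function<Key, Page> fakeFetcher(List<Key> sortedKeys, int size) {
        return after -> {
            int start = after == null ? 0 : sortedKeys.indexOf(after) + 1;
            int end = Math.min(start + size, sortedKeys.size());
            List<Key> buckets = new ArrayList<>(sortedKeys.subList(start, end));
            Key afterKey = buckets.isEmpty() ? null : buckets.get(buckets.size() - 1);
            return new Page(buckets, afterKey);
        };
    }
}
```

Because every (user_id, username) pair is its own bucket, this approach sees every distinct username regardless of cardinality. The trade-off is more round trips to the cluster.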

Answer 1 (score: 0)

You can use the top hits aggregation to get the list of all usernames that share the same ID.

Here is a working example:

Index data:

{
  "usernames": 3,
  "user_id": 2
}
{
  "usernames": 1,
  "user_id": 1
}
{
  "usernames": 2,
  "user_id": 1
}

Search query:

{
  "size": 0,
  "aggregations": {
    "by_user_id": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "group_by_search": {
              "terms": {
                "field": "user_id",
                "missing_bucket": false,
                "order": "asc"
              }
            }
          }
        ]
      },
      "aggs": {
        "list_names": {
          "top_hits": {
            "_source": {
              "includes": [
                "usernames"
              ]
            },
            "size": 100
          }
        }
      }
    }
  }
}