Aggregating on specific values of a nested field per document

Time: 2018-11-09 11:03:30

Tags: elasticsearch

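Is there a way to run a stats aggregation on a nested field so that, for each document, only the maximum of the selected nested values is used in the statistics?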

Mapping:

{
    "mappings": {
        "doc": {
            "properties": {
                "student_id": {
                    "type": "long"
                },
                "test_scores": {
                    "type": "nested",
                    "properties": {
                        "test_id": {
                            "type": "long"
                        },
                        "score": {
                            "type": "double"
                        }
                    }
                } 
            }
        }
    }
}

Sample data:

{
  "student_id": 1,
  "test_scores": [
    {
      "test_id": 101,
      "score": 90
    },
    {
      "test_id": 102,
      "score": 70
    },
    {
      "test_id": 103,
      "score": 80
    }
  ]
}

{
  "student_id": 2,
  "test_scores": [
    {
      "test_id": 101,
      "score": 80
    },
    {
      "test_id": 102,
      "score": 90
    },
    {
      "test_id": 103,
      "score": 85
    }
  ]
}

{
  "student_id": 3,
  "test_scores": [
    {
      "test_id": 101,
      "score": 30
    },
    {
      "test_id": 102,
      "score": 40
    },
    {
      "test_id": 103,
      "score": 55
    }
  ]
}

Filtering query:

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "student_id": 1
                }
              },
              {
                "nested": {
                  "path": "test_scores",
                  "query": {
                    "terms": {
                      "test_scores.test_id": [101] 
                    }
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "student_id": 2
                }
              },
              {
                "nested": {
                  "path": "test_scores",
                  "query": {
                    "terms": {
                      "test_scores.test_id": [101, 103] 
                    }
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

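Requirement:

Based on the filtering query above, I need the min and max (a stats aggregation) of test_scores.score across students, where only the maximum matching test_scores.score of each student_id is taken into account.

Example:

Among the documents filtered by the query above: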

doc: 
  student_id: 1
  test_scores.test_id: 101
  test_scores.score: 90
  test_scores.score (To be considered for aggregation): 90

doc:
  student_id: 2
  test_scores.test_id: 101, 103
  test_scores.score:    80, 85
  test_scores.score (To be considered for aggregation): 85

Expected overall stats on test_scores.score:
max: 90
min: 85
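
To make the expected numbers above easy to check, here is a minimal Python sketch that reproduces the calculation outside Elasticsearch. It works on the sample documents from the question; the `wanted` mapping and the helper names `per_student_max` and `stats` are made up for illustration and are not part of any Elasticsearch API.

```python
# Sample documents copied from the question.
docs = [
    {"student_id": 1, "test_scores": [
        {"test_id": 101, "score": 90},
        {"test_id": 102, "score": 70},
        {"test_id": 103, "score": 80}]},
    {"student_id": 2, "test_scores": [
        {"test_id": 101, "score": 80},
        {"test_id": 102, "score": 90},
        {"test_id": 103, "score": 85}]},
    {"student_id": 3, "test_scores": [
        {"test_id": 101, "score": 30},
        {"test_id": 102, "score": 40},
        {"test_id": 103, "score": 55}]},
]

# Hypothetical structure mirroring the filtering query:
# student_id -> test_ids that count for that student.
wanted = {1: {101}, 2: {101, 103}}


def per_student_max(docs, wanted):
    """Return {student_id: highest score among that student's wanted tests}."""
    result = {}
    for doc in docs:
        ids = wanted.get(doc["student_id"])
        if ids is None:
            continue  # student is not selected by the query
        scores = [t["score"] for t in doc["test_scores"] if t["test_id"] in ids]
        if scores:
            result[doc["student_id"]] = max(scores)
    return result


def stats(values):
    """Compute the same five figures a stats aggregation reports."""
    values = list(values)
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


maxima = per_student_max(docs, wanted)
overall = stats(maxima.values())
print(maxima)
print(overall)
```

Student 3 is skipped because the query does not select it, so only the two per-student maxima (90 and 85) feed into the overall statistics.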

Findings

After searching online, I found this solution:

{
  "aggs": {
    "score_stats": { 
      "stats": {
        "script": "if(doc[\"student_id\"].value == 1){                      
                    return params._source[\"test_scores\"]                  
                        .stream()                                           
                        .filter(nested -> nested.test_id == 101)            
                        .mapToDouble(nested -> nested.score)                
                        .max()                                              
                        .orElse(0)                                          
                  } else if(doc[\"student_id\"].value == 2){                
                    return params._source[\"test_scores\"]                  
                        .stream()                                           
                        .filter(nested ->                                   
                            nested.test_id == 101 || nested.test_id == 103) 
                        .mapToDouble(nested -> nested.score)                
                        .max()                                              
                        .orElse(0)                                          
                  } else {                                                  
                    return 0                                                
                  }"          
      }
    }
  },
  "query": {
    //filtering query copied here
  }
}

Response:

"aggregations" : {
  "score_stats" : {
    "count" : 2,
    "min" : 85.0,
    "max" : 90.0,
    "avg" : 87.5,
    "sum" : 175.0
  }
}

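Problem:

This solution works for the simple query above, but my real query can be very complex. The approach does not scale, because there is an upper limit on script length.

I also tried a nested aggregation combined with a filter aggregation, but once inside the nested path it seems impossible to apply AND / OR conditions on non-nested fields.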
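
The exact request for that attempt is not shown, so the following Python sketch is only a guess at its general shape: a `nested` aggregation on `test_scores`, a `filter` sub-aggregation on `test_scores.test_id`, and a `stats` aggregation on `test_scores.score`. The function name `nested_filtered_stats` and the aggregation names `scores`, `wanted_tests` and `score_stats` are illustrative assumptions.

```python
import json


def nested_filtered_stats(test_ids):
    """Build a request body for a nested -> filter -> stats aggregation.

    Assumed shape of the attempt described in the question; the
    aggregation names are made up for this example.
    """
    return {
        "aggs": {
            "scores": {
                # Step into the nested documents under test_scores.
                "nested": {"path": "test_scores"},
                "aggs": {
                    "wanted_tests": {
                        # Only nested fields can be tested here; the
                        # parent's student_id is not a field of the
                        # nested documents.
                        "filter": {
                            "terms": {"test_scores.test_id": test_ids}
                        },
                        "aggs": {
                            "score_stats": {
                                "stats": {"field": "test_scores.score"}
                            }
                        },
                    }
                },
            }
        }
    }


body = nested_filtered_stats([101, 103])
print(json.dumps(body, indent=2))
```

In this shape the same `test_id` list applies to every student, and the stats cover every matching nested score rather than one maximum per student, which is the limitation described above.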

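Is there a better way to run a stats aggregation on a nested field so that only the maximum of the selected nested values per document is used in the statistics?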

0 answers:

No answers