Multiple LIKE queries in Elasticsearch

Time: 2015-12-30 10:58:47

Tags: elasticsearch elasticsearch-plugin

I have a field path in my Elasticsearch documents that contains entries like these

/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_011007/stderr
/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_008874/stderr

# Note: I want to select all the documents that have the line below in the path field
/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_009257/stderr

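I want to run a LIKE query on this path field given the following (basically all 3 are AND conditions):

  1. I am given the application number 1451299305289_0120
  2. I am also given a task number 009257
  3. The path field should also contain stderr
  4. Given the above criteria, the document whose path field is line 3 above should be selected

    This is what I have tried so far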

    http://localhost:9200/logstash-*/_search?q=application_1451299305289_0120 AND path:stderr&size=50
    

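    This query meets the 3rd criterion and only partially meets the 1st: if I search for 1451299305289_0120 instead of application_1451299305289_0120, I get 0 results. (What I really need is to search for just 1451299305289_0120.)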

    When I tried this

    http://10.30.145.160:9200/logstash-*/_search?q=path:*_1451299305289_0120*008779 AND path:stderr&size=50
    

    I get results, but using * at the beginning is a costly operation. Is there another way to achieve this efficiently (for example, using nGram and fuzzy-search)?

1 answer:

Answer 0 (score: 1)

This can be achieved with the Pattern Replace Char Filter. You simply extract the important pieces of information with a regex. Here is my setup

POST log_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "app_analyzer": {
          "char_filter": [
            "app_extractor"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "path_analyzer": {
          "char_filter": [
            "path_extractor"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "task_analyzer": {
          "char_filter": [
            "task_extractor"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "char_filter": {
        "app_extractor": {
          "type": "pattern_replace",
          "pattern": ".*application_(.*)/container.*",
          "replacement": "$1"
        },
        "path_extractor": {
          "type": "pattern_replace",
          "pattern": ".*/(.*)",
          "replacement": "$1"
        },
        "task_extractor": {
          "type": "pattern_replace",
          "pattern": ".*container.{27}(.*)/.*",
          "replacement": "$1"
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "keyword",
          "fields": {
            "application_number": {
              "type": "string",
              "analyzer": "app_analyzer"
            },
            "path": {
              "type": "string",
              "analyzer": "path_analyzer"
            },
            "task": {
              "type": "string",
              "analyzer": "task_analyzer"
            }
          }
        }
      }
    }
  }
}

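I am extracting the application number, task number and path using regex. You might want to tune the task regex if you have other log patterns, since it assumes exactly 27 characters between "container" and the task number. We can then use Filters to search. A big advantage of filters is that they are cached, which makes subsequent calls faster.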
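To see what each sub-field would end up holding, here is a rough Python sketch that mimics the three char filters outside Elasticsearch. It is only an approximation: the patterns are copied from the settings above, Java's "$1" replacement becomes r"\1" in Python, the keyword tokenizer is treated as "keep the whole string", and the asciifolding filter is left out. The names EXTRACTORS and extract are made up for this illustration.

```python
import re

# Patterns copied from the char_filter section of the settings above.
# Hypothetical helper names; this is not part of any Elasticsearch API.
EXTRACTORS = {
    "application_number": r".*application_(.*)/container.*",
    "path": r".*/(.*)",
    "task": r".*container.{27}(.*)/.*",
}


def extract(field: str, raw_path: str) -> str:
    """Approximate pattern_replace + keyword tokenizer + lowercase for one sub-field."""
    # Replace the whole match with capture group 1, like "replacement": "$1".
    replaced = re.sub(EXTRACTORS[field], r"\1", raw_path)
    # The keyword tokenizer emits the string as a single token; lowercase it.
    return replaced.lower()


sample = (
    "/logs/hadoop-yarn/container/application_1451299305289_0120/"
    "container_e18_1451299305289_0120_01_009257/stderr"
)
tokens = {field: extract(field, sample) for field in EXTRACTORS}
print(tokens)
```

If a path does not match a pattern, re.sub returns it unchanged, so the sub-field would hold the full path; that is one reason the fixed .{27} in the task pattern may need adjusting for other log layouts.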

I indexed a sample log like this

PUT log_index/your_type/1
{
  "name" : "/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_009257/stderr"
}

This query will give you the desired result

GET log_index/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "name.application_number": "1451299305289_0120"
              }
            },
            {
              "term": {
                "name.task": "009257"
              }
            },
            {
              "term": {
                "name.path": "stderr"
              }
            }
          ]
        }
      }
    }
  }
}

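As a side note, the filtered query is deprecated in ES 2.x; use a bool query with a filter clause instead. The path hierarchy tokenizer might also be useful for some other use cases.

Hope this helps :)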
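For ES 2.x and later, the same three term filters could be written as a bool query with a filter clause. The sketch below only builds and prints the request body in Python; it does not talk to a cluster. The function name build_query is hypothetical, and the field names assume the mapping shown earlier.

```python
import json


def build_query(application_number: str, task: str, file_name: str) -> dict:
    """Build a bool query whose filter clauses must all match (AND semantics)."""
    return {
        "query": {
            "bool": {
                # Filter clauses do not contribute to scoring.
                "filter": [
                    {"term": {"name.application_number": application_number}},
                    {"term": {"name.task": task}},
                    {"term": {"name.path": file_name}},
                ]
            }
        }
    }


body = build_query("1451299305289_0120", "009257", "stderr")
# The printed JSON can be sent as the body of GET log_index/_search.
print(json.dumps(body, indent=2))
```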