Question

出于优化目的，我正在尝试减少总场数。然而，在我要做之前，我想知道我实际拥有多少个字段。 _stats端点似乎没有任何信息，我无法弄清楚迁移工具如何进行字段计数计算。

是否有某种方法（使用端点或其他方法）来获取指定索引的总字段数？

Answer 1

为了进一步构建其他答案所提供的内容，您可以获取映射，然后只计算关键字type在输出中出现的次数，这将给出每个字段需要的字段数类型：

curl -s -XGET localhost:9200/index/_mapping?pretty | grep type | wc -l

Answer 2

Val的第一个答案也为我解决了这个问题。但是我只想列出一些可能导致误导数字的极端情况。

文档中包含带有“类型”字样的字段。

例如

 "content_type" : {
   "type" : "text",
     "fields" : {
       "keyword" : {
          "type" : "keyword",
       }
     }
   },

这将匹配grep type三次，而它应该只进行两次，即不匹配“ content_type”。此方案很容易解决。

代替

curl -s -XGET localhost:9200/index/_mapping?pretty | grep type

使用

curl -s -XGET localhost:9200/index/_mapping?pretty | grep '"type"'

获得“类型”的精确匹配

该文档的字段名称为“类型”

例如

"type" : {
  "type" : "text",
   "fields" : {
     "keyword" : {
       "type" : "keyword"
     }
   }
},

在这种情况下，匹配也是三次，而不是两次。但是使用

curl -s -XGET localhost:9200/index/_mapping?pretty | grep '"type"'

不会削减它。我们将不得不跳过带有“ type”关键字作为子字符串以及完全匹配项的字段。在这种情况下，我们可以添加一个额外的过滤器，如下所示：

curl -s -XGET localhost:9200/index/_mapping?pretty |\
grep '"type"' | grep -v "{"

除了上述两种情况外，如果您以编程方式使用api将数字进行跟踪，即将其推入AWS Cloudwatch或Graphite之类的内容中，则可以使用以下代码来调用API-获取数据并递归搜索关键字“类型”-跳过任何模糊匹配，并更深入地解析具有确切名称“类型”的字段。

import sys
import json
import requests

# The following find function is a minor edit of the function posted here
# https://stackoverflow.com/questions/9807634/find-all-occurrences-of-a-key-in-nested-python-dictionaries-and-lists

def find(key, value):
  for k, v in value.iteritems():
    if k == key and not isinstance(v, dict) and not isinstance(v, list):
      yield v
    elif isinstance(v, dict):
      for result in find(key, v):
        yield result
    elif isinstance(v, list):
      for d in v:
        for result in find(key, d):
          yield result

def get_index_type_count(es_host):
  try:
    response = requests.get('https://%s/_mapping/' % es_host)
  except Exception as ex:
    print('Failed to get response - %s' % ex)
    sys.exit(1)

  indices_mapping_data = response.json()
  output = {}

  for index, mapping_data in indices_mapping_data.iteritems():
    output[index] = len(list(find('type', mapping_data)))

  return output

if __name__ == '__main__':
  print json.dumps(get_index_type_count(sys.argv[1]), indent=2)

上面的代码也按要点发布在这里-https://gist.github.com/saurabh-hirani/e8cbc96844307a41ff4bc8aa8ebd7459

Answer 3

您可以使用索引API的_mapping端点获取该信息，请参阅documentation

get mapping API允许检索索引或索引/类型的映射定义。

GET / twitter / _mapping / tweet

使用curl：curl [elasticsearch adress]/[index]/_mapping?pretty

Answer 4

这是一种无需编写脚本即可在 Kibana 中获得相对估算值的快速方法（我不认为这是100％精确的方法，但这是一种简单的方法来判断动态字段是否在起作用出于某种原因而达到大量）

在Kibana开发工具中运行此查询

GET /index_name/_mapping

在Kibana输出中，对所有"type"实例（包括引号）执行搜索。这将计算实例并为您提供答案。 （在此示例中为804）

如果您对为什么出现[remote_transport_exception]错误的原因ing之以鼻，这可能会有所帮助

Limit of total fields [1000] in index [index_name] has been exceeded

Answer 5

您可以尝试以下方法：

curl -s -XGET "http://localhost:9200/index/_field_caps?fields=*" | jq '.fields|length'

Answer 6

一个字段可以有多个“类型”：例如

"datapath-id": {
    "fields": {
        "keyword": {
            "ignore_above": 256, 
            "type": "keyword"
        }
    }, 
    "type": "text"
}

我们可以忽略“字段”中的“类型”以获得确切的字段计数。一个例子是：

import json


def myprint(d, field_count):
    for k, v in d.iteritems():
        if isinstance(v, dict):
            if k != "fields":
                field_count = myprint(v, field_count)
        else:
            print "{0} : {1}".format(k, v)
            field_count += 1
    return field_count

with open("output/mappings.json") as f:
    d = json.load(f)
    final_field_count = myprint(d, field_count=0)
    print "field count", final_field_count

获取索引上的字段数

6 个答案: