Elasticsearch not generating tokens for numbers

Asked: 2017-02-24 23:03:14

Tags: elasticsearch

I can't get Elasticsearch to generate the right tokens for phrases such as 15 pound chocolate cake. When I inspect the field with a fielddata_field query, it returns the following tokens:

pou poun pound cho choc choco chocol chocola chocolat chocolate cak cake

The number does not show up there at all. I have tried several combinations of analyzer options, to no avail. Here is my mapping:

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "nGram_filter": {
            "type": "edge_ngram",
            "min_gram": 3,
            "max_gram": 20
          },
          "my_word": {
            "type": "word_delimiter",
            "preserve_original": "true"
          }
        },
        "analyzer": {
          "nGram_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "asciifolding",
              "my_word",
              "nGram_filter"
            ]
          },
          "whitespace_analyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "categories": {
      "properties": {
        "id": { "type": "text" },
        "sort": { "type": "long" },
        "search_term": {
          "type": "text",
          "analyzer": "nGram_analyzer",
          "search_analyzer": "whitespace_analyzer",
          "fielddata": true
        }
      }
    }
  }
}

I also tried defining the nGram filter like this:

"nGram_filter": {
  "type": "edge_ngram",
  "min_gram": 3,
  "max_gram": 20,
  "token_chars": [ "letter", "digit", "punctuation", "symbol" ]
}

Setting "generate_number_parts": "true" and "generate_word_parts": true on the word_delimiter filter did not help either.

Edit: I got it working by changing min_gram to 2, but I would like to keep it at 3. Is there a way to keep the gram size at 3 and still keep the numbers intact?

1 answer:

Answer 0 (score: 0)

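This is the expected behavior. The problem is not that the token is a number, it is the length of the term. Any string of only 1 or 2 characters, numeric or not, gets filtered out in the same way.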

  

min_gram: Minimum length of characters in a gram. Defaults to 1.

Any token with fewer characters than min_gram is filtered out.

So in this case, 15 is filtered out: it has only 2 characters, which is below the min_gram of 3.
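To illustrate the point, here is a rough Python simulation of the length rule. It is only a sketch, not Elasticsearch code: the function names edge_ngrams and analyze are made up for this example, and it assumes plain whitespace splitting plus lowercasing in place of the real standard tokenizer, asciifolding and word_delimiter steps.

```python
def edge_ngrams(token, min_gram=3, max_gram=20):
    """Return the prefixes of `token` with length in [min_gram, max_gram].

    A token shorter than min_gram has no such prefix, so it yields nothing.
    """
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=3, max_gram=20):
    """Simplified stand-in for the analyzer: lowercase, split, edge n-gram."""
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


phrase = "15 pound chocolate cake"

# With min_gram=3 the two-character token "15" produces no grams.
print(" ".join(analyze(phrase, min_gram=3)))

# With min_gram=2 the token "15" is long enough and is kept.
print(" ".join(analyze(phrase, min_gram=2)))
```

With min_gram set to 3 the simulated output matches the token list in the question, and with min_gram set to 2 the token 15 appears, which is consistent with what the asker observed after the edit.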