How to do incremental search (search-as-you-type) with Elasticsearch full-text search on a 5-million-record dataset

Date: 2018-02-15 14:10:59

Tags: elasticsearch search full-text-search n-gram incremental-search

I am using Elasticsearch on a large dataset of all Wikipedia article names, about 5 million of them. The field holding the names is articlenames.

curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -H 'Content-Type: application/json' -d'
{
   "settings":{
      "analysis":{
         "filter":{
            "nGram_filter":{
               "type":"edgeNGram",
               "min_gram":1,
               "max_gram":20,
               "token_chars":[
                  "letter",
                  "digit",
                  "punctuation",
                  "symbol"
               ]
            }
         },
         "tokenizer":{
            "edge_ngram_tokenizer":{
               "type":"edgeNGram",
               "min_gram":"1",
               "max_gram":"20",
               "token_chars":[
                  "letter",
                  "digit"
               ]
             }
         },
         "analyzer":{
            "nGram_analyzer":{
               "type":"custom",
               "tokenizer":"edge_ngram_tokenizer",
               "filter":[
                  "lowercase",
                  "asciifolding"
               ]
             },
             "whitespace_analyzer":{
                "type":"custom",
                "tokenizer":"whitespace",
                "filter":[
                   "lowercase",
                   "asciifolding"
                ]
             }
          }
      }
   },
   "mappings":{
      "name":{
         "properties":{
            "articlenames":{
               "type":"text",
               "analyzer":"nGram_analyzer"
            }
         }
      }
   }
}'
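To see what the nGram_analyzer above stores for each title, here is a rough Python simulation of it. It models the edge_ngram tokenizer with token_chars letter/digit, followed by the lowercase and asciifolding filters. It is an approximation written for this explanation, not Elasticsearch code. Python's str.isalpha()/str.isdigit() stand in for Java's character classes, and EXTRA_FOLDS is a hypothetical small subset of Lucene's ASCII folding table.

```python
import unicodedata

# Hypothetical subset of Lucene's ASCII folding: these letters have no Unicode
# decomposition, so NFKD normalization alone would leave them unchanged.
EXTRA_FOLDS = {"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "ß": "ss"}


def ascii_fold(token: str) -> str:
    """Approximate the asciifolding filter: drop combining accents, then apply EXTRA_FOLDS."""
    decomposed = unicodedata.normalize("NFKD", token)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in no_marks)


def split_words(text: str) -> list[str]:
    """Mimic token_chars ["letter", "digit"]: any other character ends the current word."""
    words, current = [], ""
    for ch in text:
        if ch.isalpha() or ch.isdigit():
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


def ngram_analyzer(text: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """edge_ngram tokenizer (prefixes of each word), then lowercase, then asciifolding."""
    tokens = []
    for word in split_words(text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(ascii_fold(word[:n].lower()))
    return tokens


if __name__ == "__main__":
    print(ngram_analyzer("sachin t"))
    print(ngram_analyzer("Bronisław-Komorowski"))
```

Every prefix of every word becomes its own term. "Bronisław-Komorowski" is split at the hyphen, because "-" is neither a letter nor a digit, and is stored as b, br, ..., bronislaw, k, ko, ..., komorowski. The mapping sets no search_analyzer, so Elasticsearch applies this same analyzer to the query text too. That is why "sachin t" becomes seven query terms rather than two.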

I tried the approaches in these links, without success:

Edge NGram with phrase matching

https://hackernoon.com/elasticsearch-building-autocomplete-functionality-494fcf81a7cf

My goal is to get results like these for the input query "sachin t":

sachin tendulkar
sachin tendulkar centuries
sachin tejas 
sachin top 60 quotes
sachin talwalkar
sachin tawade
sachin taps

for the query "sachin te":

sachin tendulkar
sachin tendulkar centuries
sachin tejas 

for the query "sachin ta":

sachin talwalkar
sachin tawade
sachin taps

and for the query "sachin ten":

sachin tendulkar
sachin tendulkar centuries

Keep in mind that the dataset is large, and some article names contain special characters, such as "Bronisław-Komorowski".

I get the expected output on small datasets of up to 100,000 records. As soon as the dataset grows to 0.5 to 5 million records, I no longer get it.

My query is:

http://127.0.0.1:9200/index_wiki_articlenames/_search?&q=articlenames:sachin-t+articlenames:sachin-t.*&filter_path=hits.hits._source.articlenames&size=50

2 answers:

Answer 0 (score: 0)

You should try the following settings:

curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -H 'Content-Type: application/json' -d'
{
   "settings":{
      "analysis":{
         "tokenizer":{
            "edge_ngram_tokenizer":{
               "type":"edgeNGram",
               "min_gram":"1",
               "max_gram":"20",
               "token_chars":[
                  "letter",
                  "digit"
               ]
             }
         },
         "analyzer":{
            "nGram_analyzer":{
               "type":"custom",
               "tokenizer":"edge_ngram_tokenizer",
               "filter":[
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   },
   "mappings":{
      "name":{
         "properties":{
            "articlenames":{
               "type":"text",
               "analyzer":"nGram_analyzer",
               "search_analyzer": "standard"
            }
         }
      }
   }
}'
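In this mapping, "search_analyzer": "standard" means only the indexed titles are cut into edge n-grams, while the query text stays as whole words. Below is a rough Python sketch of the difference. The regex split is an assumed simplification of the standard tokenizer's word-boundary rules, and fold is a tiny stand-in for asciifolding. The sketch also shows a side effect worth checking: the standard analyzer does no asciifolding, so an accented query word will not equal the folded terms stored in the index.

```python
import re
import unicodedata


def standard_like(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    Like the real standard analyzer, it does no asciifolding."""
    return re.findall(r"[^\W_]+", text.lower())


def fold(token: str) -> str:
    """Tiny asciifolding approximation: drop combining accents and map 'ł' to 'l'."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).replace("ł", "l")


def ngram_like(text: str, max_gram: int = 20) -> list[str]:
    """Rough stand-in for nGram_analyzer: edge n-grams of each word, lowercased and folded."""
    return [fold(w[:n]) for w in standard_like(text) for n in range(1, min(max_gram, len(w)) + 1)]


for query in ["Sachin T", "sachin ten"]:
    print(f"{query!r}: search terms {standard_like(query)}, index-style terms {ngram_like(query)}")

# Terms stored for a title containing a non-ASCII letter.
stored = set(ngram_like("Bronisław-Komorowski"))
print("bronislaw" in stored, "bronisław" in stored)  # folded form present, accented form absent
```

If accented queries matter, one untested option is a custom search analyzer with the standard tokenizer plus the lowercase and asciifolding filters, used in place of standard. That keeps query terms folded the same way as the indexed grams.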

Also, at search time, try this query:

GET index_wiki_articlenames/_search
{
  "query": {
    "match": {
      "articlenames": {
        "query": "Sachin T", 
        "operator": "and"
      }
    }
  }
}
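Why does "operator": "and" produce the lists the question asks for? Here is a simplified Python model of the match query against an edge n-gram index. It is not Elasticsearch: scoring is ignored, results come back in list order, and the analyzers are rough regex stand-ins. The titles are the question's examples plus two hypothetical extras ("sachin pilot", "tokyo tower") added only for contrast.

```python
import re


def words(text: str) -> list[str]:
    """Lowercased alphanumeric words: a rough stand-in for the standard (search) analyzer."""
    return re.findall(r"[^\W_]+", text.lower())


def edge_ngrams(text: str, max_gram: int = 20) -> list[str]:
    """Rough stand-in for nGram_analyzer at index time (asciifolding omitted)."""
    return [w[:n] for w in words(text) for n in range(1, min(max_gram, len(w)) + 1)]


# The question's example titles plus two hypothetical extras.
TITLES = [
    "sachin tendulkar",
    "sachin tendulkar centuries",
    "sachin tejas",
    "sachin top 60 quotes",
    "sachin talwalkar",
    "sachin tawade",
    "sachin taps",
    "sachin pilot",  # hypothetical: contains "sachin" but no word starting with "t"
    "tokyo tower",   # hypothetical: "t" words but no "sachin"
]

# Inverted index built with the index-time analyzer: term -> ids of titles containing it.
INDEX: dict[str, set[int]] = {}
for doc_id, title in enumerate(TITLES):
    for term in edge_ngrams(title):
        INDEX.setdefault(term, set()).add(doc_id)


def search_and(query: str) -> list[str]:
    """Standard-like query analysis with "operator": "and": every query term must match."""
    postings = [INDEX.get(term, set()) for term in words(query)]
    hits = set.intersection(*postings) if postings else set()
    return [TITLES[i] for i in sorted(hits)]


for q in ["Sachin T", "sachin te", "sachin ta", "sachin ten"]:
    print(q, "->", search_and(q))
```

Each query word must be a prefix of some word in the title. As a result, "sachin pilot" and "tokyo tower" drop out, and each query returns exactly the list given in the question.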

Answer 1 (score: 0)

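I know this is late, but anyone looking for a solution can try this query. The mapping and indexing are correct; the query just seems to be missing the operator.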

GET index_wiki_articlenames/_search
{
  "query": {
    "match": {
      "articlenames": {
        "query": "sachin ten", 
        "operator": "and"
      }
    }
  }
}

This returns:

sachin tendulkar
sachin tendulkar centuries
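This answer keeps the question's mapping, which has no search_analyzer. The query is therefore also cut into edge n-grams, and only the operator changes. Below is a simplified Python model comparing the default "or" with "and" under that setup. It is a sketch under stated assumptions: scoring and token positions are ignored, the analyzer is a rough regex stand-in, and "tokyo tower" is a hypothetical extra title. With "or", any title containing a word that starts with "s" or "t" qualifies. In Elasticsearch those hits are ranked, so the right titles may still surface on a small index. On millions of titles, the candidate set becomes enormous, which is one plausible (unverified) reason the original setup stopped working at scale.

```python
import re


def edge_ngrams(text: str, max_gram: int = 20) -> list[str]:
    """Rough stand-in for nGram_analyzer (lowercase + edge n-grams, asciifolding omitted).
    With no search_analyzer in the mapping, the query is analyzed the same way."""
    words = re.findall(r"[^\W_]+", text.lower())
    return [w[:n] for w in words for n in range(1, min(max_gram, len(w)) + 1)]


# The question's example titles plus one hypothetical extra title.
TITLES = [
    "sachin tendulkar",
    "sachin tendulkar centuries",
    "sachin tejas",
    "sachin top 60 quotes",
    "sachin talwalkar",
    "sachin tawade",
    "sachin taps",
    "tokyo tower",  # hypothetical: shares only the "t" prefixes with the query
]

# Inverted index: term -> ids of titles containing it.
INDEX: dict[str, set[int]] = {}
for doc_id, title in enumerate(TITLES):
    for term in edge_ngrams(title):
        INDEX.setdefault(term, set()).add(doc_id)


def match(query: str, operator: str = "or") -> list[str]:
    """Simplified match query: "or" keeps titles with any query term, "and" those with all."""
    postings = [INDEX.get(term, set()) for term in edge_ngrams(query)]
    if not postings:
        return []
    hits = set.intersection(*postings) if operator == "and" else set.union(*postings)
    return [TITLES[i] for i in sorted(hits)]


print(edge_ngrams("sachin ten"))
print("or: ", match("sachin ten", "or"))
print("and:", match("sachin ten", "and"))
```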