How do I get Elasticsearch synonyms to work?

Time: 2015-01-07 04:16:19

Tags: elasticsearch

I am trying to run a simple test of Elasticsearch synonyms, without success. This is what I have so far:

POST /mysearch
{
    "settings" : {
        "number_of_shards" :   5,
        "number_of_replicas" : 0,
        "analysis": {
            "filter" : {
                "my_ascii_folding" : {
                    "type" : "asciifolding",
                    "preserve_original" : true
                },
                "my_stopwords": {
                    "type":       "stop",
                    "stopwords": [ ]
                },
                "mysynonym" : {
                    "type" : "synonym",
                    "synonyms" : [
                        "foo => bar"
                    ]
                }
            },
            "char_filter": {
                "my_htmlstrip": {
                    "type": "html_strip"
                }
            }, 
            "analyzer": {
                "index_text_analyzer":{
                    "type": "custom",
                    "tokenizer":    "standard",
                    "filter":       [ "lowercase", "my_stopwords", "my_ascii_folding" ]
                },
                "index_html_analyzer":{
                    "type": "custom",
                    "tokenizer":    "standard",
                    "char_filter": "my_htmlstrip",
                    "filter":       [ "lowercase", "my_stopwords", "my_ascii_folding" ]
                },
                "search_text_analyzer":{
                    "type": "custom",
                    "tokenizer":    "standard",
                    "filter":       [ "mysynonym", "lowercase", "my_stopwords" ]
                }
            }
        }
    },
    "mappings" : {
        "news" : {
            "_source" : { "enabled" : true },
            "_all" : {"enabled" : false},
            "properties" : {
                "name" : { "type" : "string", "index" : "analyzed", "store": "yes" , "analyzer": "index_text_analyzer" , "search_analyzer": "search_text_analyzer" }
            }
        }
    }
}

Add some documents:

POST /mysearch/news
{
    "name":"foo kar"
}
POST /mysearch/news
{
    "name":"bar kar"
}

Run a search:

POST /mysearch/_search?q=name:foo
{

}

The results I get match foo but not bar. Why?

1 answer:

Answer 0 (score: 3)

I think you are doing it wrong, for several reasons:

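  1. Why use foo => bar? That rule replaces foo with bar, whereas if the two are synonyms you should index both of them. So I would use foo,bar instead.
  2. Why do you use a different analyzer at index time than at search time? At index time you need to index the text together with its synonyms.
  3. Let me give an example: say you index foo kar. Since bar is a synonym of foo, you want its synonym indexed as well, so that the index contains foo, bar, kar. That way, a search for bar or foo finds the document in the index even though the original text does not contain bar.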
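To make the difference between the two rule formats concrete, here is a minimal Python sketch. It is a toy model of a synonym filter, not Elasticsearch's implementation: the helper names parse_rule and apply_synonyms are made up for illustration, and it assumes that an equivalent-synonym rule such as foo,bar expands every listed term to the whole group.

```python
# Toy model of a synonym token filter; NOT Elasticsearch's implementation.


def parse_rule(rule):
    """Turn one synonym rule into a {token: [replacement tokens]} map."""
    if "=>" in rule:
        # Explicit mapping: each left-hand token is replaced by the right-hand tokens.
        left, right = rule.split("=>")
        targets = [t.strip() for t in right.split(",")]
        return {t.strip(): targets for t in left.split(",")}
    # Equivalent synonyms: each token expands to the whole group (assumed behavior).
    group = [t.strip() for t in rule.split(",")]
    return {t: group for t in group}


def apply_synonyms(tokens, rule):
    """Apply a single rule to a token stream, leaving unknown tokens untouched."""
    mapping = parse_rule(rule)
    out = []
    for tok in tokens:
        out.extend(mapping.get(tok, [tok]))
    return out


replaced = apply_synonyms(["foo", "kar"], "foo => bar")
expanded = apply_synonyms(["foo", "kar"], "foo,bar")
print(replaced)  # ['bar', 'kar']
print(expanded)  # ['foo', 'bar', 'kar']
```

With foo => bar the token foo disappears from the stream, while foo,bar keeps both terms, which is the index content described in point 3.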

    That being said, I suggest the following:

    POST /mysearch
    {
        "settings": {
            "number_of_shards": 5,
            "number_of_replicas": 0,
            "analysis": {
                "filter": {
                    "my_ascii_folding": {
                        "type": "asciifolding",
                        "preserve_original": true
                    },
                    "my_stopwords": {
                        "type": "stop",
                        "stopwords": []
                    },
                    "mysynonym": {
                        "type": "synonym",
                        "synonyms": [
                            "foo,bar"
                        ]
                    }
                },
                "char_filter": {
                    "my_htmlstrip": {
                        "type": "html_strip"
                    }
                },
                "analyzer": {
                    "index_text_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [ "lowercase", "my_stopwords", "my_ascii_folding" ]
                    },
                    "index_html_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "char_filter": "my_htmlstrip",
                        "filter": [ "lowercase", "my_stopwords", "my_ascii_folding" ]
                    },
                    "search_text_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [ "mysynonym", "lowercase", "my_stopwords" ]
                    }
                }
            }
        },
        "mappings": {
            "news": {
                "_source": { "enabled": true },
                "_all": { "enabled": false },
                "properties": {
                    "name": { "type": "string", "index": "analyzed", "store": "yes", "analyzer": "search_text_analyzer" }
                }
            }
        }
    }

    Or, if you don't want to index the synonyms and prefer to index only the original text and apply synonyms at search time, make these changes:

    • "synonyms": ["foo,bar"] because, as mentioned above, you would otherwise replace foo with bar
    • specify the two analyzers explicitly:
      "index_analyzer": "index_text_analyzer", "search_analyzer": "search_text_analyzer"

    With these two changes your text is indexed as is (no synonyms), but at search time, when you search for foo, Elasticsearch searches for its synonyms: foo and bar.
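The search-time variant can be pictured with a small Python sketch. This is only a toy inverted index under simplifying assumptions (whitespace tokenization, a hard-coded synonym group equivalent to "foo,bar"); the names SYNONYMS, analyze, build_index and search are hypothetical and do not correspond to any Elasticsearch API.

```python
from collections import defaultdict

# Hypothetical synonym group equivalent to the rule "foo,bar".
SYNONYMS = {"foo": ["foo", "bar"], "bar": ["foo", "bar"]}


def analyze(text, use_synonyms):
    """Lowercase and split on whitespace, optionally expanding synonyms."""
    tokens = []
    for tok in text.lower().split():
        tokens.extend(SYNONYMS.get(tok, [tok]) if use_synonyms else [tok])
    return tokens


def build_index(docs):
    """Build an inverted index WITHOUT synonyms: token -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in analyze(text, use_synonyms=False):
            index[tok].add(doc_id)
    return index


def search(index, query):
    """Expand the query with synonyms, then OR the matching postings together."""
    hits = set()
    for tok in analyze(query, use_synonyms=True):
        hits |= index.get(tok, set())
    return sorted(hits)


docs = {1: "foo kar", 2: "bar kar"}
index = build_index(docs)
result_foo = search(index, "foo")
result_bar = search(index, "bar")
print(result_foo)  # [1, 2]
print(result_bar)  # [1, 2]
```

The index holds only the original tokens, yet a query for either term finds both documents because the expansion happens on the query side.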
