在elasticsearch中搜索带空格的名称(文本)

时间:2013-05-23 08:17:43

标签: search elasticsearch tokenize analyzer

搜索包含空格的名称(文本),给我带来问题, 我的映射类似于

"{"user":{"properties":{"name":{"type":"string"}}}}"

理想情况下它应该返回并按如下方式对结果进行排名

1) Bring on top names that exact match the search term (highest score)
2) Names that starts with the search term (high score)
3) Names that contains the exact search term as substring (medium score)
4) Names that contains any of the search term token  (lowest score)

实施例 对于elasticsearch中的以下名称

Maaz Tariq
Ahmed Maaz Tariq
Maaz Sheeba
Maaz Bin Tariq
Sana Tariq
Maaz Tariq Ahmed

搜索“Maaz Tariq”,结果应按以下顺序

Maaz Tariq (highest score)
Maaz Tariq Ahmed (high score)
Ahmed Maaz Tariq (medium score)
Maaz Bin Tariq  (lowest score)
Maaz Sheeba (lowest score)
Sana Tariq (lowest score)

任何人都可以指出我如何使用和使用哪种分析仪?以及如何对名称的搜索结果进行排名?

3 个答案:

答案 0 :(得分:9)

您可以使用multi field typebool querycustom boost factor query来解决此问题。

映射:

{
    "mappings" : {
        "user" : {        
            "properties" : {
                "name": {
                    "type": "multi_field",
                    "fields": {
                        "name": { "type" : "string", "index": "analyzed" },
                        "exact": { "type" : "string", "index": "not_analyzed" }
                    }
                }
            }
        }
    }
}

查询:

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name": "Maaz Tariq"
                    }
                }
            ],
            "should": [
                {
                    "custom_boost_factor": {
                        "query": {
                            "term": {
                                "name.exact": "Maaz Tariq"
                            }
                        },
                        "boost_factor": 15
                    }
                },
                {
                    "custom_boost_factor": {
                        "query": {
                            "prefix": {
                                "name.exact": "Maaz Tariq"
                            }
                        },
                        "boost_factor": 10
                    }
                },
                {
                    "custom_boost_factor": {
                        "query": {
                            "match_phrase": {
                                "name": {
                                    "query": "Maaz Tariq",
                                    "slop": 0
                                }
                            }
                        },
                        "boost_factor": 5
                    }
                }
            ]
        }
    }
}

修改

正如javanna所指出,不需要custom_boost_factor

不使用custom_boost_factor进行查询:

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name": "Maaz Tariq"
                    }
                }
            ],
            "should": [
                {
                    "term": {
                        "name.exact": {
                            "value": "Maaz Tariq",
                            "boost": 15
                        }
                    }
                },
                {
                    "prefix": {
                        "name.exact": {
                            "value": "Maaz Tariq",
                            "boost": 10
                        }
                    }
                },
                {
                    "match_phrase": {
                        "name": {
                            "query": "Maaz Tariq",
                            "slop": 0,
                            "boost": 5
                        }
                    }
                }
            ]
        }
    }
}

答案 1 :(得分:0)

对于Java Api,当使用空格查询确切的字符串时使用;

CLIENT.prepareSearch(index)
    .setQuery(QueryBuilders.queryStringQuery(wordString)
    .field(fieldName));

在很多其他查询中,你得不到任何结果

答案 2 :(得分:0)

来自Elasticsearch 1.0:

"title": {
    "type": "multi_field",
    "fields": {
        "title": { "type": "string" },
        "raw":   { "type": "string", "index": "not_analyzed" }
    }
}

成为:

"title": {
    "type": "string",
    "fields": {
        "raw":   { "type": "string", "index": "not_analyzed" }
    }
}

https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html