Exact match and fuzziness... what is a good approach?

Time: 2018-11-29 03:23:47

Tags: elasticsearch autocomplete fuzzy-search exact-match

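I have spent a lot of time trying to find the best way to build an autocomplete for cities that supports multiple languages (ES / EN), fuzziness, and priority for exact matches (shown at the top of the results), but I could not find a good way to do it.

My current solution works well in many cases, but when I search for Roma, the first result is "Iasi-East Romania, Romania", while Roma, Italy (the exact match) is in position 30.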

Result JSON:

[
  {"_index":"destinations","_type":"doc","_id":"_X80XWcBn2nzTu98N7_F","_score":75.50012,
   "_source":{"destination_name_en":"Iasi-East Romania","destination_name_es":"Iasi-East Romania","destination_name_pt":"Iasi-East Romania","country_code":"RO","country_name":"ROMANIA","destination_id":7953,"popularity":"0"}},
  {"_index":"destinations","_type":"doc","_id":"7380XWcBn2nzTu98OMZl","_score":73.116455,
   "_source":{"destination_name_en":"La Romana","destination_name_es":"La Romana","destination_name_pt":"La Romana","country_code":"DO","country_name":"DOMINICAN REPUBLIC","destination_id":2816,"popularity":"0"}},
  {"_index":"destinations","_type":"doc","_id":"1X80XWcBn2nzTu98OMZl","_score":71.4391,
   "_source":{ ... }},
  ...
  {"_index":"destinations","_type":"doc","_id":"8H80XWcBn2nzTu98OMZl","_score":52.018818,
   "_source":{"destination_name_en":"Rome","destination_name_es":"Roma","destination_name_pt":"Roma","country_code":"IT","country_name":"ITALY","destination_id":6338,"popularity":"0"}}
]

Right now this is my best solution.

Settings and mapping:

'settings' => [
    'analysis' => [
        'filter' => [
            // Emits the prefixes of each token: "r", "ro", "rom", "roma", ...
            'autocomplete_filter' => [
                'type' => 'edge_ngram',
                'min_gram' => 1,
                'max_gram' => 20,
            ],
        ],
        'analyzer' => [
            // Index-time analyzer used by the destination name fields
            'autocomplete' => [
                'type' => 'custom',
                'tokenizer' => 'standard',
                'filter' => ['lowercase', 'asciifolding', 'autocomplete_filter'],
            ],
        ],
    ],
],
'mappings' => [
    'doc' => [
        'properties' => [
            'destination_name_en' => [
                'type' => 'text',
                'analyzer' => 'autocomplete',
                'search_analyzer' => 'standard',
            ],
            'destination_name_es' => [
                'type' => 'text',
                'analyzer' => 'autocomplete',
                'search_analyzer' => 'standard',
            ],
            'destination_name_pt' => [
                'type' => 'text',
                'analyzer' => 'autocomplete',
                'search_analyzer' => 'standard',
            ],
            'popularity' => [
                'type' => 'integer',
            ],
        ],
    ],
]

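Also, I would like to use the popularity value to boost specific destinations.

I hope someone can give me an example or point me in the right direction.

I would really appreciate it.

2 answers:

Answer 0: (score: 1)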

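The problem is that when you search for roma, Iasi-East Romania comes first because it contains "roma" in every language field, whereas roma only matches Rome in ES / PT / IT and not in EN. With a most_fields query the scores of all matching fields are combined, so the document that matches in more fields ends up with the higher score.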
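To make this concrete, here is a rough PHP sketch that imitates the autocomplete analyzer (split into words, lowercase, edge n-grams from 1 to 20 characters) and checks which language fields produce the token roma. It is a simplification and not Elasticsearch itself: the edgeNgrams helper is made up for illustration, and asciifolding and scoring are left out.

```php
<?php
// Simplified stand-in for the 'autocomplete' analyzer (ASCII only):
// lowercase, split on non-alphanumerics, then emit edge n-grams.
// Hypothetical helper for illustration; it is not part of any library.
function edgeNgrams(string $text, int $min = 1, int $max = 20): array
{
    $grams = [];
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $limit = min($max, strlen($word));
        for ($len = $min; $len <= $limit; $len++) {
            $grams[] = substr($word, 0, $len);
        }
    }
    return $grams;
}

// Field values taken from the result JSON in the question.
$docs = [
    'Iasi-East Romania' => ['en' => 'Iasi-East Romania', 'es' => 'Iasi-East Romania', 'pt' => 'Iasi-East Romania'],
    'Rome'              => ['en' => 'Rome', 'es' => 'Roma', 'pt' => 'Roma'],
];

foreach ($docs as $label => $fields) {
    $matching = [];
    foreach ($fields as $lang => $name) {
        // The search term is analyzed with 'standard', so it stays one token: "roma".
        if (in_array('roma', edgeNgrams($name), true)) {
            $matching[] = $lang;
        }
    }
    echo $label . ': ' . implode(', ', $matching) . PHP_EOL;
}
```

In this sketch Iasi-East Romania matches in en, es and pt, while Rome matches only in es and pt, which is the imbalance described above.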

So if you want to boost exact matches, you need to index the city name in another field without autocomplete (for every language) and add a new clause in the should on those fields.

Mapping example:

"properties" => [
    "destination_name_en" => [
        "type" => "text",
        "analyzer" => "autocomplete",
        "search_analyzer" => "standard",
        "fields" => [
            "exact" => [
                "type" => "text",
                "analyzer" => "standard", // you could use a fancier analyzer here
            ],
        ],
    ],
....

and in the query:

'query' => [
    "bool" => [
        "should" => [
            // Autocomplete match on the edge n-gram fields
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "type" => "most_fields",
                    "boost" => 2
                ]
            ],
            // Fuzzy match, tolerating one edit after the first two characters
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "fuzziness" => "1",
                    "prefix_length" => 2
                ]
            ],
            // Exact match on the new sub-fields
            [
                "multi_match" => [
                    "query" => $text,
                    "type" => "most_fields",
                    "fields" => [
                        "destination_name_*.exact"
                    ],
                    "boost" => 2
                ]
            ]
        ]
    ]
]

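Could you try something like this and keep us posted?

Answer 1: (score: 0)

This works like a charm! Now I get Roma as the first result, and mistakes at the end of the word are accepted too. "Romi" also returns Roma as the first result.

Now I am trying to boost results by popularity (I have two Romas, Roma - Italy and Roma - Australia), and I also want to boost some popular cities around the world.

I am using function_score, but it gives me very strange results.

Here is my current code: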

'query' => [
    'function_score' => [
        'field_value_factor' => [
            'field' => 'popularity',
        ],
        "score_mode" => "multiply",
        'query' => [
            "bool" => [
                "should" => [
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "type" => "most_fields",
                            "boost" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "fuzziness" => "1",
                            "prefix_length" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*.exact"
                            ],
                            "boost" => 2
                        ]
                    ]
                ]
            ]
        ]
    ],
],

Any suggestions?

PS: Thank you very much for your help. I am marking yours as the best answer, since you solved the main problem.
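One possible direction, offered only as a sketch: in function_score, score_mode only combines several functions with each other, while boost_mode decides how the function result is combined with the query score, and it defaults to multiply. With a bare field_value_factor, a popularity of 0 (as in the results shown in the question) therefore multiplies the score down to 0. The variant below assumes a log1p modifier and boost_mode sum, so a popularity of 0 leaves the text score unchanged. The buildQuery helper, the factor of 1 and the missing value of 0 are illustrative assumptions that would need tuning against real data.

```php
<?php
// Hypothetical helper: builds the request body for a given search term.
function buildQuery(string $text): array
{
    return [
        'query' => [
            'function_score' => [
                'query' => [
                    'bool' => [
                        'should' => [
                            ['multi_match' => [
                                'query'  => $text,
                                'fields' => ['destination_name_*'],
                                'type'   => 'most_fields',
                                'boost'  => 2,
                            ]],
                            ['multi_match' => [
                                'query'         => $text,
                                'fields'        => ['destination_name_*'],
                                'fuzziness'     => '1',
                                'prefix_length' => 2,
                            ]],
                            ['multi_match' => [
                                'query'  => $text,
                                'fields' => ['destination_name_*.exact'],
                                'boost'  => 2,
                            ]],
                        ],
                    ],
                ],
                'field_value_factor' => [
                    'field'    => 'popularity',
                    'modifier' => 'log1p', // log10(1 + factor * popularity); 0 stays 0
                    'factor'   => 1,       // assumed value, tune as needed
                    'missing'  => 0,       // assumed default for documents without popularity
                ],
                // Add the popularity bonus to the text score instead of multiplying.
                'boost_mode' => 'sum',
            ],
        ],
    ];
}

// Print the body so it can be inspected or pasted into a console.
echo json_encode(buildQuery('roma'), JSON_PRETTY_PRINT) . PHP_EOL;
```

Because the bonus is added on a logarithmic scale, it mainly acts as a tie-breaker between equally good text matches such as the two Romas; a larger factor would let popular cities outrank closer text matches.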