Thinking Sphinx fuzzy search?

Asked: 2011-05-19 09:51:07

Tags: sphinx thinking-sphinx

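I am implementing Sphinx search in my Rails application and I want to use fuzzy search. It should tolerate spelling mistakes: for example, if the search query is "charactaristics", it should search for "characteristics".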

How should I implement this?

3 answers:

Answer 0 (score: 6)

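Sphinx does not natively allow for spelling mistakes - it doesn't care whether words are spelled correctly, it just indexes them and matches against them.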

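There are two options. You can use thinking-sphinx-raspell to catch spelling errors when users search and offer them the choice of searching again with an improved query (much like Google does). Or you can use the soundex or metaphone morphologies, so that words are indexed in a way that accounts for how they sound. Search for stemming on this page and you'll find the relevant section. It is also worth reading Sphinx's documentation on the matter.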

I don't know how reliable either option is - personally, I'd go with #1.
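To see why a phonetic morphology helps with the misspelling from the question, here is a minimal Ruby sketch of the classic American Soundex algorithm. It is a standalone illustration and not Sphinx's own code (Sphinx's soundex morphology may differ in details); the `soundex` method and the `SOUNDEX_CODES` constant are names made up for this example.

```ruby
# Classic American Soundex, written out for illustration only.
# Assumption: this mirrors the textbook algorithm, not Sphinx internals.
SOUNDEX_CODES = {
  "bfpv" => "1", "cgjkqsxz" => "2", "dt" => "3",
  "l" => "4", "mn" => "5", "r" => "6"
}.flat_map { |letters, digit| letters.chars.map { |c| [c, digit] } }.to_h.freeze

def soundex(word)
  letters = word.downcase.gsub(/[^a-z]/, "").chars
  return "" if letters.empty?

  result = letters.first.upcase          # keep the first letter as-is
  previous = SOUNDEX_CODES[letters.first]

  letters.drop(1).each do |ch|
    code = SOUNDEX_CODES[ch]
    if code.nil?
      # Vowels (and y) separate repeated codes; h and w do not.
      previous = nil unless ch == "h" || ch == "w"
      next
    end
    result << code unless code == previous # collapse adjacent duplicates
    previous = code
    break if result.length == 4
  end

  result.ljust(4, "0")                   # pad short codes with zeros
end

puts soundex("charactaristics") # prints C623
puts soundex("characteristics") # prints C623
```

Both spellings reduce to the same code, so an index keyed on such codes would treat them as the same word. The trade-off is that unrelated words with similar consonants collapse together too, which is part of why reliability is hard to judge up front.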

Answer 1 (score: 3)

  

By default, Sphinx does not pay any attention to wildcard searching using an asterisk character. You can turn it on, though:

development:
  enable_star: true
  # ... repeat for other environments

See the Wildcard/Star Syntax section of http://pat.github.io/thinking-sphinx/advanced_config.html.

Answer 2 (score: 2)

Yes, Sphinx generally uses the extended match mode.

The following matching modes are available:

SPH_MATCH_ALL, matches all query words (default mode);
SPH_MATCH_ANY, matches any of the query words;
SPH_MATCH_PHRASE, matches query as a phrase, requiring perfect match;
SPH_MATCH_BOOLEAN, matches query as a boolean expression (see Section 5.2, “Boolean query syntax”);
SPH_MATCH_EXTENDED, matches query as an expression in Sphinx internal query language (see Section 5.3, “Extended query syntax”);
SPH_MATCH_EXTENDED2, an alias for SPH_MATCH_EXTENDED;
SPH_MATCH_FULLSCAN, matches query, forcibly using the "full scan" mode as below. NB, any query terms will be ignored, such that filters, filter-ranges and grouping will still be applied, but no text-matching.

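SPH_MATCH_EXTENDED2 was used during the 0.9.8 and 0.9.9 development cycle, when the internal matching engine was being rewritten (for the sake of additional functionality and better performance). By the 0.9.9 release the older version had been removed, and SPH_MATCH_EXTENDED and SPH_MATCH_EXTENDED2 are now just aliases.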
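To make the difference between the first three modes concrete, here is a rough Ruby sketch that mimics SPH_MATCH_ALL, SPH_MATCH_ANY and SPH_MATCH_PHRASE on plain strings. It does not talk to Sphinx at all; the helper names (`tokens`, `match_all?`, `match_any?`, `match_phrase?`) are invented for this example, and real Sphinx applies its own tokenizing and ranking on top.

```ruby
# Toy versions of three match modes. Illustration only, no Sphinx involved.
def tokens(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Like SPH_MATCH_ALL: every query word must appear in the document.
def match_all?(doc, query)
  (tokens(query) - tokens(doc)).empty?
end

# Like SPH_MATCH_ANY: at least one query word must appear.
def match_any?(doc, query)
  !(tokens(query) & tokens(doc)).empty?
end

# Like SPH_MATCH_PHRASE: the query words must appear consecutively, in order.
def match_phrase?(doc, query)
  wanted = tokens(query)
  return false if wanted.empty?
  tokens(doc).each_cons(wanted.length).any? { |window| window == wanted }
end

doc = "Fuzzy search with Sphinx in a Rails application"
puts match_all?(doc, "rails sphinx")     # prints true
puts match_any?(doc, "rails solr")       # prints true
puts match_phrase?(doc, "sphinx rails")  # prints false
```

Note that none of these modes helps with misspellings on its own: a query word that is spelled differently from the indexed word simply does not match.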

enable_star

  

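Enables star syntax (or wildcard syntax) when searching through prefix/infix indexes. Optional, default is 0 (do not use wildcard syntax), for compatibility with 0.9.7. Known values are 0 and 1.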

For example, assume that the index was built with infixes and that enable_star is 1. Searching should work as follows:

"abcdef" query will match only those documents that contain the exact "abcdef" word in them.
"abc*" query will match those documents that contain any words starting with "abc" (including the documents which contain the exact "abc" word only).
"*cde*" query will match those documents that contain any words which have "cde" characters in any part of the word (including the documents which contain the exact "cde" word only).
"*def" query will match those documents that contain any words ending with "def" (including the documents that contain the exact "def" word only).
  

Example:

     

enable_star = 1
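The four query forms listed above can be mimicked in plain Ruby to check your expectations before touching the index. This is only a sketch of the documented semantics under the assumption that words are split on non-alphanumeric characters; it ignores Sphinx settings such as minimum prefix/infix lengths, and `star_to_regexp` and `star_match?` are hypothetical helper names.

```ruby
# Turn a star-syntax term into an anchored Regexp: each "*" becomes ".*",
# every other character is matched literally.
def star_to_regexp(term)
  pattern = term.split("*", -1).map { |part| Regexp.escape(part) }.join(".*")
  Regexp.new("\\A#{pattern}\\z")
end

# True if any word in the text matches the star-syntax term.
def star_match?(term, text)
  matcher = star_to_regexp(term.downcase)
  text.downcase.scan(/[a-z0-9]+/).any? { |word| matcher.match?(word) }
end

puts star_match?("abcdef", "xx abcdefg yy") # prints false (exact word only)
puts star_match?("abc*", "xx abcdefg yy")   # prints true  (prefix)
puts star_match?("*cde*", "xx abcdefg yy")  # prints true  (infix)
puts star_match?("*def", "xx abcdefg yy")   # prints false (suffix must end the word)
```

Keep in mind that wildcards cover unknown beginnings or endings of a word, not a wrong letter in the middle: "charact*" would find "characteristics", but the full misspelled query from the question would still need one of the approaches from the first answer.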