Solr schema: accent-sensitive and accent-insensitive matching

Date: 2017-11-24 19:14:35

Tags: search solr lucene accent-insensitive accent-sensitive

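I am trying to figure out how to configure a fieldType in Solr's managed-schema to achieve the following:
(a) when searching for a non-accented string, the results are accent-insensitive;
(b) HOWEVER, when searching for an accented string, the results are accent-sensitive.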

For example:
searchString -> expectedResult
Equipe -> Equipe, Equipé, Equípé, etc.

Equipé -> Equipé

Note: the wildcards (*) are irrelevant, and the words chosen are for demonstration purposes only.

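Because of some requirement constraints my situation is a bit unusual. In my schema (below) I have 3 fields: OName, OSearch, and ONameSearch. (Note: OSearch and ONameSearch serve different purposes on the backend, which is why both are defined even though they share the same type.) My intention is to have Solr query on OSearch and ONameSearch and return OName to the UI.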

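My initial understanding was that OName would store the original value ("María") while the value is indexed accent-insensitively ("maria"), so that querying without solr.ASCIIFoldingFilterFactory would achieve the following.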

Example: {query} -> {OName = results}
q = OSearch:*equipe* OR ONameSearch:*equipe* -> OName = Equipe, Equipé, Equípé, etc.
q = OSearch:*equipé* OR ONameSearch:*equipé* -> OName = Equipé
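To reproduce the expected results above, a few test documents could be posted in Solr's XML update format. This is only a sketch: the `id` field is an assumption (the schema excerpt does not show a uniqueKey field), and the values are just the demonstration words used in the examples.

```xml
<!-- Test documents for the accent examples.
     Assumption: "id" is the uniqueKey field; it is not part of the schema excerpt. -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="OName">Equipe</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="OName">Equipé</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="OName">Equípé</field>
  </doc>
</add>
```

With these three documents indexed, the first query should return all three OName values and the second should return only Equipé.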

Here is my schema so far:

<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
  </fieldType>

<fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>

<field name="OName"                   type="lowercase"          indexed="true"      stored="true" />
<field name="OSearch"                 type="text_en_splitting_tight"  indexed="true"      stored="false" multiValued="true" />
<field name="ONameSearch"             type="text_en_splitting_tight"  indexed="true"      stored="false" multiValued="true" />

<copyField source="OName"          dest="OSearch" />
<copyField source="OName"          dest="ONameSearch" />
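For illustration, one configuration that might produce the behavior described in (a) and (b) is to keep both the folded and the original token at index time and to skip folding at query time. The sketch below is an assumption, not a tested fix for the schema above: the type name `text_accent_aware` is hypothetical, the filter chain is deliberately shorter than `text_en_splitting_tight`, and it relies on the `preserveOriginal` attribute of `solr.ASCIIFoldingFilterFactory`, which should be checked against the Solr version in use.

```xml
<!-- Hypothetical field type: the name and the reduced filter chain are assumptions. -->
<fieldType name="text_accent_aware" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits the folded token and keeps the accented original:
         "equipé" is indexed as both "equipe" and "equipé". -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <!-- Drops identical tokens that end up at the same position. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- No folding here, so an accented query term only matches the accented original. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Under these assumptions, a query for equipe would match documents containing Equipe, Equipé, or Equípé (all of them index the token "equipe"), while a query for equipé would match only documents whose original token is "equipé". Wildcard queries are analyzed separately from plain term queries, so they would need to be verified on their own.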

Please advise, thanks!

Most (if not all) of the related resources I have looked into:
How to ignore accent search in Solr
How to ignore accents in SOLR search?
SOLR and accented characters
Solr accent removal
SOLR Makes Search with Accented Characters Easy
Solr Ref Guide 6.6 Defining Fields
Solr Ref Guide 6.6 Copying Fields

0 answers