Wildcard search or partial matching in Elasticsearch

Time: 2016-12-12 03:09:18

Tags: elasticsearch wildcard

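I am trying to give end users a search-as-they-type feature that behaves more like SQL Server. I was able to implement an ES query for the following SQL scenario: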

 select * from table where name like '%pete%' and type != 'xyz' and type != 'abc'

But the ES query does not work for this SQL query:

  select * from table where name like '%peter tom%' and type != 'xyz' and type != 'abc'

In Elasticsearch, along with the wildcard query, I also need to apply some boolean filters:

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            {
              "query": {
                "wildcard": {
                  "name": { "value": "*pete*" }
                }
              }
            }
          ],
          "must_not": [
            { "match": { "type": "xyz" } },
            { "match": { "type": "abc" } }
          ]
        }
      }
    }
  }
}

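The Elasticsearch query above, with the wildcard search, works fine and returns all documents that match pete and are not of type xyz or abc. But when I run the same query with a wildcard containing two separate words separated by a space, it returns an empty result. For example: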

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            {
              "query": {
                "wildcard": {
                  "name": { "value": "*peter tom*" }
                }
              }
            }
          ],
          "must_not": [
            { "match": { "type": "xyz" } },
            { "match": { "type": "abc" } }
          ]
        }
      }
    }
  }
}

My mapping is as follows:

{
  "properties": {
    "name": {
      "type": "string"
    },
    "type": {
      "type": "string"
    }
  }
}

What query should I use to run a wildcard search on words separated by spaces?

2 Answers:

Answer 0: (score: 2)

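The most efficient solution is to use an ngram tokenizer to tokenize portions of the name field. For example, if you have a name like peter tomson, the ngram tokenizer will tokenize and index it like this: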

  • pe
  • pet
  • pete
  • peter
  • peter t
  • peter to
  • peter tom
  • peter toms
  • peter tomso
  • eter tomson
  • ter tomson
  • er tomson
  • r tomson
  •  tomson (with a leading space)
  • tomson
  • omson
  • mson

So, once this has been indexed, searching for any of these tokens will retrieve the document containing peter tomson.
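
To illustrate why this works for text containing a space, here is a minimal Python sketch. It is not Elasticsearch code: `whitespace_terms` and `ngrams` are made-up helper names. It assumes that the default analysis of a string field splits the name roughly on whitespace, and that an ngram tokenizer with min_gram 2 and max_gram 15 emits every substring of 2 to 15 characters, so the list above is only a sample of the tokens.

```python
def whitespace_terms(text):
    # Rough stand-in (assumption) for how an analyzed string field
    # is broken into separate terms.
    return text.lower().split()


def ngrams(text, min_gram=2, max_gram=15):
    # Every substring whose length lies between min_gram and max_gram.
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


name = "peter tomson"
terms = whitespace_terms(name)
grams = ngrams(name)

print(terms)                 # ['peter', 'tomson']
print("peter tom" in terms)  # False: no single term spans the space
print("peter tom" in grams)  # True: the n-grams keep the space
print(len(grams))            # 66 tokens for one 12-character name
```

The last line hints at the trade-off: the index grows considerably, in exchange for cheap exact lookups at search time.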

Let's create the index:

PUT likequery
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ngram_analyzer": {
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "15"
        }
      }
    }
  },
  "mappings": {
    "typename": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "search": {
              "type": "string",
              "analyzer": "my_ngram_analyzer"
            }
          }
        },
        "type": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

You can then search with a simple and efficient term query:

POST likequery/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "name.search": "peter tom"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "type": "xyz"
          }
        },
        {
          "match": {
            "type": "abc"
          }
        }
      ]
    }
  }
}
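
A few limits of this approach are easy to miss, so here is a rough Python sketch of them using a toy in-memory index. The documents and the `search` helper are hypothetical. It assumes that a term query is an exact, unanalyzed lookup and that my_ngram_analyzer has no lowercase filter, as in the settings above.

```python
from collections import defaultdict

MIN_GRAM, MAX_GRAM = 2, 15  # same limits as my_ngram_tokenizer


def ngrams(text):
    # All substrings with a length between MIN_GRAM and MAX_GRAM.
    return {text[i:i + n]
            for i in range(len(text))
            for n in range(MIN_GRAM, MAX_GRAM + 1)
            if i + n <= len(text)}


# Hypothetical documents, for illustration only.
docs = {
    1: {"name": "peter tomson", "type": "person"},
    2: {"name": "peter tomas", "type": "xyz"},
    3: {"name": "alexander hamilton", "type": "person"},
}

# Toy inverted index: n-gram -> ids of the documents containing it.
index = defaultdict(set)
for doc_id, doc in docs.items():
    for gram in ngrams(doc["name"]):
        index[gram].add(doc_id)


def search(text, excluded_types=("xyz", "abc")):
    # Exact lookup of the search text, then drop the excluded types.
    hits = index.get(text, set())
    return sorted(i for i in hits if docs[i]["type"] not in excluded_types)


print(search("peter tom"))         # [1]  (doc 2 matches but has type xyz)
print(search("Peter tom"))         # []   (lookup is case sensitive)
print(search("alexander hamil"))   # [3]  (15 characters, still indexed)
print(search("alexander hamilt"))  # []   (16 characters, above max_gram)
```

If case-insensitive matching or longer search strings are needed, the analyzer settings would have to be adjusted accordingly.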

Answer 1: (score: 1)

My solution is not perfect, and I am not sure about its performance, so use it at your own risk :)

This is for version 5:

PUT likequery
{
  "mappings": {
    "typename": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        },
        "type": {
          "type": "string"
        }
      }
    }
  }
}
In ES 2.1, change "type": "keyword" to "type": "string", "index": "not_analyzed".

PUT likequery/typename/1
{
  "name": "peter tomson"
}

PUT likequery/typename/2
{
  "name": "igor tkachenko"
}

PUT likequery/typename/3
{
  "name": "taras shevchenko"
}

Example query:

POST likequery/_search
{
  "query": {
    "regexp": {
      "name.raw": ".*taras shev.*"
    }
  }
}

Response:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "likequery",
        "_type": "typename",
        "_id": "3",
        "_score": 1,
        "fields": {
          "raw": [
            "taras shevchenko"
          ]
        }
      }
    ]
  }
}

P.S. Once again, I am not sure about the performance of this query, since it will do a scan rather than use the index.
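
As a rough model of what the regexp query does, here is a small Python sketch. It uses Python's `re` module as a stand-in, which is an assumption: Lucene's regular expression syntax differs in its details. What it does share is that the pattern has to match the whole untokenized value, which is why the query needs `.*` on both sides. The `regexp_query` helper is a made-up name.

```python
import re

# The name.raw values of the three documents indexed above.
raw_names = ["peter tomson", "igor tkachenko", "taras shevchenko"]


def regexp_query(pattern, values):
    # fullmatch mimics the anchoring: the pattern must cover the whole value.
    # Every value is tested in turn, which is the "scan" the P.S. refers to.
    compiled = re.compile(pattern)
    return [value for value in values if compiled.fullmatch(value)]


print(regexp_query(".*taras shev.*", raw_names))  # ['taras shevchenko']
print(regexp_query("taras shev", raw_names))      # []
```

Because the pattern starts with `.*`, no prefix of the value is known in advance, so every candidate has to be checked.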