Matching multiple words (full text) in a single document or multiple documents in Elasticsearch

Date: 2015-06-19 06:22:47

Tags: search solr elasticsearch lucene full-text-search

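My requirement is:

If I pass multiple words to a search as a list, ES should return the documents that match a subset of those words, together with the words that matched, so that I can tell which document matched which subset.

Suppose I need to search for words such as football, cricket, tennis and golf in three files.

I store each of these files in its own document. The mapping of the "mydocuments" index looks like this: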

{
  "mydocuments" : {
    "mappings" : {
      "docs" : {
        "properties" : {
          "file_content" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

First document:

{ _id: 1, file_content: "I love tennis and cricket"}

Second document:

{ _id: 2, file_content: "tennis and football are very popular"}

Third document:

{ _id: 3, file_content: "football and cricket are originated in england"}
  

I should be able to search a single file or multiple files for football, tennis, cricket and golf, and it should return something like this:

    "hits": {
        "total": 3,
        "hits": [
            {
                "_index": "twitter",
                "_type": "tweet",
                "_id": "1",
                "_source": {
                    "file_content": ["tennis", "cricket"],
                    "postDate": "2009-11-15T14:12:12"
                }
            },
            {
                "_index": "twitter",
                "_type": "tweet",
                "_id": "2",
                "_source": {
                    "file_content": ["football", "tennis"],
                    "postDate": "2009-11-15T14:12:12"
                }
            }
        ]
    }

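Or, when searching multiple files, an array of search results like the one above.

Any ideas on how we can do this with Elasticsearch?

If this cannot be done with Elasticsearch, I am prepared to evaluate other options (native Lucene, Solr).

Edit

My bad, I probably did not provide enough detail. @Andrew, by "file" I mean the text content of a file, stored as a string field (full text) in an ES document. Assume that each file corresponds to one document whose "file_content" field holds the file's text content as a string.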

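1 answer:

Answer 0 (score: 1)

The closest you can get to what you want is highlighting, which marks the searched terms that were found in each document.

Example query: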

{
  "query": {
    "match": {
      "file_content": "football tennis cricket golf"
    }
  },
  "highlight": {
    "fields": {"file_content":{}}
  }
}
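
As a minimal sketch of how that request body could be assembled from a list of words, the following Python builds the same structure as the example query. The helper name `build_highlight_query` and its `field` parameter are illustrative assumptions, not part of any Elasticsearch client; the resulting body would still have to be sent to the index's `_search` endpoint with whatever client you use.

```python
import json


def build_highlight_query(words, field="file_content"):
    """Build a match query with highlighting for a list of search words.

    Hypothetical helper: it only constructs the request body as a dict.
    """
    return {
        # The match query analyzes the text and, by default, ORs the terms.
        "query": {"match": {field: " ".join(words)}},
        # An empty object requests highlighting with default settings.
        "highlight": {"fields": {field: {}}},
    }


body = build_highlight_query(["football", "tennis", "cricket", "golf"])
print(json.dumps(body, indent=2))
```

The printed body is equivalent to the example query. Because the `match` query combines terms with OR by default, documents containing any subset of the words are returned.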

Result:


       "hits": {
          "total": 3,
          "max_score": 0.027847305,
          "hits": [
             {
                "_index": "test_highlight",
                "_type": "docs",
                "_id": "1",
                "_score": 0.027847305,
                "_source": {
                   "file_content": "I love tennis and cricket"
                },
                "highlight": {
                   "file_content": [
                      "I love <em>tennis</em> and <em>cricket</em>"
                   ]
                }
             },
             {
                "_index": "test_highlight",
                "_type": "docs",
                "_id": "2",
                "_score": 0.023869118,
                "_source": {
                   "file_content": "tennis and football are very popular"
                },
                "highlight": {
                   "file_content": [
                      "<em>tennis</em> and <em>football</em> are very popular"
                   ]
                }
             },
             {
                "_index": "test_highlight",
                "_type": "docs",
                "_id": "3",
                "_score": 0.023869118,
                "_source": {
                   "file_content": "football and cricket are originated in england"
                },
                "highlight": {
                   "file_content": [
                      "<em>football</em> and <em>cricket</em> are originated in england"
                   ]
                }
             }
          ]
       }

As you can see, the terms that were found are highlighted (wrapped in <em> tags) under the special highlight section.
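
To get from the highlighted fragments to the list of matched words per document that the question asks for, the response can be post-processed on the client. The sketch below is one possible approach, not something Elasticsearch does for you: the function name `matched_terms` is made up for illustration, and the sample `response` is a trimmed copy of the result above.

```python
import re

# Matches the text wrapped in the default <em>...</em> highlight tags.
EM_TERM = re.compile(r"<em>(.*?)</em>")


def matched_terms(response, field="file_content"):
    """Map each hit's _id to the sorted, lowercased highlighted words."""
    result = {}
    for hit in response["hits"]["hits"]:
        fragments = hit.get("highlight", {}).get(field, [])
        terms = set()
        for fragment in fragments:
            terms.update(t.lower() for t in EM_TERM.findall(fragment))
        result[hit["_id"]] = sorted(terms)
    return result


# Trimmed copy of the search result shown above.
response = {"hits": {"hits": [
    {"_id": "1", "highlight": {"file_content": [
        "I love <em>tennis</em> and <em>cricket</em>"]}},
    {"_id": "2", "highlight": {"file_content": [
        "<em>tennis</em> and <em>football</em> are very popular"]}},
    {"_id": "3", "highlight": {"file_content": [
        "<em>football</em> and <em>cricket</em> are originated in england"]}},
]}}

print(matched_terms(response))
# {'1': ['cricket', 'tennis'], '2': ['football', 'tennis'], '3': ['cricket', 'football']}
```

Note that the highlighter returns the words as they appear in the text, not the analyzed terms, and by default it returns only a limited number of fragments per field. For long files you would probably want to set `number_of_fragments` to 0 in the highlight request so that the whole field is highlighted and no match is missed.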