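Getting the matched keywords when searching an analyzed field

Time: 2013-12-26 06:19:54

Tags: elasticsearch

When searching an analyzed field, is there a way to get back only the keywords that matched? In my case I have a 'content' field (an analyzed string), and I run the following query: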

GET /posts/post/_search?pretty=true
{
    "query": {
        "query_string": {
            "query": "content:(obama or hilary)"
        }
    },
    "fields": ["id", "interaction_id", "sentiment", "tweet_created_at", "content"]
}

I get output like this:

"hits": [
         {
            "_index": "posts_v1",
            "_type": "post",
            "_id": "51764639fdccca097f03d095",
            "_score": 2.024847,
            "fields": {
               "content": "UGANDA HILARY",
               "id": "51764639fdccca097f03d095",
               "sentiment": 0,
               "tweet_created_at": "2012-11-24T14:59:25Z",
               "interaction_id": "1e236478961ca480e0744001f05ca8b8"
            }
         },
         {
            "_index": "posts_v1",
            "_type": "post",
            "_id": "51c2bae26c8f1806cb000001",
            "_score": 1.9791828,
            "fields": {
               "content": "Obama in Berlin — looking back",
               "id": "51c2bae26c8f1806cb000001",
               "sentiment": 0,
               "tweet_created_at": "2013-06-20T08:18:39Z",
               "interaction_id": "1e2d98202c55a980e07493a024172cb6"
            }
         },
         {
            "_index": "posts_v1",
            "_type": "post",
            "_id": "51c3a6b06c8f185fcb000001",
            "_score": 1.7071226,
            "fields": {
               "content": "Knowing Barack Obama, Hilary Clintonr",
               "id": "51c3a6b06c8f185fcb000001",
               "sentiment": 0,
               "tweet_created_at": "2013-06-21T01:04:45Z",
               "interaction_id": "1e2da0e8fb5fa480e07407b3fa87ab72"
            }
         }
]

What I need is something like this:

"hits": [
         {
            "_index": "posts_v1",
            "_type": "post",
            "_id": "51764639fdccca097f03d095",
            "_score": 2.024847,
            "fields": {
               "content": "UGANDA HILARY",
               "id": "51764639fdccca097f03d095",
               "sentiment": 0,
               "tweet_created_at": "2012-11-24T14:59:25Z",
               "interaction_id": "1e236478961ca480e0744001f05ca8b8",
               "content_tags": ["hilary"]
            }
         },
         {
            "_index": "posts_v1",
            "_type": "post",
            "_id": "51c2bae26c8f1806cb000001",
            "_score": 1.9791828,
            "fields": {
               "content": "Obama in Berlin — looking back",
               "id": "51c2bae26c8f1806cb000001",
               "sentiment": 0,
               "tweet_created_at": "2013-06-20T08:18:39Z",
               "interaction_id": "1e2d98202c55a980e07493a024172cb6",
               "content_tags": ["obama"]
            }
         },
         {
            "_index": "posts_v1",
            "_type": "post",
            "_id": "51c3a6b06c8f185fcb000001",
            "_score": 1.7071226,
            "fields": {
               "content": "Knowing Barack Obama, Hilary Clintonr",
               "id": "51c3a6b06c8f185fcb000001",
               "sentiment": 0,
               "tweet_created_at": "2013-06-21T01:04:45Z",
               "interaction_id": "1e2da0e8fb5fa480e07407b3fa87ab72",
               "content_tags": ["obama", "hilary"]
            }
         }
]

Note the content_tags field in the second hits structure. Is there a way to achieve this?

1 answer:

Answer 0: (score: 1)

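Elasticsearch doesn't directly support returning which terms matched in which field, though I think it could be implemented reasonably easily as an extra "highlighter". I think you have two options right now:

  1. Do something hacky with highlighting: ask for a fragment length of max(all_strings.map(strlen).max, min_highlight_length), strip the non-highlighted text, and deduplicate what remains. I believe min_highlight_length is 13 characters or so. That minimum probably only applies to the FVH (fast vector highlighter), which I don't suggest you use, so you can probably ignore it.

  2. Do two searches, either via multisearch or sequentially.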
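
To make the two options concrete, here is a minimal client-side sketch in Python. It is an illustration under assumptions, not a built-in Elasticsearch feature. For option 1 it assumes the search request also carries a "highlight" section for the content field and that the highlighter uses its default <em>...</em> tags, so each hit has a list of highlighted fragments. For option 2 it assumes "two searches" means one search per term, and it builds the newline-delimited body that the _msearch endpoint expects. The helper names matched_terms and msearch_body are hypothetical.

```python
import json
import re

# Assumption: default highlighter tags. Change the pattern if the request
# sets custom pre_tags / post_tags.
EM_TAG = re.compile(r"<em>(.*?)</em>", re.DOTALL)


def matched_terms(fragments):
    """Option 1: keep only the highlighted text and de-duplicate it.

    `fragments` is the list of highlight strings returned for one hit.
    The highlighter wraps the original text, not the analyzed term, so
    lowercasing here only approximates what the analyzer produced.
    """
    terms = []
    for fragment in fragments:
        for token in EM_TAG.findall(fragment):
            term = token.strip().lower()
            if term and term not in terms:
                terms.append(term)
    return terms


def msearch_body(index, field, terms):
    """Option 2: build an _msearch body with one match query per term.

    Each search is a header line followed by a body line, and the whole
    payload must end with a newline.
    """
    lines = []
    for term in terms:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps({"query": {"match": {field: term}}}))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fragments = ["Knowing Barack <em>Obama</em>, <em>Hilary</em> Clintonr"]
    print(matched_terms(fragments))
    print(msearch_body("posts", "content", ["obama", "hilary"]))
```

With option 2, the document ids returned by each per-term search tell you which term matched which document, at the cost of one search per term. With option 1, a single search is enough, but the result depends on the highlighter returning every matching fragment of the field.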
