Querying for a missing field in nested documents

Time: 2013-05-06 13:07:53

Tags: lucene elasticsearch

I have user documents that each contain many tags. Here is the mapping:

{
  "user" : {
    "properties" : {
      "tags" : {
        "type" : "nested",
        "properties" : {
          "id" : {
            "type" : "string",
            "index" : "not_analyzed",
            "store" : "yes"
          },
          "current" : {
            "type" : "boolean"
          },
          "type" : {
            "type" : "string"
          },
          "value" : {
            "type" : "multi_field",
            "fields" : {
              "value" : {
                "type" : "string",
                "analyzer" : "name_analyzer"
              },
              "value_untouched" : {
                "type" : "string",
                "index" : "not_analyzed",
                "include_in_all" : false
              }
            }
          }
        }
      }
    }
  }
}

Here are some sample user documents.

User 1:

{
  "created_at": 1317484762000,
  "updated_at": 1367040856000,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
    },
    {
      "type": "company",
      "value": "alma connect",
      "id": "58ad4afcc8415216ea451339aaecf311ed40e132"
    },
    {
      "type": "company",
      "value": "Google",
      "id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5",
      "current": true
    },
    {
      "type": "discipline",
      "value": "B.Tech.",
      "id": "a7706af7f1477cbb1ac0ceb0e8531de8da4ef1eb",
      "institute_id": "4fb424a5addf32296f00013a"
    }
  ]
}

User 2:

{
  "created_at": 1318513355000,
  "updated_at": 1364888695000,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
    },
    {
      "type": "college",
      "value": "Bharatiya Vidya Bhavan's Public School, Jubilee hills, Hyderabad",
      "id": "d20730345465a974dc61f2132eb72b04e2f5330c"
    },
    {
      "type": "company",
      "value": "Alma Connect",
      "id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5"
    },
    {
      "type": "sector",
      "value": "Website and Software Development",
      "id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a4"
    }    
  ]
}

User 3:

{
  "created_at": 1318513355001,
  "updated_at": 1364888695010,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361821"
    },
    {
      "type": "sector",
      "value": "Website and Software Development",
      "id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a1"
    }    
  ]
}

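Given the ES documents above, I want to build a query that fetches the users who have a given company tag among their nested tag documents, or who have no company tag at all. What should my search query be?

For example, if I search for the google tag in the case above, the returned documents should be "User 1" and "User 3" (User 1 has the company tag google, and User 3 has no company tag). User 2 should not be returned, because it has a company tag other than google.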

1 answer:

Answer 0: (score: 3)

This is not trivial at all, mainly because of the "no type:company tag" clause. Here is what I came up with:

{
  "or" : {
    "filters" : [ {
      "nested" : {
        "filter" : {
          "and" : {
            "filters" : [ {
              "term" : {
                "tags.value" : "google"
              }
            }, {
              "term" : {
                "tags.type" : "company"
              }
            } ]
          }
        },
        "path" : "tags"
      }
    }, {
      "not" : {
        "filter" : {
          "nested" : {
            "filter" : {
              "term" : {
                "tags.type" : "company"
              }
            },
            "path" : "tags"
          }
        }
      }
    } ]
  }
}

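It consists of an or filter with two nested clauses: the first finds the documents that have a tag with both tags.type:company and tags.value:google, while the second finds the documents that have no tag with tags.type:company.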

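This needs to be optimized, though, because and/or/not filters do not take advantage of the cache for filters that work with bitsets, such as the term filter. It is worth spending some time to find a way to get the same result with a bool filter. Have a look at this article to learn more.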
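
As a rough, untested sketch of what the bool filter rewrite might look like: it assumes the same Elasticsearch generation as the filters above, where a bool filter accepts must, should and must_not clauses, and where a bool filter with only a must_not clause matches every document that the clause does not exclude. The structure mirrors the or/and/not version one to one and is not taken from the linked article.

```json
{
  "bool" : {
    "should" : [ {
      "nested" : {
        "path" : "tags",
        "filter" : {
          "bool" : {
            "must" : [ {
              "term" : {
                "tags.value" : "google"
              }
            }, {
              "term" : {
                "tags.type" : "company"
              }
            } ]
          }
        }
      }
    }, {
      "bool" : {
        "must_not" : {
          "nested" : {
            "path" : "tags",
            "filter" : {
              "term" : {
                "tags.type" : "company"
              }
            }
          }
        }
      }
    } ]
  }
}
```

It should slot in wherever the or filter was used, for example as the filter of a filtered query. As in the version above, the term filter on tags.value only matches "google" if name_analyzer (whose definition is not shown in the question) lowercases its tokens.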