Solr query: exact match and partial match search

Date: 2016-10-20 19:32:53

Tags: apache search solr solr4

I need to search documents with both exact and partial matches. For example, I have documents with a title such as "ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE". When I search for ABC-01, documents whose title contains the exact search term should get a high score, and documents whose title merely contains ABC-01 as part of a longer term should also be returned. The results should be sorted by score and by date in descending order. There is also another field called driver. The search should cover the driver field as well, but with a lower score than an exact or partial title match.

(Please note: an exact match should match only "ABC-01", not "ABC-010".) Any clues on this?

  • id:ABC-01
  • Title :ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE
  • joinedDate:2016-01-10

  • id:ABC-010
  • Title :ABC-001010 IS AVAILABLE
  • joinedDate:2016-01-12
  • Driver:ABCMAN


  • id:XYZ-05
  • Title :XYZ-05 CAB IS IS AVAILABLE,ABC-01-XE IS AVAILABLE
  • joinedDate:2015-01-12
  • Driver:ABCD MAN ABC-01

  • id:ABC-07
  • Title :ABC-07 IS AVAILABLE ABC-01-XE
  • joinedDate:2015-01-12
  • Driver:CD MAN ABC-05

For this example, if I search for ABC-01, I want the following results:

  • id:ABC-01
  • Title :ABC-01 IS AVAILABLE
  • joinedDate:2016-08-12
  • Driver:ABCMAN

  • id:XYZ-05
  • Title :XYZ-05 CAB IS IS AVAILABLE,ABC-07 IS AVAILABLE
  • joinedDate:2015-01-12
  • Driver:ABCD MAN ABC-01

  • id:ABC-07
  • Title :ABC-07 IS AVAILABLE ABC-01-XE
  • joinedDate:2015-01-11
  • Driver:CD MAN ABC-05

If the search term appears as an exact match in the title, that document should be scored highly. Otherwise, the search should match titles that contain ABC-01, abc-01-xe, or anything else containing abc-01. It should also search the driver field to find any driver related to the given term.

Results should be sorted by score as well as by date, so that exact matches with the most recent date are displayed first.

2 answers:

Answer 0 (score: 1)

Edited response: As Alexandre pointed out, you assign weights with edismax. For the sake of fun, if you add the sample data at the bottom to a test core and run the following search, it gives you the right order of cabs.

http://.us-west-2.compute.amazonaws.com:8983/solr/abc123/select?defType=edismax&indent=on&q=id:ABC-01*%20OR%20Title:*ABC-01*&qf=id^1.5%20Title^0.7&wt=json

In the regular query, you have a plain vanilla wild-card search with an OR:

id:ABC-01* 
OR
Title:*ABC-01*

Then you enable edismax and assign weights. I pumped up id to 1.5 and reduced Title to 0.7, as in:

id^1.5 Title^0.7
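As a rough sketch of how the request above could be assembled programmatically, the following Python builds the same query string with the standard library. The host `localhost` is an assumption standing in for the real server; the core name `abc123` and the parameter values are taken from the URL in this answer.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed base URL: replace the host (and core name) with your own.
SOLR_SELECT = "http://localhost:8983/solr/abc123/select"

def build_edismax_url(base=SOLR_SELECT):
    """Return a select URL for the wildcard query with edismax weights."""
    params = {
        "defType": "edismax",
        "q": "id:ABC-01* OR Title:*ABC-01*",
        "qf": "id^1.5 Title^0.7",
        "indent": "on",
        "wt": "json",
    }
    # urlencode escapes spaces, ':', '*' and '^' so the URL is safe to send.
    return base + "?" + urlencode(params)

url = build_edismax_url()
print(url)
```

Letting `urlencode` do the escaping avoids hand-written `%20` sequences, which are easy to drop by accident.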


The response is as follows:

{
  "responseHeader":{
    "status":0,
    "QTime":23,
    "params":{
      "q":"id:ABC-01* \nOR\nTitle:*ABC-01*",
      "defType":"edismax",
      "indent":"on",
      "qf":"id^1.5 Title^0.7",
      "wt":"json",
      "_":"1477029831405"}},
  "response":{"numFound":13,"start":0,"docs":[
      {
        "id":"ABC-01",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-10T00:00:00Z"],
        "_version_":1548778151323107328},
      {
        "id":"ABC-010",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-14T00:00:00Z"],
        "_version_":1548778151552745472},
      {
        "id":"ABC-01234",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-14T00:00:00Z"],
        "_version_":1548778803999801344},
      {
        "id":"ABC-02",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-11T00:00:00Z"],
        "_version_":1548778151538065408},
      {
        "id":"ABC-03",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-12T00:00:00Z"],
        "_version_":1548778151548551168},
      {
        "id":"ABC-04",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-13T00:00:00Z"],
        "_version_":1548778151549599744},
      {
        "id":"XYZ-04",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE"],
        "joinedDate":["2016-01-13T00:00:00Z"],
        "_version_":1548778151556939776},
      {
        "id":"ABC-07",
        "Title":["ABC-07 IS AVAILABLE ABC-01-XE"],
        "joinedDate":["2015-01-12T00:00:00Z"],
        "_version_":1548778495705874432},
      {
        "id":"BBC-02",
        "Title":["ABC-01 CAB IS BUSY RIGHT NOW. "],
        "joinedDate":["2016-01-11T00:00:00Z"],
        "_version_":1548778803994558464},
      {
        "id":"ABC-010101",
        "Title":["ABC-02 CAB IS BUSY RIGHT NOW. ABC01 CAB IS AVAILABLE"],
        "joinedDate":["2016-01-12T00:00:00Z"],
        "_version_":1548778803995607040}]
  }}

SAMPLE DATA to add:

 <add><doc>
<field name="id">ABC-01</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-10</field>
</doc>
<doc>
<field name="id">ABC-02</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-11</field>
</doc>
<doc>
<field name="id">ABC-03</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-12</field>
</doc>
<doc>
<field name="id">ABC-04</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-13</field>
</doc>
<doc>
<field name="id">ABC-010</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-14</field>
</doc>
<doc>
<field name="id">ABC-07</field>
<field name="Title">ABC-07 IS AVAILABLE ABC-01-XE</field>
<field name="joinedDate">2015-01-12</field>
</doc>

<doc>
<field name="id">XYZ-04</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-13</field>
</doc>
<doc>
<field name="id">DBC-01</field>
<field name="Title">DBC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-10</field>
</doc>
<doc>
<field name="id">BBC-02</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. </field>
<field name="joinedDate">2016-01-11</field>
</doc>
<doc>
<field name="id">ABC-010101</field>
<field name="Title">ABC-02 CAB IS BUSY RIGHT NOW. ABC01 CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-12</field>
</doc>
<doc>
<field name="id">ABC-01QWERTY</field>
<field name="Title">CAB IS BUSY RIGHT NOW. </field>
<field name="joinedDate">2016-01-13</field>
</doc>
<doc>
<field name="id">ABC-01234</field>
<field name="Title">ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE</field>
<field name="joinedDate">2016-01-14</field>
</doc>
<doc>
<field name="id">ABC-007</field>
<field name="Title">ABC-007 IS AVAILABLE ABC-01-XE</field>
<field name="joinedDate">2015-01-12</field>
</doc>
<doc>
<field name="id">XYZ-014</field>
<field name="Title"> ABCDE CAB IS AVAILABLE. ABC-01 CAB IS BUSY RIGHT NOW.</field>
<field name="joinedDate">2016-01-13</field>
</doc></add>
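If you would rather generate the update message than type it by hand, a minimal Python sketch could look like the one below. The helper name `build_add_xml` is hypothetical, and only two of the sample documents are included to keep it short.

```python
import xml.etree.ElementTree as ET

def build_add_xml(docs):
    """Build a Solr <add> update message from a list of field dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            # Each key/value pair becomes <field name="...">value</field>.
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")

sample = [
    {"id": "ABC-01",
     "Title": "ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE",
     "joinedDate": "2016-01-10"},
    {"id": "ABC-07",
     "Title": "ABC-07 IS AVAILABLE ABC-01-XE",
     "joinedDate": "2015-01-12"},
]
xml_payload = build_add_xml(sample)
print(xml_payload)
```

The resulting string has the same shape as the sample data above and could then be posted to the core's update handler.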

ORIGINAL RESPONSE: You are probably looking to have something along the lines of:

id:ABC-01* OR id:*ABC

The query in the URL would look like this:

http://<server>:8983/solr/<core>/select?indent=on&q=id:ABC-01*%20OR%20id:*ABC&wt=json

Answer 1 (score: 1)

There are several questions here.

You can use eDisMax to search across multiple fields and give different fields different weights for ranking.
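A small sketch of that idea, assuming the `Title` and `Driver` fields from the question and purely illustrative weights: `qf` is the list of fields eDisMax searches for terms that carry no explicit field prefix, so the bare term is passed as `q`. The helper name `edismax_params` is hypothetical.

```python
from urllib.parse import urlencode

def edismax_params(term, weights):
    """Build edismax parameters for an unfielded term over weighted fields."""
    # Turn {"Title": 2.0, "Driver": 0.5} into "Title^2.0 Driver^0.5".
    qf = " ".join(f"{field}^{boost}" for field, boost in weights.items())
    return {"defType": "edismax", "q": term, "qf": qf, "wt": "json"}

# Illustrative weights only: title matches count more than driver matches.
params = edismax_params("ABC-01", {"Title": 2.0, "Driver": 0.5})
print(urlencode(params))
```

The actual weights need tuning against real data. The values here only express "title above driver".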

You can sort by function queries that mix score and date, and experiment until you get the right mix.
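One possible starting point for that experimentation, sketched in Python: a plain `sort` that breaks score ties by newest date, plus an additive recency boost using the `recip(ms(...))` form shown in the Solr function query documentation. This assumes `joinedDate` is a single-valued date field (the response in the other answer shows it as multi-valued, which would likely need to change before sorting on it), and the `qf` weights are illustrative.

```python
from urllib.parse import urlencode

def ranked_params(term):
    """Build edismax parameters that rank by score, then by recency."""
    return {
        "defType": "edismax",
        "q": term,
        "qf": "Title^2 Driver^0.5",  # illustrative weights
        # Documents with equal scores are ordered newest first.
        "sort": "score desc,joinedDate desc",
        # Additive boost that decays with document age (constant from the Solr docs).
        "bf": "recip(ms(NOW/HOUR,joinedDate),3.16e-11,1,1)",
        "wt": "json",
    }

params = ranked_params("ABC-01")
print(urlencode(params))
```

Whether to rely on the tie-breaking `sort`, the `bf` boost, or both is exactly the kind of mix that has to be tuned by trial.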

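Matching ABC-01-xe against ABC-01 is a bit harder, because it is not clear what you mean. It will be some sort of index-time analyzer chain element, but which one depends on the specifics of your mapping. Does ABC-01-ANYTHING map to ABC-01, or does it have to be ABC-01-xe? What about ABC-01234? You need to get the business rules for this mapping first, and then make sure that, at the end of the index-time analyzer chain, you are getting what you wanted. You may also want two fields holding the same information but processed differently, with the less-processed one (e.g. ABC-01 exact) given the higher weight.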
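To pin down the business rules before touching the schema, it can help to write them as a tiny test harness. The sketch below is plain Python, not a Solr analyzer, and it encodes just one possible rule set as an assumption: a whole token equal to the term is "exact", and a longer token containing the term is "partial".

```python
import re

def classify(term, text):
    """Return 'exact', 'partial', or None for how term occurs in text."""
    term = term.lower()
    # Split on whitespace, commas and periods; hyphens stay inside tokens.
    tokens = [t for t in re.split(r"[\s,.]+", text.lower()) if t]
    if term in tokens:
        return "exact"      # e.g. the token ABC-01 itself
    if any(term in t for t in tokens):
        return "partial"    # e.g. ABC-01-XE or ABC-01234
    return None

for title in ["ABC-01 CAB IS BUSY RIGHT NOW. ABCDE CAB IS AVAILABLE",
              "ABC-07 IS AVAILABLE ABC-01-XE",
              "ABC-001010 IS AVAILABLE"]:
    print(title, "->", classify("ABC-01", title))
```

Under this rule set ABC-01234 counts as a partial match. If the business decides otherwise, the rule changes here first, and the analyzer chain is then built to reproduce it.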