Analyzer for searching phrases in ElasticSearch

Time: 2015-05-26 05:35:33

Tags: java elasticsearch lucene analyzer query-analyzer

I am using ElasticSearch 1.5.2. I want to allow phrase searches in my search engine.

Suppose the text is

read with section 114 of the Indian Penal Code

With the default analyzer, I get no results for the search query

section 114 penal code

So I added an analyzer:

        XContentBuilder settingsBuilder = XContentFactory.jsonBuilder()
            .startObject()
                .startObject("analysis")
                    .startObject("filter")
                        .startObject("filter_shingle")
                            .field("type","shingle")
                            .field("max_shingle_size",2)
                            .field("min_shingle_size",2)
                            .field("output_unigrams",false)
                        .endObject()
                        .startObject("filter_stemmer")
                            .field("type","porter_stem")
                            .field("language","English")
                        .endObject()
                    .endObject()
                    .startObject("tokenizer")
                        .startObject("my_ngram_tokenizer")
                            .field("type","nGram")
                            .field("min_gram",1)
                            .field("max_gram",1)
                        .endObject()
                    .endObject()
                    .startObject("analyzer")
                        .startObject("ShingleAnalyzer")
                            .field("tokenizer","my_ngram_tokenizer")
                            .array("filter","snowball","standard","lowercase","filter_stemmer","filter_shingle")
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();

    client.admin().indices()
    .prepareCreate("temp_index").setSettings(settingsBuilder).get();

I am indexing the files (which are already in acceptable JSON format) like this:

String file1 = readFile("1.txt");
IndexResponse response1 = client.prepareIndex("new_index","docs").setSource(file1).execute().actionGet();

and querying with matchQuery like this:

MatchQueryBuilder mqb1 = QueryBuilders.matchPhraseQuery("text", str).analyzer("ShingleAnalyzer");
SearchResponse matchResponse1 = client.prepareSearch().setQuery(mqb1).execute().actionGet();

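But I still get no results. Can you tell me what I should do?

Edit: Actually, when I try to get any kind of result through this analyzer, I get no hits at all. Even with the query "section", which is present in every document I indexed, I get no results, whereas searching with the default analyzer does return some. So is this analyzer not working, or what?

Edit: Sample document: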

{
      "docName": "Adamji Umar Dalal vs The State Of Bombay",
      "text": "1.These two appeals by special leave are limited to the question of sentence only. In case No. 1783/P of 1950, which has given rise to Criminal Appeal No. 54 of 1951, the appellant Adamji Umar Dalal was tried along with five other persons on the following charges :- Firstly,that you at Bombay on or about the 29th day of December, 1949, in contravention of Government Notification No. 342/IV B, dated 27-1-46 issued under the Essential Supplies (Temporary Powers) Act, 1946, attempted to export by rail out of the State Of Bombay to Jalna, a place beyond the limits of Bombay State, 50 barrels of kerosene oil, without having any permit in that behalf, by misdescribing or causing the misdescription of the said barrels of oil as high speed diesel oil, and thereby committed an offence punishable under sections 7 and 8 of the Essential Supplies (Temporary Powers) Act. 2.Secondly, that you at Bombay, on or about the 29th day of December, 1949, attempted to export by rail 50 barrels of Kerosene oil by misdescribing or causing the misdescription of the same as high speed diesel oil, and abetted each other in the commission of the said offence and thereby committed an offence punishable under section 106 and 107 of the Indian Railway Act, read with section 114 of the Indian Penal Code. 3.In Cases Nos. 1784/P and 1785/P of 1950 the appellant was tired along with the same persons on similar charges in respect of two other lots of 50 and 15 barrels of Kerosene oil respectively. These two cases have given rise to Appeal No. 55 of 1951. 4.The circumstances under which these three cases arose are these. On the 29th December, 1949, three consignments of 50, 50 and 15 barrels had been booked from Wadi Bundar under the description of high speed diesel oil when in fact they contained kerosene oil and were to be despatched to Jalna. The police on getting information of this fact opened the railway wagons and took charge of the barrels kept in them. 
Accused 2,3 and 4 are members of a firm of commission agents. They had purchased the barrels of oil from Sunbeam Oil Company on behalf of three different principals. The first accused is a representative of one of these firms. Accused 5 and 6 are the godown keeper and the assistant godown keeper of the supplier company. All the barrels seized bore the mark Prakash Trades High Speed Diesel Oil, U. S. A. The third accused engaged two lorries to remove 100 barrels and they were loaded in the lorries and delivered to Sattar Latif, witness, who was the forwarding and carting agent at Wadi Bundar. He was instructed by the third accused for the booking of these barrels for Jalna in Hyderabad State, along with the third lot of 15 barrels. In the consignment note which concerned the 50 barrels purchased on behalf of the first accused his firm was shown as the consignor and the consignee was self. The consignment note was signed by Sattar Latif. In these documents the goods were described as high speed diesel oil. Similar consignment notes and risk notes were prepared in respect of the other two consignments. There was a ban on the export of kerosene oil to any place outside the State of Bombay. All the barrels had a white paint on them. It appeared to be new and below the paint on the barrels the words Kerosene oil was visible. On these facts the prosecution started three separate cases in respect of the three consignments of 50,50 and 15 barrels respectively on the charges set out above against all the six accused persons. All of them pleaded not guilty. 5.The fifth accused stated that accused 2 and 3 brought to him a delivery order asking him to delivery order asking him to deliver high speed diesel oil but that he delivered to them Kerosene oil at their request. The first accused admitted that he on behalf of his firm placed an order for 65 barrels of high speed diesel oil through the second accused but denied all knowledge about the alleged delivery of kerosene oil. 
The second accused said that he placed an order for diesel oil with Sunbeam Oil Company for 65 barrels and obtained a delivery order from the company and gave it to the third accused sent him to take delivery of the barrels from the godown of the company. He denied having told the fifth accused to deliver kerosene oil instead of diesel oil. The third accused admitted having taken delivery of the barrels on the instructions of the second accused and having sent them to Wadi Bundar in two lorries. He was surprised to learn that the barrels contained Kerosene oil. He denied that he ever asked the company to deliver kerosene oil for diesel oil. The fourth accused said that he personally took no part in the transaction and had committed no offence. The sixth accused stated that he had delivered the barrels as ordered by the fifth accused and had committed no offence. The learned Presidency Magistrate convicted accused 2,3 and 5 on the charges leveled against them and acquitted accused 1, 4 and 6 as he felt some doubt in regard to them. 6.The appellant (accused 3) in these two appeals was awarded the following sentences :- 1.In case No. 1783 P of 1950 he was sentenced to six months rigorous imprisonment and a fine of Rs. 15,000 under section 7 and 8 of the Essential Supplies (Temporary Powers) Act. For default in the payment of fine he was to undergo six months rigorous imprisonment. A fine of Rs. 1000 was awarded to him under section 106 of the Indian Railways Act and in default he was to undergo one month's imprisonment. 2.In Case No. 1784-P of 1950, under section 7 and 8 of the Essential Supplies (Temporary Powers) Act he was awarded rigorous imprisonment for six months and a fine of Rs. 15,000 and in default six months rigorous imprisonment. Under the Railways Act he was fined in the sum of Rs. 1000 and in default he was ordered to undergo one month's imprisonment. 3.In Case No. 
1785-P of 1950, under section 7 and 8 of the Essential Supplies (Temporary Powers) Act he was awarded a sentence of one days imprisonment and a fine of Rs. 10,000 and in default rigorous imprisonment for six months. Under the Railways Act he was fined in the sum of Rs. 300 and in default he was ordered to undergo one month's imprisonment. In the result in respect of these 115 barrels of oil a cumulative fine of Rs. 42,300 was imposed on the appellant besides the sentences of imprisonment. The learned Presidency Magistrate while imposing the sentence observed as follows :- Suchblack market transactions when detected must be crushed, else the common man has no escape from the plague. 7.On appeal the convictions and sentences were maintained except that the fine imposed on the fifth accused was remitted. The High Court held that having regard to the manner in which the offence was committed and the purpose for which kerosene was attempted to be sent outside the State of Bombay which obviously was to sell it in the black market the sentences passed could not be regarded as excessive. 8.The determination of the right measure of punishment is often a point of great difficulty and no hard and fast rule can be laid down, it being a matter of discretion which is to be guided by a variety of considerations, but the courts has always to bear in mind the necessity of proportion between an offence and the penalty. In imposing a fine it is necessary to have as much regard to the pecuniary circumstances of the accused persons to the character and magnitude of the offence, and where a substantial term of imprisonment is inflicted, an excessive fine should not accompany it except in exceptional cases. It seems to us that due regard has not been paid to these consideration in these cases and the zeal to crush the evil of black marketing and free the common man from this plague has perturbed the judicial mind in the determination of the measure of punishment. 
9.The appellant was acting in these transactions on behalf of the first accused and other principals in the capacity of a member of a commission agency firm. It was asserted before us that its commission in this deal was half per cent on sale price. There is no evidence on the record about the accused's pecuniary condition. His learned counsel emphatically asserted at the Bar that it was impossible for him to pay even a fraction of this heavy fine. The profit made on the sale of oil in the black market would in the ordinary course of business dealings go to the principals but its extent is not known nor found on the record. The first accused who was to profit by getting kerosene oil by this device has been acquitted and is not before us. The other persons on whose behalf the oil was purchased were not brought to trial. In these circumstances there is no material on the record justifying the imposition of such heavy fines on the appellant and these seem to us to be quite disproportionate to the offences. 10.It is no doubt true that the offence of black marketing is very generally prevalent in this country at the present moment and when it is brought home against a person, no leniency in the matter of sentence should be shown and a certain amount of severity may be very appropriate and even called for. In our opinion, however, when quite a substantial sentence of imprisonment was awarded to the appellant, a person belonging to the commission agency class, imposition of unduly heavy fines which may have been justified to some extent in the case of the principals, was not called for in his case. It is not the practice of this court to interfere by special leave in the matter of punishment imposed for crimes committed, except in exceptional cases where the sentences are unduly harsh and do not really advance the ends of justice. 
12.For the reasons given above we think that it would meet the ends of justice if the fines imposed on the appellant by the Magistrate and upheld by the High Court are reduced in all cases as below :- 13.In case No. 1783-p of 1950, the sentence of fine is reduced to Rs. 1000 from Rs. 15000 and in default he will undergo imprisonment for a period of one month. 14.In case No. 1784-P of 1950, also the fine is reduced to Rs. 1000 from Rs. 15000 and in default he will undergo imprisonment for one month. 15.Similarly, in Case No. 1785-P of 1950, the sentence of fine is reduced to Rs. 1000 and in default he will undergo imprisonment for a month. 16.The fines in all the cases under the Indian Railways Act are reduced to one cumulative fine of Rs. 1000 instead of a fine of Rs. 2300 and in default he will undergo imprisonment for a month. In all other respects the appeals fail and are dismissed. 17.Sentences reduced."
    }

1 answer:

Answer 0: (score: 4)

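Something to start with is shown below. Note that how you search is just as important as how you index. The first thing to establish is what your users will provide as input text (free-form input, a single word, whether they can specify which terms must be present and which are optional, and so on).

After that you need to decide on the matching rules: exact match, phrase match, fuzzy match, whether you care about scoring or only about matching, and so on. You said you want "a scoring mechanism which ranks results with exact match to be at the highest rank, then the non exact based matches according to their scores (say tf-idf)".

This is where I would start: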

{
  "settings": {
    "analysis": {
      "filter": {
        "filter_shingle": {
          "type": "shingle",
          "max_shingle_size": 8,
          "min_shingle_size": 2,
          "output_unigrams": false
        },
        "filter_stemmer": {
          "type": "porter_stem",
          "language": "english"
        }
      },
      "analyzer": {
        "ShingleAnalyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "snowball",
            "filter_stemmer",
            "filter_shingle"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "ShingleAnalyzer",
          "fields": {
            "raw_standard_analyzer": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}
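
One difference from the analyzer in the question is the tokenizer: `standard` here, versus an nGram tokenizer with `min_gram` and `max_gram` both set to 1. The plain-Java sketch below is only an illustration of that difference. It does not call Lucene or Elasticsearch, the class and method names are made up, whitespace splitting is just a rough stand-in for the standard tokenizer, and it assumes the nGram tokenizer keeps every character (no `token_chars` restriction is configured in the question).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizerSketch {

    // Approximates an nGram tokenizer with min_gram = max_gram = 1:
    // every single character (spaces included) becomes its own token.
    public static List<String> charUnigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(String.valueOf(text.charAt(i)));
        }
        return tokens;
    }

    // Rough stand-in for the standard tokenizer followed by the lowercase
    // filter: split on whitespace and lowercase each word.
    public static List<String> words(String text) {
        List<String> tokens = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                tokens.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "section 114 penal code";
        System.out.println(charUnigrams(query)); // one token per character
        System.out.println(words(query));        // one token per word
    }
}
```

With single-character tokens, a shingle filter can only glue neighbouring characters together, so it never sees whole words. With word tokens, the shingles become word pairs and longer word sequences, which is what phrase-like matching needs.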

And a query, which can have more should clauses depending on your rules for matching the text:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "text": "section 114 penal code"
          }
        },
        {
          "match": {
            "text.raw_standard_analyzer": "section 114 penal code"
          }
        }
      ]
    }
  }
}

Something similar in Java:

SearchResponse response = client.prepareSearch()
        .setQuery(QueryBuilders.boolQuery()
            .should(QueryBuilders.matchQuery("text", "section 114 penal code"))
            .should(QueryBuilders.matchQuery("text.raw_standard_analyzer", "section 114 penal code")))
        .execute().actionGet();

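The key points are:

  • You want more exact matches to score higher: use shingles in one field and then run a match on that field
  • You also want to match the regular terms wherever they appear: use the standard analyzer in a second field and add another match with should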
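
To make the first point concrete, here is a minimal plain-Java sketch of what word shingles look like. It is not Lucene's shingle filter and skips stemming. The class and method names are hypothetical, and the parameters only mirror `min_shingle_size`, `max_shingle_size` and `output_unigrams: false` from the settings above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {

    // Builds shingles of minSize..maxSize consecutive words and emits no
    // single words, similar in spirit to output_unigrams = false.
    public static List<String> shingles(List<String> words, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= words.size(); size++) {
                out.add(String.join(" ", words.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("section", "114", "penal", "code");
        List<String> doc = Arrays.asList(
                "read", "with", "section", "114", "of", "the", "indian", "penal", "code");

        // Shingles shared by the query and the document are the terms that
        // a match on the shingled field could hit.
        List<String> shared = new ArrayList<>(shingles(query, 2, 8));
        shared.retainAll(shingles(doc, 2, 8));
        System.out.println(shared);
    }
}
```

In this sketch the query and the sample sentence share the shingles "section 114" and "penal code", so a match on the shingled field has something to hit, while the second field still catches the individual words.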

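Then test and see what you get. If you are not satisfied and you spot documents that you want scored higher, look at those documents, work out why it is not working, formulate a rule, look for a feature in ES that helps you implement the new rule, define a new field, and add another should clause for that field.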