MapReduceIndexerTool not indexing documents properly

Time: 2015-07-14 19:02:13

Tags: hadoop solr solrcloud cloudera-cdh morphline

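I am trying to use Cloudera Search batch indexing on the Cloudera QuickStart VM to batch index data stored in text files. I believe something is wrong with my schema and morphline: the job completes and appears to work, but when I open the Solr dashboard after indexing, there are no documents. The core shows up, but it contains zero documents. I am confident that the command I am running and Cloudera Search itself both work, because I was previously able to batch index with the example input files, schema, and morphline file, and that run indexed the documents and added them to the core. The command I use is: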

hadoop --config /etc/hadoop/conf.cloudera.yarn jar  \
/usr/lib/solr/contrib/mr/search-mr-*-job.jar \
org.apache.solr.hadoop.MapReduceIndexerTool -D \
'mapred.child.java.opts=-Xmx500m'  \
--log4j '/usr/share/doc/search-1.0.0+cdh5.4.0+0/examples/solr-nrt/log4j.properties' \
--morphline-file /usr/share/doc/search-1.0.0+cdh5.4.0+0/examples/solr-nrt/test-morphlines/readMultiLine.conf \
--output-dir hdfs://quickstart.cloudera:8020/user/outdir --verbose --go-live \
--zk-host 127.0.0.1:2181/solr --collection collection1 \
hdfs://quickstart.cloudera:8020/user/indir

My schema is:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="sentences" version="1.5">         
 <fields>                   
   <field name="id" type="text_general" indexed="true" stored="true" required="true" multiValued="false" /> 
   <field name="sentence" type="text_general" indexed="true" stored="false"/>
   <field name="_version_" type="long" indexed="true" stored="true"/>
   <dynamicField name="ignored_*" type="ignored"/>       
 </fields>    

 <uniqueKey>id</uniqueKey>

 <types>        
      <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
      <fieldType name="random" class="solr.RandomSortField" indexed="true" />
      <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
      <fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>    
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />

        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>    
 </types>   
</schema>

For my morphline file, I am using one I found in the examples that simply reads single lines:

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [                    
      { 
        readLine {
          ignoreFirstLine : true
          commentPrefix : "#"
          charset : UTF-8
        }
      } 
      { logDebug { format : "output record: {}", args : ["@{}"] } }    
    ]
  }
]

My sample input is (DocID, tab, sentence):

1   For evening wear at the North Pole, girls could dress up in handsome Nordic sweaters and full iridescent taffeta skirts, or top one of the full striped skirts with a terrific short beige trench coat.    
2   But working to change the communist-run system is illegal, and the party relentlessly punishes dissent.    
3   Word of the latest document first came on Sept. 1, 1987, during a meeting between the pope and Jewish leaders in Castel Gandolfo, the pontiff's summer residence in the hills southeast of Rome.    
4   Anita Moen-Guidon of Norway was third, 2:28.6 behind Lazutina, and Russia's Julia Chepalova fourth, 2:53.5 behind.    
5   We have been beaten, we have shed blood, we have purchased the right to meet here today with our blood,'' said John Munuve, an assembly leader.
6   The folklore Nordic knits were handsome, in sweaters, or knee-length pants, and might have been topped by something like a super taffeta full coat.   
7   Several politicians have charged that the high taxes Kenyans already pay go into the pockets of government officials or wasteful projects, and not into providing essential services and repairing crumbling infrastructure.   
8   independence.

1 answer:

Answer 0 (score: 0)

In your schema.xml, you have id as a required field. However, readLine only reads each line into the "message" field, so the records it produces never contain an id.

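So you need to add an id to each document. You can use something like setValues, or you can change readLine to readCSV with a tab separator and a list of column names, one of which should be id.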
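The readCSV variant could look roughly like the sketch below. It is only an illustration under several assumptions: the input file contains real tab characters between the DocID and the sentence, the second column is named sentence to match the schema in the question, and the SOLR_LOCATOR values are copied from the --collection and --zk-host arguments of the command. The sanitizeUnknownSolrFields and loadSolr steps are not in the morphline from the question; they are included on the assumption that the morphline has to hand its records to Solr at the end. ignoreFirstLine is set to false because the sample input has no header line.

```
# Sketch only: values below are assumptions taken from the question's command line.
SOLR_LOCATOR : {
  collection : collection1
  zkHost : "127.0.0.1:2181/solr"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      {
        # Split each tab-separated line into the two fields defined in schema.xml.
        readCSV {
          separator : "\t"
          columns : [id, sentence]
          ignoreFirstLine : false   # the sample input has no header row
          trim : true
          charset : UTF-8
        }
      }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      # Assumption: drop record fields that the Solr schema does not define.
      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }
      # Assumption: pass the finished record on to Solr.
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]
```

With this in place each record should carry both id and sentence rather than a single message field. If the input lines had no usable key, generating one (for example with the generateUUID command) would be another way to satisfy the required id field.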