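I am trying to batch index data from a text file using Cloudera Search batch indexing (MapReduceIndexerTool) on the Cloudera QuickStart VM. I believe the problem is in my schema or my morphline: the job completes, and the core shows up in the Solr dashboard as if indexing had worked, but it contains zero documents. I am fairly sure that the command itself and Cloudera Search are fine, because I previously batch indexed an example using its sample input file, schema and morphline file, and that worked: the documents were indexed and added to the core. The command I am running is: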
hadoop --config /etc/hadoop/conf.cloudera.yarn jar \
/usr/lib/solr/contrib/mr/search-mr-*-job.jar \
org.apache.solr.hadoop.MapReduceIndexerTool -D \
'mapred.child.java.opts=-Xmx500m' \
--log4j '/usr/share/doc/search-1.0.0+cdh5.4.0+0/examples/solr-nrt/log4j.properties' \
--morphline-file /usr/share/doc/search-1.0.0+cdh5.4.0+0/examples/solr-nrt/test-morphlines/readMultiLine.conf \
--output-dir hdfs://quickstart.cloudera:8020/user/outdir --verbose --go-live \
--zk-host 127.0.0.1:2181/solr --collection collection1 \
hdfs://quickstart.cloudera:8020/user/indir
My schema is:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="sentences" version="1.5">
  <fields>
    <field name="id" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="sentence" type="text_general" indexed="true" stored="false"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <dynamicField name="ignored_*" type="ignored"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="random" class="solr.RandomSortField" indexed="true" />
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
</schema>
For my morphline file, I am using one I found in the examples that only reads single lines:
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      {
        readLine {
          ignoreFirstLine : true
          commentPrefix : "#"
          charset : UTF-8
        }
      }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
My sample input is (DocID, tab, sentence):
1 For evening wear at the North Pole, girls could dress up in handsome Nordic sweaters and full iridescent taffeta skirts, or top one of the full striped skirts with a terrific short beige trench coat.
2 But working to change the communist-run system is illegal, and the party relentlessly punishes dissent.
3 Word of the latest document first came on Sept. 1, 1987, during a meeting between the pope and Jewish leaders in Castel Gandolfo, the pontiff's summer residence in the hills southeast of Rome.
4 Anita Moen-Guidon of Norway was third, 2:28.6 behind Lazutina, and Russia's Julia Chepalova fourth, 2:53.5 behind.
5 We have been beaten, we have shed blood, we have purchased the right to meet here today with our blood,'' said John Munuve, an assembly leader.
6 The folklore Nordic knits were handsome, in sweaters, or knee-length pants, and might have been topped by something like a super taffeta full coat.
7 Several politicians have charged that the high taxes Kenyans already pay go into the pockets of government officials or wasteful projects, and not into providing essential services and repairing crumbling infrastructure.
8 independence.
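
To be explicit about the result I am after, here is a minimal Python sketch of how I expect each input line to map onto the id and sentence fields of the schema. It is purely illustrative and not part of my Cloudera setup: parse_line is a hypothetical helper, and I am assuming the DocID and the sentence are separated by a tab (or other whitespace).

```python
# Illustrative only: shows the intended mapping from one input line to the
# schema fields "id" and "sentence". parse_line is a hypothetical helper,
# not something provided by Cloudera Search or Solr.

def parse_line(line):
    """Split 'DocID<whitespace>sentence' into a dict keyed by schema field names."""
    # Split on the first run of whitespace only, so the sentence stays intact.
    doc_id, sentence = line.strip().split(None, 1)
    return {"id": doc_id, "sentence": sentence}


# Two lines taken from the sample input above, with a tab after the DocID.
sample = [
    "2\tBut working to change the communist-run system is illegal, "
    "and the party relentlessly punishes dissent.",
    "8\tindependence.",
]

docs = [parse_line(line) for line in sample]
for doc in docs:
    print(doc["id"], "->", doc["sentence"])
```

So for every line of the input I would expect one document in the core with these two fields populated.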