Elasticsearch CouchDB river: Bogus chunk size

Date: 2013-08-20 20:18:43

Tags: java couchdb elasticsearch

Sorry, this is related to another post of mine. Once my CouchDB holds a large number of documents, ES starts throwing errors in the log and stops indexing new documents:

[2013-08-19 17:55:08,379][WARN ][river.couchdb            ] [Morning Star] [couchdb][portal_production] failed to read from _changes, throttling....
java.io.IOException: Bogus chunk size
at sun.net.www.http.ChunkedInputStream.processRaw(ChunkedInputStream.java:319)
at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:572)
at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3052)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at org.elasticsearch.river.couchdb.CouchdbRiver$Slurper.run(CouchdbRiver.java:477)
at java.lang.Thread.run(Thread.java:724)
[2013-08-19 17:55:13,392][WARN ][river.couchdb            ] [Morning Star] [couchdb][portal_production] failed to read from _changes, throttling....

What's the deal?

Edit - river status

$ curl http://localhost:9200/_river/portal_production/_status?pretty=true 
{
  "_index" : "_river",
  "_type" : "portal_production",
  "_id" : "_status",
  "_version" : 2,
  "exists" : true, "_source" : {"ok":true,"node":{"id":"EVxlLNZ9SrSXYOLS0YBw7w","name":"Shadow Slasher","transport_address":"inet[/192.168.1.106:9300]"}}
}

Edit - river sequence data

The sequence number seems awfully low!

curl -X GET http://localhost:9200/_river/portal_production/_seq?pretty=true
{
  "_index" : "_river",
  "_type" : "portal_production",
  "_id" : "_seq",
  "_version" : 1,
  "exists" : true, "_source" : {"couchdb":{"last_seq":"4"}}
}

By the way, my _changes feed goes much further than that:

curl -X GET http://localhost:5984/portal_production/_changes?limit=5
    {"results":[
    {"seq":4,"id":"Ifilter-1","changes":[{"rev":"4-d9c8e771bc345d1182fbe7c2d63f5d00"}]},
    {"seq":7,"id":"Document-2","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
    {"seq":10,"id":"Document-4","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
    {"seq":13,"id":"Document-6","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
    {"seq":16,"id":"Document-8","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
    ...
    {"seq":208657,"id":"Document-11295","changes":[{"rev":"8-37cb48660d28bef854b2c31132bc9635"}]},
    {"seq":208661,"id":"Document-11297","changes":[{"rev":"6-daf5c5d557d0fa30b2b08be26582a33c"}]},
    {"seq":208665,"id":"Document-11299","changes":[{"rev":"6-22e57345c2ee5c7aee8b7d664606b874"}]},
    {"seq":208669,"id":"Document-11301","changes":[{"rev":"6-06deee0c3c6705238a8b07e400b2414b"}]},
    {"seq":208673,"id":"Document-11303","changes":[{"rev":"6-86fc60dd8c1d415d42a25a23eb975121"}]},
    {"seq":208677,"id":"Document-11305","changes":[{"rev":"6-6d51a577fdc9013abf64ec4ffbf9eeee"}]},
    {"seq":208683,"id":"Document-11307","changes":[{"rev":"6-726a7835ce390094b9b9e0a91aeb11f0"}]},
    {"seq":208684,"id":"Document-11286","changes":[{"rev":"9-747e63e0304a974cc7db7ff84ae80697"}]}
    ],
    "last_seq":208684}
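
To put a number on how far behind the river is, the two responses above can be compared directly. This is only a minimal sketch: the function name river_lag is made up for illustration, and it assumes the response bodies have the same shape as the ones pasted in this question.

```python
import json


def river_lag(river_seq_body, changes_body):
    """Return how many sequence numbers the river is behind CouchDB.

    river_seq_body: JSON text of GET /_river/<name>/_seq from Elasticsearch
    changes_body:   JSON text of GET /<db>/_changes from CouchDB
    (both shapes are assumed to match the output shown in the question)
    """
    river_seq = int(json.loads(river_seq_body)["_source"]["couchdb"]["last_seq"])
    couch_seq = int(json.loads(changes_body)["last_seq"])
    return couch_seq - river_seq
```

With the values shown above (4 on the river side, 208684 on the CouchDB side) the river has effectively never advanced past the first change.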

Edit - CouchDB logs

This looks bad:

[Thu, 22 Aug 2013 02:49:37 GMT] [info] [<0.340.0>] 127.0.0.1 - - 'GET' /portal_production/_changes?feed=continuous&include_docs=true&heartbeat=10000&since=4 500

[Thu, 22 Aug 2013 02:49:42 GMT] [info] [<0.348.0>] 127.0.0.1 - - 'GET' /portal_production/_changes?feed=continuous&include_docs=true&heartbeat=10000&since=4 200

[Thu, 22 Aug 2013 02:49:42 GMT] [error] [<0.348.0>] Uncaught error in HTTP request: {exit,{ucs,{bad_utf8_character_code}}}

[Thu, 22 Aug 2013 02:49:42 GMT] [info] [<0.348.0>] Stacktrace: [{xmerl_ucs,from_utf8,1},
         {mochijson2,json_encode_string,2},
         {mochijson2,'-json_encode_proplist/2-fun-0-',3},
         {lists,foldl,3},
         {mochijson2,json_encode_proplist,2},
         {mochijson2,'-json_encode_proplist/2-fun-0-',3},
         {lists,foldl,3},
         {mochijson2,json_encode_proplist,2}]
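
The stack trace suggests CouchDB fails while JSON-encoding a document body, which would explain why _changes works without include_docs but breaks with it. One way to narrow down the offending documents might be to page through _changes without include_docs and then fetch each document on its own. The following is a rough sketch, not a tested tool: the helper names (http_get_json, find_unreadable_docs) and the batch size are my own assumptions, and only the database URL comes from the question.

```python
import http.client
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:5984/portal_production"  # database URL from the question


def http_get_json(url):
    """Fetch a URL and parse the body as JSON; raises on HTTP or parse errors."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_unreadable_docs(get_json=http_get_json, base=BASE, since=0, batch=100):
    """Return ids of documents that cannot be fetched or parsed individually.

    get_json is injectable so the walk can be tried against a stub first.
    """
    bad_ids = []
    while True:
        # No include_docs here: this form of the feed worked in the question.
        page = get_json(f"{base}/_changes?since={since}&limit={batch}")
        results = page["results"]
        if not results:
            return bad_ids
        for change in results:
            if change.get("deleted"):
                continue  # deleted documents have no body to fetch
            doc_id = urllib.parse.quote(change["id"], safe="")
            try:
                get_json(f"{base}/{doc_id}")
            except (OSError, ValueError, http.client.HTTPException):
                # HTTP 500, truncated response, or undecodable/invalid JSON
                bad_ids.append(change["id"])
        since = results[-1]["seq"]  # continue after the last change seen
```

Calling find_unreadable_docs() with no arguments would hit the local CouchDB; the returned ids are candidates to inspect, fix, or delete before restarting the river.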

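2 Answers:

Answer 0: (score: 0)

So I deleted documents one by one and retried the _changes query with include_docs=true each time, but I never got to the bottom of it. From reading some other related questions, text imported from Microsoft documents can contain some funky characters. We are doing something similar, so we dumped the database and filtered out the non-UTF-8 characters. It was a bit painful, but we had too many documents to track down the one causing the problem. So far there are no errors on the Elasticsearch side (well, a few timeouts, but that is probably a topic for another thread).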
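
For anyone wanting to do the same kind of filtering, here is a minimal sketch of the idea in Python. It is not the script we actually ran; the function names clean_bytes and clean_value are made up for illustration, and it assumes that simply dropping the invalid bytes or characters is acceptable for your data.

```python
def clean_bytes(raw):
    """Decode bytes as UTF-8, dropping any byte sequences that are not valid."""
    return raw.decode("utf-8", errors="ignore")


def clean_value(value):
    """Recursively drop characters that cannot be encoded as UTF-8.

    In a Python str these are lone surrogate code points; everything else
    passes through unchanged. Lists and dicts are walked recursively.
    """
    if isinstance(value, str):
        return value.encode("utf-8", errors="ignore").decode("utf-8")
    if isinstance(value, list):
        return [clean_value(v) for v in value]
    if isinstance(value, dict):
        return {clean_value(k): clean_value(v) for k, v in value.items()}
    return value
```

clean_bytes would apply to raw text extracted from the source files before it is turned into a document, and clean_value to an already-parsed document just before it is written back to CouchDB. Dropping characters is lossy, so replacing them (errors="replace") may be preferable if you need to see where the damage was.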

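Answer 1: (score: 0)

Are you indexing Office documents? If so, you could use the attachment plugin.

I have a branch, not yet merged, that indexes CouchDB attachments. If you want to test it, I'd be glad to get feedback!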
