Question

I need to store a large PDF (120 MB) in Elasticsearch.

I am running the following script through Cygwin:

$ curl -XPUT 'localhost:9200/samplepdfs/' -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }
}'

{
  "acknowledged": true
}

$ coded=`cat sample.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`

$ json="{\"file\":\"${coded}\"}"

$ echo $json > json.file

$ curl -XPOST 'localhost:9200/samplepdfs/attachment/1' -d @json.file

and the server throws an out-of-memory exception:

at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.appendToCumulation(HttpChunkAggregator.java:208)

Please suggest a solution or a configuration change that resolves this.

Answer 1

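The error is easy to understand: you are doing something big on a small machine. Judging from the configuration, I guess your machine has 512 MB or 2 GB of RAM allocated.

2 GB of RAM is not enough for your document.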
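
As a rough check on why the request is so memory-hungry, Base64 encoding alone inflates the file by about a third. The sketch below assumes the PDF is exactly 120 MiB (120 * 1024 * 1024 bytes); `base64_size` is a hypothetical helper, not part of any tool.

```shell
# Base64 emits 4 output characters for every 3 input bytes (rounded up).
base64_size() {
  echo $(( ( ($1 + 2) / 3 ) * 4 ))
}

pdf_bytes=$((120 * 1024 * 1024))          # assumed size: exactly 120 MiB
encoded_bytes=$(base64_size "$pdf_bytes")
echo "encoded payload: $((encoded_bytes / 1024 / 1024)) MiB"
# prints "encoded payload: 160 MiB"
```

The `HttpChunkAggregator` frame in the stack trace is where the HTTP layer collects the whole request body in memory, so the node needs at least this much free heap for the request alone, and most likely a multiple of it once the document is parsed and indexed.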

So, what is the solution?

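Buy more memory and allocate 8 GB or more of RAM to Elasticsearch
Use more machines (in that case you must split the index into at least 5 shards)
Split the file into smaller parts, if you can (I suspect that is not possible for the PDF you are trying to index)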
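
The first two options could be sketched as follows, assuming an Elasticsearch 1.x node started from the same shell, where the startup scripts read the `ES_HEAP_SIZE` environment variable. The `8g` value and the 5-shard count are taken from the suggestions above; `index_settings` is a hypothetical helper introduced only for this example.

```shell
# Heap for the Elasticsearch JVM; must be exported before the node is started.
export ES_HEAP_SIZE=8g

# Hypothetical helper: print index settings JSON for a given shard count.
index_settings() {
  printf '{"settings":{"index":{"number_of_shards":%d,"number_of_replicas":0}}}' "$1"
}

# The shard count is fixed when an index is created, so the index has to be
# created (or deleted and recreated) with the new value.
# Uncomment to run against a live node:
# curl -XPUT 'localhost:9200/samplepdfs/' -d "$(index_settings 5)"
```

Restart the node after changing the heap size, since the value is only read at startup.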

References

http://elasticsearch-users.115913.n3.nabble.com/How-to-index-text-file-having-size-more-than-the-system-memory-td4028184.html

Hope this solves the problem. Thanks.

Storing large binary files as attachments in Elasticsearch 1.3.2
