Question

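I built a simple API with a single endpoint. It scrapes files and currently has around 30,000 records. Ideally, I would like to be able to fetch all of those records as JSON with a single HTTP call.

Here is my Sinatra view code: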

require 'sinatra'
require 'json'
require 'mongoid'

Mongoid.identity_map_enabled = false

get '/' do
  content_type :json
  Book.all
end

I have tried the following, using multi_json:

require './require.rb'
require 'sinatra'
require 'multi_json'
MultiJson.engine = :yajl

Mongoid.identity_map_enabled = false

get '/' do
  content_type :json
  MultiJson.encode(Book.all)
end

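The problem with this approach is that I get error R14 (Memory quota exceeded). I get the same result when I try the 'oj' gem.

I would just concatenate everything into one long Redis string, but Heroku's Redis service costs $30 a month for the instance size I need (> 10 MB).

My current solution is a background task that creates objects and stuffs them with jsonified records up to roughly the Mongoid object size limit (16 MB). The problems with this approach: rendering still takes nearly 30 seconds, and I have to run post-processing on the receiving app to properly extract the JSON from the objects.

Does anyone have a better idea for rendering JSON for 30k records in a single call without moving away from Heroku?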

Answer 1

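It sounds like you want to stream the JSON directly to the client instead of building it all up in memory. That is probably the best way to cut down on memory usage. You could, for example, use yajl to encode JSON directly to a stream.

Edit: I rewrote the entire code for yajl, because its API is more compelling and allows for cleaner code. I also included an example for reading the response in chunks. Here is the streamed JSON array helper I wrote: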

require 'yajl'

module JsonArray
  class StreamWriter
    # out is any object that accepts << (here, Sinatra's stream object)
    def initialize(out)
      @out = out
      @encoder = Yajl::Encoder.new
      @first = true
    end

    def <<(object)
      @out << ',' unless @first
      @out << @encoder.encode(object)
      @out << "\n"
      @first = false
    end
  end

  def self.write_stream(app, &block)
    app.stream do |out|
      out << '['
      block.call StreamWriter.new(out)
      out << ']'
    end
  end
end

Usage:

require 'sinatra'
require 'mongoid'

Mongoid.identity_map_enabled = false

# use a server that supports streaming
set :server, :thin

get '/' do
  content_type :json
  JsonArray.write_stream(self) do |json|
    Book.all.each do |book|
      json << book.attributes
    end
  end
end
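
The comma-and-newline framing that StreamWriter produces can be checked without starting a server. The following is only a sketch under stated assumptions: `ArrayStreamWriter` and `write_json_array` are hypothetical stand-ins for the helper above, the standard library's `JSON.generate` replaces the Yajl encoder, and a `StringIO` plays the role of Sinatra's stream object.

```ruby
require 'json'
require 'stringio'

# Hypothetical stand-in for JsonArray::StreamWriter; uses stdlib JSON
# instead of Yajl so it runs without extra gems.
class ArrayStreamWriter
  def initialize(out)
    @out = out
    @first = true
  end

  # Writes a separating comma before every element except the first,
  # then the encoded element followed by a newline.
  def <<(object)
    @out << ',' unless @first
    @out << JSON.generate(object) << "\n"
    @first = false
    self
  end
end

# Wraps whatever the block writes in '[' and ']'.
def write_json_array(out)
  out << '['
  yield ArrayStreamWriter.new(out)
  out << ']'
  out
end

buffer = StringIO.new
write_json_array(buffer) do |json|
  json << { 'title' => 'Book 1' }
  json << { 'title' => 'Book 2' }
end

# The concatenated output is a single valid JSON array.
books = JSON.parse(buffer.string)
```

Because each element is written as soon as it is encoded, only one record's JSON needs to exist in memory at a time on the writing side.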

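To decode on the client side, you can read and parse the response in chunks, for example with em-http. Note that this solution requires the client's memory to be large enough to hold the entire array of objects. Here is the corresponding streamed parser helper: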

require 'yajl'

module JsonArray
  class StreamParser
    def initialize(&callback)
      @parser = Yajl::Parser.new
      @parser.on_parse_complete = callback
    end

    def <<(str)
      @parser << str
    end
  end

  def self.parse_stream(&callback)
    StreamParser.new(&callback)
  end
end

Usage:

require 'em-http'

parser = JsonArray.parse_stream do |object|
  # block is called when we are done parsing the
  # entire array; now we can handle the data
  p object
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end

Alternative solution

You can actually simplify the whole thing if you give up the need to generate a "proper" JSON array. The solution above generates JSON in this form:

[{ ... book_1 ... }
,{ ... book_2 ... }
,{ ... book_3 ... }
...
,{ ... book_n ... }
]

However, we could stream each book as a separate JSON document, which reduces the format to the following:

{ ... book_1 ... }
{ ... book_2 ... }
{ ... book_3 ... }
...
{ ... book_n ... }

The code on the server then becomes much simpler:

require 'sinatra'
require 'mongoid'
require 'yajl'

Mongoid.identity_map_enabled = false
set :server, :thin

get '/' do
  content_type :json
  encoder = Yajl::Encoder.new
  stream do |out|
    Book.all.each do |book|
      out << encoder.encode(book.attributes) << "\n"
    end
  end
end

As does the client:

require 'em-http'
require 'yajl'

parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new do |book|
  # this will now be called separately for every book
  p book
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end

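The great thing is that the client no longer has to wait for the entire response; it parses each book separately as it arrives. However, this will not work if one of your clients expects a single big JSON array.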
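
For a client that does not use yajl, the one-object-per-line format can also be consumed with only the standard library. This is a rough sketch, not the method used above: `LineParser` is a hypothetical helper that buffers incoming chunks, splits on newlines, and parses each complete line. It assumes every record sits on its own line, which holds here because JSON encoders escape newlines inside strings.

```ruby
require 'json'

# Hypothetical helper: accumulates chunks and invokes the callback once
# per complete line, even when a record is split across two chunks.
class LineParser
  def initialize(&callback)
    @buffer = +''
    @callback = callback
  end

  def <<(chunk)
    @buffer << chunk
    # Consume every complete line currently in the buffer.
    while (idx = @buffer.index("\n"))
      line = @buffer.slice!(0..idx).strip
      @callback.call(JSON.parse(line)) unless line.empty?
    end
    self
  end
end

received = []
parser = LineParser.new { |book| received << book }

# Simulate two network chunks that split the second record in the middle.
parser << %({"title":"Book 1"}\n{"tit)
parser << %(le":"Book 2"}\n)
```

In the em-http client above, `parser << chunk` inside `http.stream` would feed such a helper the same way it feeds the Yajl parser.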

Efficient way to render a ton of JSON on Heroku
