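I set up Cassandra with the default configuration on a clean AWS instance and am inserting 10,000 columns into a single row, with 1 MB of data per column. I use this Ruby (version 1.9.3) script: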
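10000.times do
  # random column name, up to 8 base-36 characters
  key = rand(36**8).to_s(36)
  # roughly 1 MB: a string of about 1024 characters repeated 1024 times
  value = rand(36**1024).to_s(36) * 1024
  Cas_client.insert(TestColumnFamily, TestRow, { key => value })
end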
Every time I run this script, it crashes:
/usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/socket.rb:109:in `read': CassandraThrift::Cassandra::Client::TransportException
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/base_transport.rb:87:in `read_all'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:104:in `read_frame'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:69:in `read_into_buffer'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in `read_message_begin'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in `receive_message'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:251:in `recv_batch_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:243:in `batch_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:150:in `handled_proxy'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:60:in `batch_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/protocol.rb:7:in `_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/cassandra.rb:463:in `insert'
    from a.rb:6:in `block in <main>'
    from a.rb:3:in `times'
    from a.rb:3:in `<main>'
After the crash, Cassandra itself keeps running normally, so I ran another Ruby script to find out how many columns had been inserted:
p cas_client.count_columns(TestColumnFamily,TestRow)
This script crashes too, with the same error message, and the Cassandra process then stays at 100% CPU usage.
AWS m1.xlarge instance (15 GB memory, 800 GB hard disk, 4 CPU cores)
cassandra-1.1.2
ruby-1.9.3-p194
jdk-7u6-linux-x64
ruby-gems:
cassandra (0.15.0)
thrift (0.8.0)
thrift_client (0.8.1)
What is the problem?
Answer 0 (score: 2)
10,000 columns of 1 MB each is 10 GB of data.
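To make that arithmetic concrete, here is a small Ruby sketch. It assumes that "1 MB" means 1024 * 1024 bytes, which matches the value built in the question's script; the variable names are only illustrative.

```ruby
# Rough size of the row described in the question.
# Assumption: each column value is 1024 * 1024 bytes.
columns          = 10_000
bytes_per_column = 1024 * 1024

total_bytes = columns * bytes_per_column
total_gib   = total_bytes / (1024.0**3)

puts "#{total_bytes} bytes, about #{total_gib.round(2)} GiB"
```

The result is a little under 10 GiB of raw payload for one row, before any protocol or object overhead is counted.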
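Cassandra's RPC layer uses Thrift, which requires the entire return value of an RPC call to fit in memory. Trying to read all the columns at once would therefore mean loading a 10 GB Thrift object into memory, which is impractical, especially in Ruby.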
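One common way around this limit, not something the answer above spells out, is to walk the row in small slices so that each RPC response stays small. The sketch below shows only the paging logic. `count_columns_paged` and `fetch_slice` are hypothetical names: `fetch_slice` stands for any callable that returns up to `limit` column names in sorted order, beginning at `start` (inclusive). With the cassandra gem it would presumably wrap `get` with its `:start` and `:count` options, but that is an assumption to check against the gem's documentation. An in-memory array stands in for the row so the example runs without a cluster.

```ruby
# Count the columns of one wide row by paging through it, so that no single
# response has to hold the whole row.
# fetch_slice (hypothetical): callable taking (start, limit) and returning up to
# `limit` column names in sorted order, beginning at `start` inclusive.
def count_columns_paged(fetch_slice, page_size = 100)
  # With a page size of 1 every later page would contain only the repeated
  # start column, so the loop could not make progress.
  raise ArgumentError, 'page_size must be at least 2' if page_size < 2

  total = 0
  start = nil
  loop do
    names = fetch_slice.call(start.to_s, page_size)
    names = names.drop(1) if start # first name repeats the previous page's last
    break if names.empty?

    total += names.size
    start = names.last
  end
  total
end

# In-memory stand-in for a row with 250 sorted column names.
row = (1..250).map { |i| format('col%04d', i) }
fake_fetch = lambda do |start, limit|
  row.select { |name| name >= start }.first(limit)
end

puts count_columns_paged(fake_fetch, 100)
```

If a real slice call returns the column values along with the names, each page still carries `page_size` values, so with 1 MB columns the page size would need to be much smaller than 100.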