Cassandra cluster across multi-region VPCs in AWS

Date: 2015-03-04 22:04:05

Tags: amazon-web-services cassandra cassandra-2.0 amazon-vpc

I'm trying to implement the following architecture for my Cassandra cluster:

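  • 1 VPC in each of 4 different AWS regions, linked together with IPSec instances.
  • 1 Cassandra cluster made up of 4 nodes, 1 in each VPC.
  • Nodes communicating with each other over private IPs inside the VPCs (10.0.0.0/8).
  • Cassandra data accessible from public IPs through my own REST API.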

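So far I've been able to configure the cluster, install OpsCenter, and check that each agent is working. (For reference, I used GossipPropertyFileSnitch and put "dc=us-west, rack=1b" in the rack configuration.)

My problem is that my HTTP API is slow and times out far too often. I've been trying to run some import scripts (HTTP requests that insert into Cassandra through the CQL driver) and keep hitting this type of error: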

  

Error executing batch: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.

For reference, the corresponding error in system.log is:

ERROR [SharedPool-Worker-1] 2015-03-04 19:25:39,598 ErrorMessage.java:243 - Unexpected exception during request
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na]
at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:56) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.Auth.getPermissions(Auth.java:78) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.ClientState.authorize(ClientState.java:352) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:250) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:244) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:228) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:128) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.cql3.statements.BatchStatement.checkAccess(BatchStatement.java:86) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.cql3.QueryProcessor.processBatch(QueryProcessor.java:500) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.transport.messages.BatchMessage.execute(BatchMessage.java:215) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) [apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [apache-cassandra-2.1.3.jar:2.1.3]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [na:1.8.0_31]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.3.jar:2.1.3]
at java.lang.Thread.run(Unknown Source) [na:1.8.0_31]
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.auth.Auth.selectUser(Auth.java:279) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:100) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:67) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.PermissionsCache$1.load(PermissionsCache.java:82) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.PermissionsCache$1.load(PermissionsCache.java:79) ~[apache-cassandra-2.1.3.jar:2.1.3]
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3524) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2317) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2280) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2195) ~[guava-16.0.jar:na]
... 23 common frames omitted
Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:103) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:139) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1338) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1265) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1188) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:253) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:206) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.Auth.selectUser(Auth.java:268) ~[apache-cassandra-2.1.3.jar:2.1.3]
... 32 common frames omitted

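It does work, and I can even connect with DevCenter and actually see my data. But it fails far too often.

My temporary workaround was to enable communication over each instance's public IP while still having the nodes work together over their private IPs. I'm running my import now.

Now I'm still wondering: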

  • Is my solution viable (Cassandra over VPC + IPSec)?
  • If not, would node-to-node SSL be viable?
  • Where does this timeout come from?

Thanks for your help.

1 Answer:

Answer 0: (score: 4)

Personally, I don't think this solution is viable, for several reasons.

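  1. Latency between regions is high. Keep in mind that all the data you store in the cluster has to be replicated over the internet, with VPN or SSL encryption/decryption depending on the approach you choose. I assume you chose Cassandra because you plan to have a lot of data.
  2. You'll pay for it, because the gossip protocol is very chatty and all of your data will travel back and forth between the endpoints several times. You'll pay $0.02 for every GB sent from one node to another.
  3. Unless you increase all the relevant timeout values in cassandra.yaml, you'll keep getting timeouts; and once you do increase them, everything will be very slow.
  4. You can do node-to-node SSL; here is the detail.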
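
To put a rough number on point 2, here is a back-of-the-envelope sketch in Python. It uses only the $0.02 per GB rate quoted above. The function name `monthly_replication_cost`, the 50 GB/day workload, and the 3 remote replicas are hypothetical values picked for illustration. Gossip, repair, and read traffic are ignored, so treat the result as a lower bound rather than a bill.

```python
# Rough estimate of inter-region transfer cost for replication traffic.
# The $0.02/GB rate is the figure quoted in the answer; every other
# number here is an illustrative assumption, not a measurement.

PRICE_PER_GB_USD = 0.02


def monthly_replication_cost(gb_written_per_day, remote_replicas, days=30):
    """Cost of shipping each written GB to `remote_replicas` other regions.

    Ignores gossip, repair, and read traffic, so it is a lower bound.
    """
    gb_sent = gb_written_per_day * remote_replicas * days
    return gb_sent * PRICE_PER_GB_USD


if __name__ == "__main__":
    # Hypothetical workload: 50 GB of writes per day, 3 remote replicas.
    cost = monthly_replication_cost(50, 3)
    print(f"~${cost:.2f} per month")
```

The point is less the exact figure than the fact that the cost scales with both write volume and the number of regions each write has to reach.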

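    I'm not 100% sure of the cause of the timeout, but there is a strong indication that it comes from nodes not receiving responses from other nodes within the timeout value: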

      

    Operation timed out - received only 0 responses.
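
A small sketch of why cross-region responses end up on the critical path. It assumes requests run at QUORUM and that there is one node per region; the question does not state the consistency level or replication factor, so both are assumptions, and the helper names are made up for illustration. The quorum size itself is the standard floor(RF / 2) + 1.

```python
def quorum(replication_factor):
    """Replicas that must answer a QUORUM request: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def min_remote_responses(replication_factor, local_replicas):
    """Fewest cross-region replies a QUORUM request has to wait for.

    Best case only: it assumes every local replica answers in time.
    """
    return max(0, quorum(replication_factor) - local_replicas)


if __name__ == "__main__":
    # Assumed layout: one node per region, so at most 1 replica is local.
    for rf in (1, 2, 3, 4):
        needed = quorum(rf)
        remote = min_remote_responses(rf, local_replicas=1)
        print(f"RF={rf}: quorum={needed}, remote replies needed>={remote}")
```

With one node per region, any replication factor above 1 forces at least one reply to cross the inter-region link before the timeout expires.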

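    I suggest you set up a multi-datacenter cluster, with one datacenter in the same region and another datacenter in another region. That way your application talks to a set of local nodes, and the data is then replicated to the nodes in the remote datacenter. Cassandra has ways to reduce traffic between multi-region datacenters.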
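
As a minimal sketch of what that could look like on the schema side, the Python helper below builds a `CREATE KEYSPACE` statement that uses `NetworkTopologyStrategy` with a replica count per datacenter. The keyspace name `app_data`, the datacenter name `eu-west`, and the replica counts are hypothetical; `us-west` is the dc name from the question. The datacenter names must match what the snitch reports.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    `replicas_per_dc` maps a datacenter name (as reported by the snitch)
    to the number of replicas to keep in that datacenter.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {int(rf)}" for dc, rf in replicas_per_dc.items()]
    return (
        "CREATE KEYSPACE IF NOT EXISTS " + keyspace
        + " WITH replication = {" + ", ".join(parts) + "};"
    )


if __name__ == "__main__":
    # Hypothetical layout: 3 replicas in each of two datacenters.
    print(create_keyspace_cql("app_data", {"us-west": 3, "eu-west": 3}))
```

With a keyspace like this, clients can read and write at `LOCAL_QUORUM`, so a request only waits for replicas in its own datacenter while replication to the other datacenter happens in the background.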

    Here is a great slide presentation on multi-region datacenters. It also has some useful information that I haven't covered here.