performance - Major discrepancy between Cassandra coordinator latency and client latency

Time: 2017-07-24 18:20:59

Tags: performance cassandra

When I measure our p99 read latency at the coordinator with cassandra.ClientRequest.ReadLatency.p99, I get a time of ~20ms. When I measure it from our client applications using the DataStax Java driver, I get a p99 of ~100ms. The raw round trip time (network overhead) between these machines is ~6ms. Is the remaining discrepancy typical? Or is there some problem to solve here? The only other likely culprit I can think of is garbage collection on the coordinator node.

1 answer:

Answer 0 (score: 0)

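Latency from the network, the kernel, driver deserialization, and GC pauses most likely all contribute, and coordinated omission means these delays are not tracked well. How you measure matters too, but the driver metrics are most likely the closest to what your application actually sees. Most of the time spent outside the ClientRequest metric is something you will have to work out in your own environment. You may also want to make sure nothing is in a blocked state in the NativeTransport stage (visible in nodetool tpstats), since that would stall a request before its "start time" is marked.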
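To make the coordinated omission point concrete, here is a minimal sketch of the usual back-fill correction: when a blocking client stalls on one slow request, the requests it would have sent in the meantime are never measured, so synthetic samples are added for them. The class name, the sample latencies, and the 10 ms intended send interval are all hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CoordinatedOmissionSketch {
    /**
     * For each measured latency longer than the intended send interval,
     * add the samples that the stalled client never got to record.
     */
    static List<Long> correct(List<Long> measuredMs, long expectedIntervalMs) {
        List<Long> out = new ArrayList<>();
        for (long v : measuredMs) {
            out.add(v);
            if (expectedIntervalMs <= 0) {
                continue; // no correction possible without a known interval
            }
            // Each missed request would have waited one interval less than the previous one.
            for (long missing = v - expectedIntervalMs;
                 missing >= expectedIntervalMs;
                 missing -= expectedIntervalMs) {
                out.add(missing);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical data: 99 fast requests (5 ms) and one 500 ms stall.
        List<Long> measured = new ArrayList<>();
        for (int i = 0; i < 99; i++) {
            measured.add(5L);
        }
        measured.add(500L);

        List<Long> corrected = correct(measured, 10);
        System.out.println("raw samples:       " + measured.size());
        System.out.println("corrected samples: " + corrected.size());
    }
}
```

In the raw data the stall is a single sample out of 100, so it barely moves the p99. After the back-fill it accounts for a much larger share of the samples, which is closer to what callers waiting behind the stall would have experienced.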

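I would recommend trying an HDR histogram for monitoring. If you are using a Metrics timer, the sampling reservoir (which the driver uses by default) is very poor at accurately tracking long-tail latencies.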
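The following sketch illustrates why a fixed-size sampling reservoir struggles with tail latency. It is not the Metrics library's implementation: it uses a plain uniform reservoir (Algorithm R) as a simplified stand-in, and the 1028-sample size, the class name, and the synthetic workload are assumptions chosen for illustration. It compares percentiles computed from every recorded value against percentiles computed from the reservoir alone.

```java
import java.util.Arrays;
import java.util.Random;

public class ReservoirTailSketch {
    /** Nearest-rank percentile over a sorted copy of the data. */
    static long percentile(long[] values, double p) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    /** Uniform reservoir sampling (Algorithm R): keeps at most k of the n values. */
    static long[] sample(long[] values, int k, Random rnd) {
        long[] reservoir = Arrays.copyOf(values, Math.min(k, values.length));
        for (int i = k; i < values.length; i++) {
            int j = rnd.nextInt(i + 1);
            if (j < k) {
                reservoir[j] = values[i]; // replace a random slot with decreasing probability
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        long[] latenciesMs = new long[1_000_000];
        for (int i = 0; i < latenciesMs.length; i++) {
            // Hypothetical workload: about 0.1% of requests are slow (200-1000 ms), the rest 1-20 ms.
            latenciesMs[i] = rnd.nextInt(1000) == 0 ? 200 + rnd.nextInt(801) : 1 + rnd.nextInt(20);
        }
        long[] reservoir = sample(latenciesMs, 1028, rnd);

        for (double p : new double[] {99.0, 99.9, 100.0}) {
            System.out.printf("p%.1f  exact=%d ms  reservoir=%d ms%n",
                    p, percentile(latenciesMs, p), percentile(reservoir, p));
        }
    }
}
```

With roughly a thousand retained samples, only about one of them is expected to come from the slow 0.1% of requests, so the reservoir's high percentiles and maximum depend heavily on which few slow requests happened to be kept. An HDR histogram records every value into fixed-precision buckets, so it does not have this sampling problem.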