Does read latency increase because of a large SSTable?

Time: 2016-04-07 18:56:37

Tags: cassandra

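Suppose that after a long time STCS has produced a very large SSTable, and we later receive a read request for a partition key that exists only in that large SSTable (i.e. it is unique across all SSTables of the table). Will read latency increase because we are dealing with a large SSTable, or is read latency unaffected by the size of the partition index?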

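On the other hand, I think that with the help of the partition summary, and then the partition index with its pointers, having just one large SSTable is still better than seeking across many smaller SSTables.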

1 Answer:

Answer 0: (score: 2)

First, a Cassandra process has a single partition key cache instance, shared by all SSTables and all tables. Its size limit is defined in cassandra.yaml:

# Default value is empty to make it "auto" (min(5% of Heap (in MB), 100MB)). 
# Set to 0 to disable key cache.
key_cache_size_in_mb:
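
As a rough illustration of the "auto" rule quoted in the comment above, the default can be computed as below. This is only a sketch: the function name `default_key_cache_size_mb` is hypothetical, and the integer rounding is an assumption, not necessarily what Cassandra does internally.

```python
def default_key_cache_size_mb(heap_mb: int) -> int:
    """Auto size from the yaml comment: min(5% of heap in MB, 100 MB).

    Rounding down to a whole MB is an assumption made for this sketch.
    """
    five_percent_of_heap = heap_mb * 5 // 100
    return min(five_percent_of_heap, 100)


if __name__ == "__main__":
    for heap in (1024, 2048, 8192):
        print(heap, "MB heap ->", default_key_cache_size_mb(heap), "MB key cache")
```

With this rule, any heap of about 2 GB or more hits the 100 MB cap, so on large heaps the key cache does not grow unless `key_cache_size_in_mb` is set explicitly.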

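The index summary is used to perform a binary search that finds the nearest partition offset to start scanning from. Normally every 128th partition key is sampled, but for an SSTable with a very large number of partition keys the sampling interval can be increased to save memory.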

CREATE TABLE music.example (
    id int PRIMARY KEY
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    ...
    AND max_index_interval = 2048
    AND min_index_interval = 128
    ...;
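
To make the effect of the index interval concrete, here is a minimal sketch of a sampled summary followed by a short scan of the index. It is a toy model, not Cassandra's implementation: the names `build_summary` and `lookup` are hypothetical, plain integers stand in for partition tokens, and a Python list stands in for the on-disk partition index.

```python
from bisect import bisect_right


def build_summary(sorted_keys, index_interval=128):
    """Sample every index_interval-th key together with its index position."""
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), index_interval)]


def lookup(sorted_keys, summary, key):
    """Return (position or None, number of index entries scanned)."""
    sampled = [k for k, _ in summary]
    # Binary search in the summary for the last sampled key <= key.
    slot = bisect_right(sampled, key) - 1
    if slot < 0:
        return None, 0
    start = summary[slot][1]
    end = summary[slot + 1][1] if slot + 1 < len(summary) else len(sorted_keys)
    # Sequential scan of the index entries between two summary samples.
    scanned = 0
    for pos in range(start, end):
        scanned += 1
        if sorted_keys[pos] == key:
            return pos, scanned
        if sorted_keys[pos] > key:
            break
    return None, scanned


if __name__ == "__main__":
    keys = list(range(10_000))
    for interval in (128, 2048):
        summary = build_summary(keys, interval)
        pos, scanned = lookup(keys, summary, interval - 1)
        print(f"interval={interval}: {len(summary)} summary entries, "
              f"{scanned} index entries scanned in the worst case")
```

In this model a larger interval shrinks the in-memory summary but lengthens the worst-case scan between two samples, which is the trade-off bounded by `min_index_interval` and `max_index_interval`.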

The total memory used by index summaries can be configured in cassandra.yaml:

# A fixed memory pool size in MB for for SSTable index summaries. If left
# empty, this will default to 5% of the heap size. If the memory usage of
# all index summaries exceeds this limit, SSTables with low read rates will
# shrink their index summaries in order to meet this limit.  However, this
# is a best-effort process. In extreme conditions Cassandra may need to use
# more than this amount of memory.
index_summary_capacity_in_mb:

# How frequently index summaries should be resampled.  This is done
# periodically to redistribute memory from the fixed-size pool to sstables
# proportional their recent read rates.  Setting to -1 will disable this
# process, leaving existing index summaries at their current sampling level.
index_summary_resize_interval_in_minutes: 60

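See CASSANDRA-6379.

So, to answer your question, read performance with a large SSTable:

  1. can be fast if you happen to get a hit in the partition key cache
  2. can be slower because the index interval increases for a large SSTable ("large" here means an SSTable with many distinct partition keys; it is not necessarily related to its absolute size)
  3. can be slower if the large SSTable is not read frequently, since SSTables with low read rates have their index summaries shrunk, see CASSANDRA-5519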