Question

Desired behaviour

I am trying to configure Cassandra CDC so that the commitlogsegments are flushed periodically to the cdc_raw directory (let's say every 10 seconds).

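Based on the documentation at http://abiasforaction.net/apache-cassandra-memtable-flush/ and https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configCDCLogging.html, I found:

memtable_flush_period_in_ms – This is a CQL table property that specifies the number of milliseconds after which a memtable should be flushed. This property is specified on table creation.

and

Upon flushing the memtable to disk, CommitLogSegments containing data for CDC-enabled tables are moved to the configured cdc_raw directory.

Putting those together, I would think that by setting memtable_flush_period_in_ms: 10000 Cassandra flushes its CDC changes to disk every 10 seconds, which is what I want to accomplish.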

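My configuration

Based on the above and my configuration, I would expect the memtable to be flushed to the cdc_raw directory every 10 seconds. I'm using the following configuration:

cassandra.yaml: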

cdc_enabled: true
commitlog_segment_size_in_mb: 1 
commitlog_total_space_in_mb: 2
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

Table configuration:

memtable_flush_period_in_ms = 10000
cdc = true

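Problem

The memtable is not flushed periodically to the cdc_raw directory; instead it gets flushed to the commitlogs directory when a certain size threshold is reached.

In detail, the following happens:

When a commitlogsegment reaches 1 MB, it is flushed to the commitlog directory. There is a maximum of 2 commitlogs in the commitlog directory (see the setting commitlog_total_space_in_mb: 2). When this threshold is reached, the oldest commitlog file in the commitlog directory is moved to the cdc_raw directory.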

Question

How can I flush Cassandra CDC changes periodically to disk?

Answer 1

CDC in the current version of Apache Cassandra is tricky.

The commit log is 'global', meaning that changes to any table go to the same commit log.

Your commit log segment can (and will) contain log entries from tables other than the ones with CDC enabled. These include system tables.
A commit log segment is removed from the commit log directory and moved to the cdc_raw directory only after every log entry in that segment has been flushed.

So, even if you configure your CDC-enabled table to flush every 10 seconds, log entries from other tables are still in the same commit log segment, and they prevent the segment from being moved to the CDC directory.
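To make the point above concrete, here is a toy model in Python. It is only a sketch of the rule described in this answer, not Cassandra's actual implementation: the Segment class, its methods, and the table names are all made up for illustration, and it ignores details such as the segment still being the active one.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for one commit log segment (hypothetical, not Cassandra code)."""
    name: str
    # Tables that still have unflushed writes recorded in this segment.
    dirty_tables: set = field(default_factory=set)

    def write(self, table: str) -> None:
        # Every table, CDC-enabled or not, writes into the same segment.
        self.dirty_tables.add(table)

    def flush(self, table: str) -> None:
        # Flushing a table's memtable clears only that table's entries.
        self.dirty_tables.discard(table)

    def can_move_to_cdc_raw(self) -> bool:
        # The segment is released only when no table has unflushed data in it.
        return not self.dirty_tables


segment = Segment("CommitLog-example.log")
segment.write("my_ks.cdc_table")   # hypothetical CDC-enabled table
segment.write("system.local")      # a table without a 10 s flush period

# The periodic flush of the CDC table alone is not enough.
segment.flush("my_ks.cdc_table")
after_cdc_flush = segment.can_move_to_cdc_raw()

# Only once the other table is flushed as well can the segment move.
segment.flush("system.local")
after_all_flushed = segment.can_move_to_cdc_raw()

print(after_cdc_flush, after_all_flushed)  # False True
```

In this model, setting memtable_flush_period_in_ms on one table only ever clears that table's entry, which is why the segment stays where it is.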

There is no way to change this behavior other than trying to speed up the process by reducing commitlog_segment_size_in_mb (but be careful not to reduce it below the size of your largest single write request).

This behavior has been improved, and the improvement will be released in the next major version, 4.0. There you can read your CDC data as soon as the commit log is synced to disk, so with periodic commit log sync you can read your changes every commitlog_sync_period_in_ms milliseconds.

See CASSANDRA-12148 for details.
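As a rough idea of what a consumer could look like in 4.0, here is a Python sketch. It assumes the layout I understand from the Cassandra CDC documentation: for each segment containing CDC data there is a `<segment_name>_cdc.idx` file in cdc_raw whose first line is the integer offset that has been synced to disk, and whose second line is the word COMPLETED once Cassandra is done with the segment. Please verify this against your version; the function and class names below are my own, and the code does not parse the commit log itself.

```python
from pathlib import Path
from typing import List, NamedTuple

IDX_SUFFIX = "_cdc.idx"  # assumed index file suffix, see the note above


class CdcIndex(NamedTuple):
    segment_file: str   # commit log file this index is assumed to describe
    synced_offset: int  # bytes of the segment that are safe to read
    completed: bool     # True once the segment will not grow any more


def read_cdc_index(idx_path: Path) -> CdcIndex:
    """Parse one index file: offset on line 1, optional COMPLETED on line 2."""
    lines = idx_path.read_text().splitlines()
    offset = int(lines[0].strip())
    completed = len(lines) > 1 and lines[1].strip() == "COMPLETED"
    segment_file = idx_path.name[: -len(IDX_SUFFIX)] + ".log"
    return CdcIndex(segment_file, offset, completed)


def scan_cdc_raw(cdc_raw_dir: Path) -> List[CdcIndex]:
    """Return the current read limits for every indexed segment in cdc_raw."""
    return [read_cdc_index(p) for p in sorted(cdc_raw_dir.glob("*" + IDX_SUFFIX))]
```

A consumer would call scan_cdc_raw on a timer roughly matching commitlog_sync_period_in_ms, read each segment only up to synced_offset, and delete the segment once completed is true and it has been fully processed.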

By the way, you set commitlog_total_space_in_mb to 2, which I definitely do not recommend. What you are seeing right now is Cassandra flushing every table whenever the commit log size exceeds this value, in order to free up space. If Cassandra cannot reclaim commit log space, it will start throwing errors and rejecting writes.
