indexing - Cassandra secondary index get_indexed_slices timeout

Time: 2012-08-24 10:22:07

Tags: indexing cassandra

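I am using Cassandra 0.8 with 2 secondary indexes, on the columns "DeviceID" and "DayOfYear". I have these two indexes so that I can retrieve a device's data within a range of dates. Whenever I get a date filter, I convert it to a DayOfYear value and search with an indexed slice query through the .NET Thrift API. I also cannot upgrade the database at the moment.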
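For illustration, the date-to-DayOfYear conversion described above might look like the sketch below. It is written in Python rather than .NET, the function name `day_of_year_values` is hypothetical, and it assumes a 1-based day of year stored as a double, as the question describes for the indexed columns.

```python
from datetime import date, timedelta
from typing import List


def day_of_year_values(start: date, end: date) -> List[float]:
    """Return the DayOfYear value for each calendar day from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    # tm_yday is 1-based (1 January -> 1). Values are floats because the
    # question says the indexed columns are stored as doubles (assumption:
    # the stored value is the plain day number).
    return [float((start + timedelta(days=i)).timetuple().tm_yday)
            for i in range(days + 1)]


if __name__ == "__main__":
    # 2012 is a leap year, so 23 and 24 August are days 236 and 237.
    print(day_of_year_values(date(2012, 8, 23), date(2012, 8, 24)))  # [236.0, 237.0]
```

Note that a DayOfYear value on its own does not identify the year, so a range that crosses 31 December yields values such as 366.0 followed by 1.0.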

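My problem is that I usually have no trouble retrieving rows with a get_indexed_slices query for the current date (using the current day of year). But whenever I query for yesterday's day of year (which is one of the indexed columns), I get a timeout the first time I run the query. Most of the time it returns on the second attempt, and it returns 100% of the time on the third.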

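Both columns are created with the double data type in the column family, and I generally get 1 record per minute. I have 3 nodes in the cluster, and the nodetool reports show that the nodes are up and running, although the load distribution reported by nodetool looks like this.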

Starting NodeTool
Address         DC           Rack   Status  State   Load       Owns
xxx.xx.xxx.xx   datacenter1  rack1  Up      Normal  7.59 GB    51.39%
xxx.xx.xxx.xx   datacenter1  rack1  Up      Normal  394.24 MB  3.81%
xxx.xx.xxx.xx   datacenter1  rack1  Up      Normal  4.42 GB    44.80%

My configuration in the YAML is as follows.

hinted_handoff_enabled: true
max_hint_window_in_ms: 3600000 # one hour
hinted_handoff_throttle_delay_in_ms: 50
partitioner: org.apache.cassandra.dht.RandomPartitioner
commitlog_sync: periodic
commitlog_sync_period_in_ms: 120000
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 24
sliced_buffer_size_in_kb: 64
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: true
snapshot_before_compaction: false
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
rpc_timeout_in_ms: 50000
index_interval: 128

Am I missing something? Is there a problem in the configuration?

3 Answers:

Answer 0: (score: 2)

Copy the data into another column family whose key is the data you search by. Row slices are faster.
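As a rough sketch of that idea, the second column family can be modelled with a plain Python dictionary: each row key combines the values you would otherwise look up through the secondary indexes. The key format (`device:YYYYMMDD`), the column naming, and all function names below are assumptions made for illustration, not something taken from the question.

```python
from collections import defaultdict
from datetime import datetime

# Stand-in for the extra column family: row key -> {column name -> value}.
# A real column family keeps columns sorted by name; a dict is enough here.
readings_by_device_day = defaultdict(dict)


def row_key(device_id: str, ts: datetime) -> str:
    # Hypothetical key format: device id plus calendar day.
    return f"{device_id}:{ts:%Y%m%d}"


def insert_reading(device_id: str, ts: datetime, value: float) -> None:
    # The column name is the time of day, so one row holds one device-day.
    readings_by_device_day[row_key(device_id, ts)][ts.strftime("%H:%M:%S")] = value


def readings_for_day(device_id: str, day: datetime):
    # A single lookup by key replaces the two-index query; results are
    # sorted by column name to mimic a chronological row slice.
    row = readings_by_device_day.get(row_key(device_id, day), {})
    return sorted(row.items())


if __name__ == "__main__":
    insert_reading("dev-1", datetime(2012, 8, 23, 10, 1), 2.5)
    insert_reading("dev-1", datetime(2012, 8, 23, 10, 0), 1.5)
    print(readings_for_day("dev-1", datetime(2012, 8, 23)))
```

A date-range query then becomes one key lookup per day in the range. At 1 record per minute, as the question reports, a device-day row would hold at most 1,440 columns.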

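I personally have never used secondary indexes in a production environment. Either I ran into timeout problems, or data retrieval through the secondary index was slower than the rate at which data was being inserted. I think this is related to reading data out of sequence and to hard disk seek time.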

Answer 1: (score: 1)

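If you come from a relational model, playOrm is just as fast, and you can model relations on a noSQL store; you only need to partition your very large tables. If you do that, you can use "Scalable JQL" to do your queries: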

@NoSqlQuery(name="findJoinOnNullPartition", query="PARTITIONS t(:partId) select t FROM TABLE as t INNER JOIN t.security as s where s.securityType = :type and t.numShares = :shares")

It also has the basic ORM annotations such as @ManyToOne and @OneToMany. Some things are different in noSQL, but a lot is similar.

Answer 2: (score: -1)

I finally solved my problem in a different way. In fact, I realized the problem was in my data model.

The problem arose because we come from an RDBMS background. I restructured the data model a little, and now I get much faster responses.