What is the best way to copy a large table (> 1.7M rows) to another table?

Posted: 2016-04-12 19:42:23

Tags: datastax-enterprise datastax-startup

We are using a DataStax DSE cluster.

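We are trying to migrate a table to another table that has the same definition as the first one, except that it also has a secondary index.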

The table has about 1.7M rows.

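1) We first ran the Cassandra COPY command from cqlsh. It took a long time (more than 1 hour) and then timed out, so it was no use.

2) We then wrote a program that exports the first table to a CSV file. We split that CSV file into several smaller CSV files and tried to load them into the second table.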

The inserts run for a while, but then they fail.
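The splitting in step 2 can be sketched in Scala using only the standard library. This is a minimal illustration, not the program the asker actually wrote. The names CsvSplitter, chunk, splitFile and the part-NNNNN.csv file naming are hypothetical. The sketch assumes one record per line (no newlines inside quoted fields) and no header row. It streams the input, so the whole export never has to fit in memory.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source

object CsvSplitter {
  // Group lines into chunks of at most chunkSize lines each (the last chunk may be shorter).
  // Assumes one CSV record per line: quoted fields must not contain newlines.
  def chunk(lines: Iterator[String], chunkSize: Int): Iterator[Seq[String]] = {
    require(chunkSize > 0, "chunkSize must be positive")
    lines.grouped(chunkSize)
  }

  // Stream the input file and write each chunk to outDir/part-00000.csv, part-00001.csv, ...
  // The naming scheme is hypothetical. Returns the files written, in order.
  def splitFile(input: File, outDir: File, chunkSize: Int): List[File] = {
    outDir.mkdirs()
    val source = Source.fromFile(input, "UTF-8")
    try {
      chunk(source.getLines(), chunkSize).zipWithIndex.map { case (rows, i) =>
        val out = new File(outDir, f"part-$i%05d.csv")
        val writer = Files.newBufferedWriter(out.toPath, StandardCharsets.UTF_8)
        try rows.foreach { row => writer.write(row); writer.newLine() }
        finally writer.close()
        out
      }.toList // force the lazy iterator before the source is closed
    } finally source.close()
  }
}
```

For example, CsvSplitter.splitFile(new File("table1.csv"), new File("chunks"), 100000) would turn an export of about 1.7M rows into about 17 files. Splitting only controls the size of each load. It does not by itself prevent the insert failures described above.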

3) We are now looking at the bulk loader described here: http://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated

Since we already have the CSV files, is this the right approach?

We are using this library to generate the SSTables: https://github.com/yukim/cassandra-bulkload-example

Is this the right way to handle the problem?

1 answer:

Answer (score: 1)

If you have CSV files, I recommend this bulk loader:

https://github.com/brianmhess/cassandra-loader

If you have Spark analytics enabled on the cluster:

import com.datastax.spark.connector._  // provides sc.cassandraTable and rdd.saveToCassandra
sc.cassandraTable("ks1", "table").saveToCassandra("ks2", "table")

See also:

http://docs.datastax.com/en/latest-dse/datastax_enterprise/migration/migratingBulkSparkRDD.html