How can I efficiently query data in a large Cassandra collection?

Date: 2019-03-11 12:32:10

Tags: node.js cassandra-2.0

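I have a large Cassandra collection (one million documents), and I want to query all one million records in the user table. When I run my query, it returns only about 10K records.

Please let me know an efficient way to query every document in a Cassandra collection.

I am using the cassandra-driver npm package (https://www.npmjs.com/package/cassandra-driver) as the Cassandra driver.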

1 Answer:

Answer 0: (score: 2)

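You cannot retrieve all the data at once because there is a limit on the number of rows that can be read in a single request, which is understandable: the driver pages results, and each request returns at most fetchSize rows.

Take a look at the driver documentation for the stream and eachRow methods. Either one lets you process the entries in the collection in several batches instead of all at once.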


client.stream(query, parameters, options)
  .on('readable', function () {
    // readable is emitted as soon as a row is received and parsed
    let row;
    while (row = this.read()) {
      // process row
    }
  })
  .on('end', function () {
    // emitted when all rows have been retrieved and read
  });

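client.eachRow(query, parameters, { prepare: true, autoPage: true }, function (n, row) {
  // Invoked once per row across all pages; n is the row index within the current page
}, function (err) {
  // Invoked once after all pages have been read, or when an error occurs
});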
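
If you would rather control paging yourself, a minimal sketch along the following lines may help. It assumes a driver version in which client.execute() returns a promise when no callback is passed, and in which the result exposes rows and pageState. The names forEachPage and onPage are hypothetical and used only for illustration.

```javascript
// Sketch: read a large table page by page with client.execute().
// `client` is assumed to be a connected cassandra-driver Client.
// `onPage` is a hypothetical callback that receives the rows of each page.
async function forEachPage(client, query, params, fetchSize, onPage) {
  let pageState = null; // no page state on the first request
  let total = 0;
  do {
    const options = { prepare: true, fetchSize: fetchSize };
    if (pageState) {
      // Continue from where the previous page ended
      options.pageState = pageState;
    }
    const result = await client.execute(query, params, options);
    await onPage(result.rows);
    total += result.rows.length;
    // pageState is empty once the last page has been returned
    pageState = result.pageState;
  } while (pageState);
  return total; // number of rows seen across all pages
}
```

It could be called as forEachPage(client, 'SELECT * FROM users', [], 1000, rows => { /* process rows */ }), where the table name users and the fetch size of 1000 are placeholders. Handling one page at a time keeps memory use bounded, which matters more than raw speed when the table holds a million rows.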