Question

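I am writing code to update an entire table in Cassandra. I use fetchSize and setPagingState to read the "big" table in chunks (and avoid timeouts). My problem is that the number of rows read is much higher than it should be. My guess is that updating some rows modifies the paging state, so rows get read again. Is there any way to avoid this? In my case the table has 400K rows, and the process counts 85 million rows.

Regards, Jean Luc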

       // Imports needed by this snippet (DataStax Java driver)
       import java.nio.ByteBuffer
       import scala.collection.JavaConversions._
       import com.datastax.driver.core.PagingState
       import com.datastax.driver.core.utils.Bytes

       val insertPrepareStmt = userSession.prepare(s"INSERT INTO $table (id, value) VALUES (?, ?)")
       val stmt = userSession.prepare(s"SELECT id,value FROM $table").bind()
       var nextPage: Option[PagingState] = None
       var i: Int = 0
       var nbConverted: Int = 0
       do {
         // resume from the paging state saved by the previous iteration, if any
         nextPage match {
           case Some(p) => stmt.setPagingState(p)
           case _ =>
         }
         val rs = userSession.execute(stmt.setFetchSize(batchSize))
         nextPage = Option(rs.getExecutionInfo.getPagingState)
         // loop on rs
         for (row <- rs.all()) {
           val id = row.getString("id")
           // blob column read as a ByteBuffer, then copied into a byte array
           val value = Bytes.getArray(row.getBytes("value"))
           // modify value in newAvro
           val newAvro = f(value)
           userSession.executeAsync(insertPrepareStmt.bind(id, ByteBuffer.wrap(newAvro)))
           nbConverted += 1
           i += 1
           if (i % 10000 == 0) logger.error(s"...number lines $i    number converted lines $nbConverted")
         }
       } while (nextPage.isDefined)

Answer 1

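After several tests: first, the insert statements do not interfere with the paged select. The ResultSet returned by Cassandra simply yields more rows than the fetch size, because rs.all() makes the driver fetch every remaining page rather than only the current one. The solution is to limit the result set to the fetch size: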

//for (row <- rs.all() )
for (row <- rs.take(batchSize) )
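
To see why the count explodes, here is a minimal, self-contained sketch that models the paging loop without a Cassandra cluster. FakeResultSet, countRows and wholePageOnly are hypothetical names invented for this illustration; they are not part of the DataStax driver. The model only assumes that "all" returns every remaining row while the paging state still advances by one page per iteration.

```scala
object PagingModel {
  // Toy stand-in for a paged result set (NOT the driver's ResultSet).
  // `table` plays the role of the Cassandra table, `pageSize` of batchSize.
  final class FakeResultSet(table: Vector[Int], start: Int, pageSize: Int) {
    // Offset of the next page, or None once the table is exhausted
    // (plays the role of getExecutionInfo.getPagingState).
    val nextStart: Option[Int] =
      if (start + pageSize < table.length) Some(start + pageSize) else None

    // Models all(): every remaining row, not just the current page.
    def all: Vector[Int] = table.drop(start)

    // Models take(batchSize): only the rows of the requested page.
    def currentPage: Vector[Int] = table.slice(start, start + pageSize)
  }

  // Walks the table page by page and counts the rows handed to the loop body.
  def countRows(rows: Int, pageSize: Int, wholePageOnly: Boolean): Long = {
    val table = Vector.range(0, rows)
    var next: Option[Int] = Some(0)
    var seen = 0L
    while (next.isDefined) {
      val rs = new FakeResultSet(table, next.get, pageSize)
      seen += (if (wholePageOnly) rs.currentPage.size else rs.all.size).toLong
      next = rs.nextStart
    }
    seen
  }
}
```

With a page size of 1000 (an assumption, since the question does not state batchSize), PagingModel.countRows(400000, 1000, wholePageOnly = false) gives 80,200,000 while wholePageOnly = true gives 400,000. That is the same order of magnitude as the 85 million rows reported, which suggests the extra rows come from re-reading the tail of the table on every page rather than from the inserts. In the real driver, ResultSet.getAvailableWithoutFetching() reports how many rows of the current page are still unread, which is another way to bound the inner loop.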

Cassandra wrong result set when updating a table
