Question

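To set the context: we have 4 tables in Cassandra. One of them is the data table and the remaining three are search tables (let's call them DATA, SEARCH1, SEARCH2 and SEARCH3).

We have an initial load requirement of up to 15k rows in one request for the DATA table, and the search tables have to be kept in sync with it. We do this with batch inserts, each batch consisting of 4 queries (one per table) to keep the tables consistent.

But for every batch we first need to read the data. If the row exists, we only update the lastUpdatedDate column of the DATA table; otherwise we insert into all 4 tables.

Below is the code snippet showing how we are doing it: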

public List<Item> loadData(List<Item> items) {
    CountDownLatch latch = new CountDownLatch(items.size());
    ForkJoinPool pool = new ForkJoinPool(6);
    pool.submit(() -> items.parallelStream().forEach(item -> {
      BatchStatement batch = prepareBatchForCreateOrUpdate(item);
      batch.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
      ResultSetFuture future = getSession().executeAsync(batch);
      Futures.addCallback(future, new AsyncCallBack(latch), pool);
    }));

    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }

    //TODO Consider what to do with the failed Items, Retry? or remove from the items in the return type
    return items;
}

private BatchStatement prepareBatchForCreateOrUpdate(Item item) {
    BatchStatement batch = new BatchStatement();
    Item existingItem = getExisting(item); // synchronous read
    if (null != existingItem) {
      // Row already exists: only refresh lastUpdatedDate in the DATA table
      existingItem.setLastUpdatedDateTime(new Timestamp(System.currentTimeMillis()));
      batch.add(existingItem);
      return batch;
    }

    batch.add(item);
    batch.add(convertItemToSearch1(item));
    batch.add(convertItemToSearch2(item));
    batch.add(convertItemToSearch3(item));

    return batch;
  }

class AsyncCallBack implements FutureCallback<ResultSet> {
    private CountDownLatch latch;

    AsyncCallBack(CountDownLatch latch) {
      this.latch = latch;
    }

    // Count down the latch on both success and failure so that the thread waiting on latch.await() knows when all the async calls have completed.
    @Override
    public void onSuccess(ResultSet result) {
      latch.countDown();
    }

    @Override
    public void onFailure(Throwable t) {
      LOGGER.warn("Failed async query execution, Cause:{}:{}", t.getCause(), t.getMessage());
      latch.countDown();
    }
  }

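Considering the network round trips between the application and the Cassandra cluster (both reside on the same DNS but in different pods on Kubernetes), the execution takes about 1.5 to 2 minutes for 15k items.

We have thought about making even the read call getExisting(item) asynchronous, but handling the failure cases then becomes more complex. Is there a better approach to loading data into Cassandra (considering only async writes through the DataStax Enterprise Java driver)?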

Answer 1

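First of all, batches in Cassandra are a different thing from batches in relational databases. By using them you put more load on the cluster.

Regarding making everything asynchronous, I thought of the following possibility: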

query the database, obtain a Future, and add a listener to it, which will be executed when the query finishes;
from that method, you can schedule the execution of the next actions based on the result you got from Cassandra.
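The two steps above can be sketched with plain JDK classes. This is only an illustration under stated assumptions: `CompletableFuture` stands in for the driver's `ResultSetFuture`, the two in-memory maps stand in for the DATA and SEARCH tables, and every name here (`AsyncUpsertSketch`, `existsAsync`, `touchAsync`, `insertAllAsync`, `upsertAsync`, `load`) is hypothetical rather than part of the DataStax API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical stand-in: in-memory maps replace the Cassandra tables and
// CompletableFuture replaces the driver's ResultSetFuture.
final class AsyncUpsertSketch {
    final Map<String, Long> data = new ConcurrentHashMap<>();     // DATA: id -> lastUpdatedDate
    final Map<String, String> search = new ConcurrentHashMap<>(); // SEARCH1..3 rows

    // Asynchronous version of the question's getExisting(item).
    CompletableFuture<Boolean> existsAsync(String id) {
        return CompletableFuture.supplyAsync(() -> data.containsKey(id));
    }

    // Existing row: only refresh lastUpdatedDate in DATA.
    CompletableFuture<Void> touchAsync(String id, long now) {
        return CompletableFuture.runAsync(() -> data.put(id, now));
    }

    // New row: write DATA plus the three search tables.
    CompletableFuture<Void> insertAllAsync(String id, long now) {
        return CompletableFuture.runAsync(() -> {
            data.put(id, now);
            for (int i = 1; i <= 3; i++) {
                search.put("SEARCH" + i + ":" + id, id);
            }
        });
    }

    // Step 1: issue the read. Step 2: from its completion callback, schedule
    // the matching write. Any failure in either step is mapped to "FAILED",
    // so the caller gets exactly one outcome per item.
    CompletableFuture<String> upsertAsync(String id, long now) {
        return existsAsync(id).thenCompose(exists -> {
            CompletableFuture<Void> write =
                exists ? touchAsync(id, now) : insertAllAsync(id, now);
            CompletableFuture<String> outcome =
                write.thenApply(ignored -> exists ? "UPDATED" : "INSERTED");
            return outcome;
        }).exceptionally(error -> "FAILED");
    }

    // Start every item first, then wait for all outcomes (in input order).
    List<String> load(List<String> ids, long now) {
        List<CompletableFuture<String>> futures = ids.stream()
            .map(id -> upsertAsync(id, now))
            .collect(Collectors.toList());
        return futures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }
}
```

Collapsing each item to a single outcome is one way to keep failure handling manageable: the caller can collect the "FAILED" items afterwards and decide whether to retry them, without needing a latch.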

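One thing you need to make sure of is that you don't issue too many simultaneous requests. In version 3 of the protocol you can have up to 32k in-flight requests per connection, but in your case you may issue up to 60k (4x15k) requests. I'm using the following wrapper around the Session class to limit the number of in-flight requests.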
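The wrapper itself is not reproduced here. As a rough idea of how such a limit can work, the sketch below uses a JDK `Semaphore` to cap the number of requests in flight. It is not the wrapper referred to above: `InFlightLimiter` and `submit` are hypothetical names, and `CompletableFuture` again stands in for the driver's future type.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical throttle: at most maxInFlight requests are outstanding at once.
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the calling thread until a slot is free, starts the request, and
    // frees the slot when the request completes, successfully or not.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> request) {
        permits.acquireUninterruptibly();
        CompletableFuture<T> future;
        try {
            future = request.get();
        } catch (RuntimeException e) {
            permits.release(); // the request never started, so free the slot now
            throw e;
        }
        return future.whenComplete((result, error) -> permits.release());
    }

    // Number of free slots; equals maxInFlight when nothing is outstanding.
    int available() {
        return permits.availablePermits();
    }
}
```

With something like this in front of the asynchronous calls, the submitting thread simply waits once the limit is reached, so a 60k-request load cannot exceed the per-connection limit as long as `maxInFlight` is chosen below it.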
