How to avoid lost writes when using lightweight transactions (CAS) in Cassandra?

Time: 2014-12-05 10:05:56

Tags: concurrency transactions cassandra compare-and-swap optimistic-concurrency

I am running some tests on Cassandra to see whether we can use it as a scalable key-value store that supports optimistic concurrency.

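Since the key-value store needs only a single table and every item is accessed by its key, lightweight transactions seemed to provide an easy technical foundation for our problem.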

However, when running a test which does a number of concurrent updates (and retries whenever concurrency is detected), we see that we lose writes.

The test creates a table:

CREATE TABLE objects (key text, version int, PRIMARY KEY(key));

It inserts some keys using:

INSERT INTO objects (key, version) VALUES (?, 0) IF NOT EXISTS;

It then increments the version of these items several times using a CAS operation:

-- client retrieves the current version
SELECT version FROM objects WHERE key = ?;

-- and updates the item using the retrieved version as version check
UPDATE objects SET version = ? WHERE key = ? IF version = ?;

For the update, the client code actually looks like this:

private async Task<bool> CompareAndSet(string key, int currentCount, PreparedStatement updateStatement)
{
    // increment the version
    IStatement statement = updateStatement.Bind(currentCount + 1, key, currentCount);

    // execute the statement
    RowSet result = await Session.ExecuteAsync(statement);

    // check the result
    Row row = result.GetRows().SingleOrDefault();

    if (row == null)
        throw new Exception("No row in update result.");

    // check if the CAS operation was applied or not
    return row.GetValue<bool>("[applied]");
}

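As you can see, the CAS operation may not be applied because of concurrency, so the operation is retried until it succeeds. Write timeout exceptions are handled as well. The rationale behind handling the write timeout exceptions is explained here.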

private async Task Update(string key, PreparedStatement selectStatement, PreparedStatement updateStatement)
{
    bool done = false;

    // try update (increase version) until it succeeds
    while (!done)
    {
        // get current version                
        TestItem item = null;

        while (item == null)
            item = await GetItem(key, selectStatement);

        try
        {
            // update version using lightweight transaction 
            done = await CompareAndSet(key, item.Version, updateStatement);

            // lightweight transaction (CAS) failed, because compare failed --> simply not updated
            if (!done)
                Interlocked.Increment(ref abortedUpdates);
        }
        catch (WriteTimeoutException wte)
        {
            // partial write timeout (some have been updated, so all must be eventually updated, because it is a CAS operation)
            if (wte.ReceivedAcknowledgements > 0)
            {
                Interlocked.Increment(ref partialWriteTimeouts);
                done = true;
            }
            else
                // complete write timeout --> unsure about this one...
                Interlocked.Increment(ref totalWriteTimeouts);
        }
    }
}

Here is the output of the test, which uses 100 items and performs 10 updates on each item:

Running test with 100 items and 10 updates per item.

Number of updates: 1000
Number of aborted updates due to concurrency: 3485
Number of total write timeouts: 18
Number of partial write timeouts: 162

LOST WRITES: 94 (or 9,40%)

Results: 

Updates | Item count
     10 |         35
      9 |         43
      8 |         17
      7 |          3
      6 |          2

Xunit.Sdk.EqualException: Assert.Equal() Failure
Expected: 0
Actual:   94

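As you can see, this is a highly concurrent test (see the number of aborted updates that had to be retried). The bad news, however, is that we are losing writes: the client believes 1000 updates were performed, but in this run 94 of them were lost.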
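The lost-write count can be cross-checked against the histogram in the output: 100 items × 10 updates is 1000 intended increments, while the final versions add up to 906. A small C# sketch of that arithmetic (written as C# 9 top-level statements; the variable names are illustrative only):

```csharp
using System;
using System.Linq;

// Final state from the test output: (updates applied to an item, number of items with that count).
var histogram = new (int Updates, int Items)[] { (10, 35), (9, 43), (8, 17), (7, 3), (6, 2) };
const int updatesPerItem = 10;

int items = histogram.Sum(h => h.Items);                // 100 items
int intended = items * updatesPerItem;                  // 1000 increments the clients believe succeeded
int applied = histogram.Sum(h => h.Updates * h.Items);  // 906 increments actually visible in the table
int lost = intended - applied;                          // 94 lost writes
int timeouts = 18 + 162;                                // total + partial write timeouts = 180

Console.WriteLine($"intended={intended} applied={applied} lost={lost} timeouts={timeouts}");
```

The 180 reported timeouts and the 94 lost writes are of the same order of magnitude but do not match one to one.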

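The number of lost writes is of the same order of magnitude as the number of write timeouts, so they seem to be related. The questions are:

  • Do we need to handle the timeout exceptions in a better way?
  • Is there a way to avoid lost writes when doing CAS operations on Cassandra?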

1 answer:

Answer 0 (score: 2)

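A WriteTimeoutException means that Cassandra could not complete the operation in time. With your test you put Cassandra under heavy load, and any operation may fail with a timeout exception. So what you need to do is redo your operation and recover from the problem by retrying. It is similar to a SQLTimeoutException: you need to defend against that as well.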
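For an increment, "redo the operation" needs one refinement: a timed-out CAS may or may not have been applied, so a blind retry can increment twice, while counting the timeout as a success (as the question's `ReceivedAcknowledgements > 0` branch does) can lose the write. The following is a toy in-memory model in C# 9 top-level statements, not driver code. `TimeoutException` stands in for the driver's `WriteTimeoutException`, and the model assumes a hypothetical extra column (in CQL something like a `set<uuid>` updated in the same conditional `UPDATE`) that records which update ids have landed, so the client can re-read the row after a timeout and only then decide whether to retry. Against a real cluster that re-read would need to be done at SERIAL consistency, so that an in-progress Paxos round is completed before the row is returned.

```csharp
using System;
using System.Collections.Generic;

// Toy model of one row of `objects`: the version plus a hypothetical set of applied update ids.
int version = 0;
var appliedIds = new HashSet<Guid>();

// Simulated "UPDATE ... SET version = expected + 1 IF version = expected" that also records the id.
// `fault` is injected for the demo: 0 = normal reply, 1 = applied but the reply times out,
// 2 = not applied and the reply times out. In both timeout cases the client cannot tell which.
bool CompareAndSet(int expected, Guid updateId, int fault)
{
    bool applied = version == expected && fault != 2;
    if (applied)
    {
        version = expected + 1;
        appliedIds.Add(updateId);
    }
    if (fault != 0)
        throw new TimeoutException("CAS outcome unknown");
    return applied;
}

// The question's strategy: a timeout is assumed to mean "applied".
void NaiveIncrement(int fault)
{
    while (true)
    {
        try
        {
            if (CompareAndSet(version, Guid.NewGuid(), fault))
                return;                     // applied
        }
        catch (TimeoutException)
        {
            return;                         // assumed applied: the write may in fact be lost
        }
    }
}

// Safer strategy: keep one id per logical update and, after a timeout, re-read the row
// before deciding whether to retry. `faults` supplies the injected fault for each attempt.
void CheckedIncrement(Queue<int> faults)
{
    var updateId = Guid.NewGuid();
    while (true)
    {
        int current = version;              // re-read (a SERIAL read in a real cluster)
        if (appliedIds.Contains(updateId))  // an earlier timed-out attempt did land
            return;
        int fault = faults.Count > 0 ? faults.Dequeue() : 0;
        try
        {
            if (CompareAndSet(current, updateId, fault))
                return;                     // applied
            // compare failed: another client won the race, so loop and retry
        }
        catch (TimeoutException)
        {
            // outcome unknown: loop, re-read, and check appliedIds before retrying
        }
    }
}

NaiveIncrement(fault: 2);                          // counted by the client, never reaches the row
int afterNaive = version;
CheckedIncrement(new Queue<int>(new[] { 1 }));     // applied but timed out: not applied a second time
CheckedIncrement(new Queue<int>(new[] { 2 }));     // not applied and timed out: retried once
Console.WriteLine($"after naive: {afterNaive}, final: {version}");
```

The id set grows by one entry per update, so a real schema would need some way to prune it; the sketch only illustrates why an ambiguous timeout has to be resolved by reading the row rather than assumed to have succeeded or failed.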