Question

We need to efficiently convert a large list of key/value pairs, like this:

val providedData = List(
  (new Key("1"), new Val("one")),
  (new Key("1"), new Val("un")),
  (new Key("1"), new Val("ein")),
  (new Key("2"), new Val("two")),
  (new Key("2"), new Val("deux")),
  (new Key("2"), new Val("zwei"))
)

into a list of values per key, like this:

val expectedData = List(
  (new Key("1"), List(
    new Val("one"), 
    new Val("un"), 
    new Val("ein"))),
  (new Key("2"), List(
    new Val("two"), 
    new Val("deux"), 
    new Val("zwei")))
)

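The key/value pairs come from a large key/value store (Accumulo), so the keys will be sorted, but they will usually cross Spark partition boundaries. There can be millions of keys and hundreds of values per key.

I think the right tool for this job is Spark's combineByKey operation, but I have only been able to find terse examples with generic types (such as Int), which I have not been able to generalize to user-defined types such as the ones above.

Since I suspect others will have the same question, I'm hoping someone can provide both fully specified (verbose) and terse examples of the Scala syntax for using combineByKey with user-defined types as above, or possibly point out a better tool that I've missed.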

Answer 1

I'm not a Spark expert, but based on {{3}}, I think you can do the following:

val rdd = sc.parallelize(providedData)

rdd.combineByKey(
  // createCombiner: add first value to a list
  (x: Val) => List(x),
  // mergeValue: add new value to existing list
  (acc: List[Val], x: Val) => x :: acc,
  // mergeCombiners: combine the 2 lists
  (acc1: List[Val], acc2: List[Val]) => acc1 ::: acc2
)

Alternatively, the same result can be obtained using aggregateByKey.
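As a rough sketch of the aggregateByKey variant (not necessarily what the answer originally showed): the case classes Key and Val below are hypothetical stand-ins for the question's user-defined types, and the object name, the groupValues helper, the local[2] master and the app name are all assumptions made for illustration. It assumes Spark is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-ins for the question's Key and Val types. Case classes
// supply equals/hashCode (needed to group by key) and are Serializable.
case class Key(id: String)
case class Val(text: String)

object AggregateByKeySketch {

  // Collects all values of each key into a List[Val].
  def groupValues(rdd: RDD[(Key, Val)]): RDD[(Key, List[Val])] =
    rdd.aggregateByKey(List.empty[Val])(
      // seqOp: prepend a value to the list built so far within a partition
      (acc, x) => x :: acc,
      // combOp: concatenate the lists built in different partitions
      (acc1, acc2) => acc1 ::: acc2
    )

  def main(args: Array[String]): Unit = {
    // Local master and app name are illustrative choices only.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("aggregate-by-key-sketch")
    val sc = new SparkContext(conf)
    try {
      val providedData = List(
        (Key("1"), Val("one")),
        (Key("1"), Val("un")),
        (Key("1"), Val("ein")),
        (Key("2"), Val("two")),
        (Key("2"), Val("deux")),
        (Key("2"), Val("zwei"))
      )
      val grouped = groupValues(sc.parallelize(providedData))
      grouped.collect().foreach { case (k, vs) => println(s"$k -> $vs") }
    } finally {
      sc.stop()
    }
  }
}
```

Because the zero value fixes the accumulator type as List[Val], the two functions need no type annotations. The order of the values inside each list is not guaranteed: prepending reverses the order within a partition, and the order in which partitions are combined may vary.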

Convert a list of key/value pairs to a list of values per key in Spark
