Spark / Scala: flatten a list of (List[String], String) tuples

Asked: 2015-08-12 09:30:37

Tags: scala apache-spark tuples scala-collections rdd

Basically this question, but only for Scala.

Given an RDD whose elements have the form (List[String], String), how can I perform the following transformation?

(List[String], String) => (String, String)

For example:

([A,B,C], X)
([C,D,E], Y)

to

(A, X)
(B, X)
(C, X)
(C, Y)
(D, Y)
(E, Y)

5 answers:

Answer 0 (score: 8)

scala> val l = List((List('a, 'b, 'c) -> 'x), List('c, 'd, 'e) -> 'y)
l: List[(List[Symbol], Symbol)] = List((List('a, 'b, 'c),'x),
                                       (List('c, 'd, 'e),'y))

scala> l.flatMap { case (innerList, c) => innerList.map(_ -> c) }
res0: List[(Symbol, Symbol)] = List(('a,'x), ('b,'x), ('c,'x), ('c,'y),
                                    ('d,'y), ('e,'y))

Answer 1 (score: 2)

With Spark, you can solve the problem like this:

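import org.apache.spark.{SparkConf, SparkContext}

object App {
  def main(args: Array[String]): Unit = {
    val input = Seq((List("A", "B", "C"), "X"), (List("C", "D", "E"), "Y"))

    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[4]")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(input)

    // Emit one (element, label) pair per list element
    val result = rdd.flatMap {
      case (list, label) => list.map((_, label))
    }

    // foreach runs on the partitions, so the print order is not deterministic
    result.foreach(println)
    sc.stop()
  }
}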

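This will output something like the following (the order can vary between runs, because foreach processes the partitions in parallel):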

(C,Y)
(D,Y)
(A,X)
(B,X)
(E,Y)
(C,X)

Answer 2 (score: 1)

I think RDD flatMapValues suits this case best.

It maps X to each value in List(A, B, C), which results in an RDD of pairs: [(X,A), (X,B), (X,C), ... (Y,A), (Y,B), (Y,C)]
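
A minimal sketch of how flatMapValues might be applied to the data from the question. It assumes Spark is on the classpath and a local master is acceptable. The names FlatMapValuesSketch and explode are hypothetical and not part of the answer. Because flatMapValues expands the value side of a pair RDD, the sketch swaps each tuple so the list becomes the value, then swaps back to get (element, label).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FlatMapValuesSketch {
  // Hypothetical helper: turns each (list, label) pair into one
  // (element, label) pair per list element.
  def explode(rdd: RDD[(List[String], String)]): RDD[(String, String)] =
    rdd
      .map(_.swap)                  // (label, list): the list is now the value
      .flatMapValues(list => list)  // (label, element), one pair per element
      .map(_.swap)                  // (element, label)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with two threads, for illustration only
    val conf = new SparkConf().setAppName("FlatMapValuesSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val input = Seq((List("A", "B", "C"), "X"), (List("C", "D", "E"), "Y"))
      // collect() brings the pairs back to the driver before printing
      explode(sc.parallelize(input)).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If the (label, element) orientation described in the answer is what you need, the final map(_.swap) can be dropped.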

Answer 3 (score: 0)

For a single tuple, map over the list and pair each element with the tuple's second component:

  val l = (List(1, 2, 3), "A")
  val result = l._1.map((_, l._2))
  println(result)

Answer 4 (score: 0)

Using a for comprehension and making the parameters generic:

def convert[F, S](input: (List[F], S)): List[(F, S)] = {
  for {
    x <- input._1
  } yield {
    (x, input._2)
  }
}

Example call:

convert((List(1, 2, 3), "A"))

which gives you:

List((1,A), (2,A), (3,A))
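
The convert function above handles a single tuple. As a sketch of how it could cover the whole collection from the question, it can be combined with flatMap. The ConvertSketch object and the sample input are assumptions added for illustration. On an RDD the same function could plausibly be passed to rdd.flatMap in the same way.

```scala
object ConvertSketch {
  // Same logic as convert above: pair every list element with the label
  def convert[F, S](input: (List[F], S)): List[(F, S)] =
    for (x <- input._1) yield (x, input._2)

  def main(args: Array[String]): Unit = {
    // Sample data taken from the question
    val input = List((List("A", "B", "C"), "X"), (List("C", "D", "E"), "Y"))

    // flatMap applies convert to each tuple and concatenates the resulting lists
    val result = input.flatMap(pair => convert(pair))

    println(result) // prints List((A,X), (B,X), (C,X), (C,Y), (D,Y), (E,Y))
  }
}
```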