PySpark: map an RDD of strings to an RDD of lists of doubles

Date: 2017-06-03 20:22:44

Tags: apache-spark pyspark

I believe this is a fairly basic operation in Spark/Python programming. I have a text file that looks like this:

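mydata.txt
12  34  2.3  15
23  11  1.5  9
33  18  4.5  99

I then read the text file in with the following code:

data = sc.textFile("mydata.txt")

This reads the file in as an RDD of strings. However, I want to separate the values and convert them all to floats, so I changed the line above to:

data = sc.textFile("matrix1.txt").map(lambda line: line.split(' '))

which successfully splits the data on spaces. However, I'm struggling to come up with the map function that then converts the values to floats, and my attempts so far haven't worked. Any help appreciated! Thanks!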
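One possible reason the float conversion fails: in the sample above, the values appear to be separated by two spaces. This assumes the file really looks like that, since scraping may have altered the whitespace. Splitting on a single ' ' then leaves an empty string between each pair of spaces, and float('') raises ValueError. Calling split() with no argument treats any run of whitespace as a single separator. A minimal check in plain Python, no Spark needed:

```python
sample = "12  34  2.3  15"  # first row of mydata.txt as shown in the question

# Splitting on one space leaves an empty string between each pair of spaces
print(sample.split(' '))  # ['12', '', '34', '', '2.3', '', '15']

# split() with no argument splits on any run of whitespace
print(sample.split())     # ['12', '34', '2.3', '15']

# Converting the single-space split fails on the empty strings
try:
    [float(x) for x in sample.split(' ')]
except ValueError as err:
    print("split(' ') fails:", err)
```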

Edit: please assume I don't know the number of columns in the data, so something along the lines of .map(lambda line: (float(line[0]), float(line[1]), float(line[2]), float(line[3]))) isn't particularly helpful.
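Because float is applied to each token rather than to fixed indices, the conversion does not need to know the column count. Below is a minimal sketch using Python's built-in map. The name to_floats is a hypothetical helper, not part of PySpark. In Spark it would be applied after splitting, for example .map(lambda line: to_floats(line.split())).

```python
def to_floats(tokens):
    """Convert a list of numeric strings of any length to a list of floats."""
    # Built-in map applies float to every element, however many there are
    return list(map(float, tokens))


# Rows with different numbers of columns are handled the same way
print(to_floats(["12", "34", "2.3", "15"]))  # [12.0, 34.0, 2.3, 15.0]
print(to_floats(["1.5", "9"]))               # [1.5, 9.0]
```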

1 answer

Answer 0 (score: 0)

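Never mind, I figured it out. Chain one more map after the split step so that every token in each split line is converted to a float: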

.map(lambda line: [float(x) for x in line])
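The split and the conversion can also be folded into a single function passed to one map. The sketch below assumes a hypothetical helper named parse_line. It uses split() with no argument, so repeated spaces or tabs, and blank lines, do not produce empty tokens. The Spark call is shown only in a comment because the sketch does not create a SparkContext. The local check uses an ordinary list comprehension in place of RDD.map.

```python
def parse_line(line):
    """Turn one line of text into a list of floats.

    split() with no argument splits on any run of whitespace, so repeated
    spaces or tabs are handled, and a blank line becomes an empty list
    instead of raising ValueError.
    """
    return [float(token) for token in line.split()]


# With an existing SparkContext named sc (not created in this sketch),
# the whole pipeline would be a single map:
#     data = sc.textFile("mydata.txt").map(parse_line)
#     data.collect()

# Local check on the sample rows, using a list comprehension instead of RDD.map
sample_lines = ["12  34  2.3  15", "23  11  1.5  9", "33  18  4.5  99"]
rows = [parse_line(line) for line in sample_lines]
print(rows)  # [[12.0, 34.0, 2.3, 15.0], [23.0, 11.0, 1.5, 9.0], [33.0, 18.0, 4.5, 99.0]]
```

Using one map instead of two is mainly a readability choice. Spark pipelines consecutive map transformations within the same stage, so the chained version in the answer should behave the same way, apart from the split(' ') versus split() difference.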