Ensuring every RDD row has the same number of elements

Time: 2016-06-20 07:10:34

Tags: scala apache-spark map-function

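After a lot of RDD operations, I now have a clean comma-separated dataset. However, the rows of the RDD do not all have the same number of elements.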

ABCD,A,M@L,79,80,a
BGDA,F,D@L,89,9,b
SDAA,D,D@I,1,9,c
SWQA,E,D@I,2,0
TYA,E,D@I,2
RQA,E,D@I,2,12


val cleanedRDD = inputRDD
        .flatMap(line  => line._1.split("\n")) //split at newline
        .filter { x => !x.startsWith("#") && !x.startsWith("Worst") &&   !x.startsWith("Hold")} //filter out headers
        .map { x => x.drop(9) } //clean up chars
        .map (x => x.replaceAll(reg, ",")) //replace all consecutive spaces
        .filter(x=> !x.isEmpty())

How can I map the RDD above to add extra comma separators where values do not exist?

Thanks, RT

1 answer:

Answer 0 (score: 3)

This is not really a Spark question, just string manipulation.

I think the simplest way to make sure you get n fields is to append n extra commas to the string, split it, and then take the first n fields. So:

def splitInto(s: String, n: Int) = (s + "," * n).split(",", -1).take(n)

splitInto("a,b,c,d", 4)   //> Array[String] = Array(a, b, c, d)
splitInto("a,b,c", 4)     //> Array[String] = Array(a, b, c, "")
splitInto("a,b", 4)       //> Array[String] = Array(a, b, "", "")
splitInto("a", 4)         //> Array[String] = Array(a, "", "", "")
splitInto("", 4)          //> Array[String] = Array("", "", "", "")
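To tie the padding idea back to the question's data, here is a minimal sketch rather than a definitive implementation. It assumes six fields per row (the widest row in the sample) and uses a local Seq as a stand-in for cleanedRDD so that it runs without a Spark context. The names PadRows, padLine and targetFields are hypothetical and do not appear in the original post.

```scala
object PadRows {
  // Same idea as the answer's splitInto: append n commas, split while keeping
  // trailing empty strings (limit -1), then keep only the first n fields.
  def splitInto(s: String, n: Int): Array[String] =
    (s + "," * n).split(",", -1).take(n)

  // Hypothetical helper: re-join the padded fields into one comma-separated line.
  def padLine(line: String, targetFields: Int): String =
    splitInto(line, targetFields).mkString(",")

  def main(args: Array[String]): Unit = {
    // Local stand-in for cleanedRDD; an RDD[String] accepts the same .map call.
    val rows = Seq("ABCD,A,M@L,79,80,a", "SWQA,E,D@I,2,0", "TYA,E,D@I,2")
    val targetFields = 6 // assumption: the widest sample row has six fields
    rows.map(padLine(_, targetFields)).foreach(println)
  }
}
```

On the real dataset the equivalent call would presumably be cleanedRDD.map(line => padLine(line, 6)). Note that take(n) also silently truncates any row that has more than n fields, so the target width should be chosen with that in mind.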