Question

I found a solution to my problem in "Create new column with function in Spark Dataframe",

but I am having trouble converting the code below to Java, since it is written in Scala:

import org.apache.spark.sql.functions._
val myDF = sqlContext.parquetFile("hdfs:/to/my/file.parquet")
val coder: (Int => String) = (arg: Int) => {if (arg < 100) "little" else "big"}
val sqlfunc = udf(coder)
myDF.withColumn("Code", sqlfunc(col("Amt")))

Can someone give me the equivalent Java code? I am stuck converting the following two lines:

val coder: (Int => String) = (arg: Int) => {if (arg < 100) "little" else "big"}
val sqlfunc = udf(coder)

Thanks,

Answer 1

Create the user-defined function:

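import org.apache.spark.sql.api.java.UDF1;

// Maps an integer amount to "little" (below 100) or "big".
public class CodeUdf implements UDF1<Integer, String> {
    @Override
    public String call(Integer integer) throws Exception {
        if (integer < 100)
            return "little";
        else
            return "big";
    }
}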
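One thing to watch for: `call` receives a boxed `Integer`, so a null value in the column would be unboxed by `integer < 100` and throw a `NullPointerException`. Below is a minimal sketch of the same classification pulled out into a plain static method, so it can be run and checked without a Spark session. The class name `CodeLogic`, the method name `classify`, and the choice to return null for a null input are assumptions made for illustration, not part of the original answer.

```java
// Plain-Java sketch of the UDF's logic; no Spark classes are needed to run it.
final class CodeLogic {
    // Hypothetical helper: returns null for a null input instead of throwing.
    static String classify(Integer amount) {
        if (amount == null) {
            return null; // assumption: a missing amount gets no code
        }
        return amount < 100 ? "little" : "big";
    }

    public static void main(String[] args) {
        System.out.println(classify(42));
        System.out.println(classify(100));
        System.out.println(classify(null));
    }
}
```

With this in place, the body of `CodeUdf.call` could simply be `return CodeLogic.classify(integer);`.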

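Register it with Spark. The third argument is the UDF's return type, so it needs to be DataTypes.StringType here, because call returns a String:

sqlContext.udf().register("Code", new CodeUdf(), DataTypes.StringType);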

Use it in a select:

df.selectExpr("value", "Code(value)").show();
