Question

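I have the Titanic dataset stored in a Dataset. I want to create a new dataset from the existing one. If a person's age is less than 16, the sex column of the new Titanic dataset should be changed to "Child", as shown below.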

def isChild(age:String):String={
  if(age.toDouble<16)
  {
    "Child"
  }else
  {
    age
  }
}

I am trying to create the dataset with the following:

titanic_df.na.drop.map(x=>isChild(x.getString(5))).show()

Any help is appreciated: I want to modify the 4th column of the dataset based on its age column, and also handle NULL values.


Answer 1

From my understanding of the question, doing the following should help:

null
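As one possible shape for this, and not necessarily what the answer originally showed, here is a minimal sketch that stays close to the map-based attempt in the question. It assumes a hypothetical `Passenger` case class with only three fields (the real Titanic data has more columns), models a missing age as `None`, and leaves rows with a missing age unchanged; all of these are assumptions made for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type; the real Titanic dataset has more fields.
case class Passenger(name: String, sex: String, age: Option[Double])

object TypedChildSketch {
  // Relabel sex as "Child" only when the age is known and below 16.
  // A missing age (None) leaves the row untouched.
  def markChild(p: Passenger): Passenger =
    if (p.age.exists(_ < 16)) p.copy(sex = "Child") else p

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("typed-child-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for the real data.
    val passengers: Dataset[Passenger] = Seq(
      Passenger("Allen", "female", Some(29.0)),
      Passenger("Master Hudson", "male", Some(0.92)),
      Passenger("Moran", "male", None)
    ).toDS()

    passengers.map(p => markChild(p)).show()
    spark.stop()
  }
}
```

Keeping the rule in a plain function such as `markChild` means it can be checked without a Spark session, and using `Option` avoids calling `toDouble` on a null value as the original `isChild` would.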

I hope the answer is helpful.

Update

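Looking at the sample you posted, you want to replace a null age with 2 and update the sex column accordingly, so you can do the following:

import org.apache.spark.sql.functions.{col, lit, when}

// Fill missing ages with 2 first, then relabel sex for anyone under 16.
titanic_df.withColumn("age", when(col("age").isNull, lit(2)).otherwise(col("age")))
  .withColumn("sex", when(col("age") < 16, lit("Child")).otherwise(col("sex")))
  .show()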
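To try the two `withColumn` steps without the full dataset, the following self-contained sketch may help. The three sample rows, the column names `name`, `sex`, `age`, the helper name `markChildren`, and the local session settings are all assumptions made for illustration; only the `when`/`otherwise` logic comes from the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object TitanicChildSketch {
  // Hypothetical helper wrapping the two steps from the answer:
  // 1) replace a null age with 2, 2) set sex to "Child" when age < 16.
  def markChildren(df: DataFrame): DataFrame =
    df.withColumn("age", when(col("age").isNull, lit(2)).otherwise(col("age")))
      .withColumn("sex", when(col("age") < 16, lit("Child")).otherwise(col("sex")))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("titanic-child-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for titanic_df; None becomes a null age.
    val titanic_df = Seq[(String, String, Option[Double])](
      ("Allen", "female", Some(29.0)),
      ("Master Hudson", "male", Some(0.92)),
      ("Moran", "male", None)
    ).toDF("name", "sex", "age")

    markChildren(titanic_df).show()
    spark.stop()
  }
}
```

Note that the order of the two steps matters: because the null age is filled with 2 before the comparison, a passenger with an unknown age also ends up labelled "Child". If that is not what you want, swap the fill value or skip the first step.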

Updating a column in a dataset using Apache Spark in Scala
