Question

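I am converting a Scala / Spark deep learning model to Python / PySpark. After reading the df, all the variables are interpreted as string type. I need to cast them to float. Doing this one column at a time is easy; I think it would look like this: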

format_number(result['V1'].cast('float'),2).alias('V1')

but there are 31 columns. How can I do them all at once? The columns are "V1" through "V28" plus "Time", "Amount", and "Class".

The Scala solution is:

// cast all the column to Double type.
val df = raw.select(((1 to 28).map(i => "V" + i) ++ Array("Time", "Amount", "Class")).map(s => col(s).cast("Double")): _*)

https://docs.gitlab.com/ee/administration/custom_hooks.html

How can I do the same thing in PySpark?

Answer 1

Use a list comprehension:

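from pyspark.sql.functions import format_number

# format_number returns a string column formatted to 2 decimal places.
result.select([
    format_number(result[c].cast('float'), 2).alias(c) for c in result.columns
])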
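
If the goal is numeric columns rather than formatted strings, a closer counterpart of the Scala snippet would skip format_number and only cast. The following is a minimal sketch under that assumption, not a definitive implementation. The helper names column_names and cast_to_double are hypothetical, and the column list is taken from the question ("V1" through "V28" plus "Time", "Amount", "Class").

```python
def column_names():
    """Build the 31 column names from the question: V1..V28, Time, Amount, Class."""
    return [f"V{i}" for i in range(1, 29)] + ["Time", "Amount", "Class"]


def cast_to_double(df, names):
    """Return a new DataFrame containing only the named columns, each cast to double.

    `df` is assumed to be a pyspark.sql.DataFrame that has every column in `names`.
    """
    # Imported inside the function so column_names() stays usable without Spark.
    from pyspark.sql.functions import col

    # alias(name) keeps the original column name explicit after the cast.
    return df.select([col(name).cast("double").alias(name) for name in names])


# Hypothetical usage, assuming `result` is the DataFrame read in the question:
# df = cast_to_double(result, column_names())
# df.printSchema()
```

Passing result.columns instead of column_names() would cast every column the DataFrame has, which matches the comprehension in the answer above.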
