I have a DataFrame with a Timestamp column, and I need to convert it to Date format.
Is there a Spark SQL function available for this?
Answer 0 (score: 39)
You can cast the column to date:

Scala:
import org.apache.spark.sql.types.DateType
val newDF = df.withColumn("dateColumn", df("timestampColumn").cast(DateType))
PySpark:
df = df.withColumn('dateColumn', df['timestampColumn'].cast('date'))
Answer 1 (score: 12)
In Spark SQL:
SELECT
CAST(the_ts AS DATE) AS the_date
FROM the_table
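The same query can also be issued from program code via spark.sql, provided the DataFrame is first registered as a temporary view. A minimal sketch, assuming a SparkSession named spark and a DataFrame df containing the timestamp column the_ts (names taken from the SQL above):

```scala
// Expose the DataFrame to SQL under the name used in the query
df.createOrReplaceTempView("the_table")

// CAST(... AS DATE) drops the time-of-day part, keeping only the calendar date
val dated = spark.sql("SELECT CAST(the_ts AS DATE) AS the_date FROM the_table")
dated.printSchema  // the_date should show as: date
```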
Answer 2 (score: 4)
Imagine the following input:
import org.apache.spark.sql.functions.current_timestamp

val dataIn = spark.createDataFrame(Seq(
    (1, "some data"),
    (2, "more data")))
  .toDF("id", "stuff")
  .withColumn("ts", current_timestamp())
dataIn.printSchema
root
|-- id: integer (nullable = false)
|-- stuff: string (nullable = true)
|-- ts: timestamp (nullable = false)
You can use the to_date function:
import org.apache.spark.sql.functions.to_date
import spark.implicits._

val dataOut = dataIn.withColumn("date", to_date($"ts"))
dataOut.printSchema
root
|-- id: integer (nullable = false)
|-- stuff: string (nullable = true)
|-- ts: timestamp (nullable = false)
|-- date: date (nullable = false)
dataOut.show(false)
+---+---------+-----------------------+----------+
|id |stuff |ts |date |
+---+---------+-----------------------+----------+
|1 |some data|2017-11-21 16:37:15.828|2017-11-21|
|2 |more data|2017-11-21 16:37:15.828|2017-11-21|
+---+---------+-----------------------+----------+
I would recommend these methods over casting and plain SQL.
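As a side note, to_date can also parse string columns when given a format pattern as its second argument (available since Spark 2.2). A sketch, assuming a hypothetical string column raw holding dates in yyyy/MM/dd form:

```scala
import org.apache.spark.sql.functions.{lit, to_date}
import spark.implicits._

val parsed = dataIn
  .withColumn("raw", lit("2017/11/21"))               // hypothetical string input
  .withColumn("date", to_date($"raw", "yyyy/MM/dd"))  // parsed into a date column
```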
Answer 3 (score: 1)
For Spark 2.4+:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DateType
import spark.implicits._

val newDF = df.withColumn("dateColumn", $"timestampColumn".cast(DateType))

OR

val newDF = df.withColumn("dateColumn", col("timestampColumn").cast(DateType))
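For completeness, the same cast can also be written with selectExpr, which keeps the SQL syntax without registering a temp view. A sketch assuming the same df and column names as above:

```scala
// "*" keeps all existing columns; the CAST adds the new date column alongside them
val newDF = df.selectExpr("*", "CAST(timestampColumn AS DATE) AS dateColumn")
```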
Answer 4 (score: 0)
The simplest thing that works, tried and tested:
df_join_result = df_join_result.withColumn('order_date', df_join_result['order_date'].cast('date'))