How to convert a Timestamp column to Date format in a DataFrame?

Asked: 2016-11-17 13:18:59

Tags: apache-spark apache-spark-sql

I have a DataFrame with a Timestamp column, which I need to convert into Date format.

Is there a Spark SQL function available for this?

5 Answers:

Answer 0 (Score: 39)

You can cast the column to date:

Scala:

import org.apache.spark.sql.types.DateType

val newDF = df.withColumn("dateColumn", df("timestampColumn").cast(DateType))

PySpark:

df = df.withColumn('dateColumn', df['timestampColumn'].cast('date'))
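Either way, the cast keeps only the calendar day and discards the time-of-day. As a plain-Python sketch of that truncation (using the standard datetime module, not Spark, so the semantics can be seen in isolation):

```python
from datetime import datetime

# A value like Spark's TimestampType holds a date plus a time-of-day.
ts = datetime(2016, 11, 17, 13, 18, 59)

# Casting to DateType drops the time-of-day, keeping only the calendar day.
d = ts.date()
print(d)  # 2016-11-17
```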

Answer 1 (Score: 12)

In Spark SQL:

SELECT
  CAST(the_ts AS DATE) AS the_date
FROM the_table

Answer 2 (Score: 4)

Imagine the following input:

import org.apache.spark.sql.functions.current_timestamp

val dataIn = spark.createDataFrame(Seq(
        (1, "some data"),
        (2, "more data")))
    .toDF("id", "stuff")
    .withColumn("ts", current_timestamp())

dataIn.printSchema
root
 |-- id: integer (nullable = false)
 |-- stuff: string (nullable = true)
 |-- ts: timestamp (nullable = false)

You can use the to_date function:

import org.apache.spark.sql.functions.to_date
import spark.implicits._

val dataOut = dataIn.withColumn("date", to_date($"ts"))

dataOut.printSchema
root
 |-- id: integer (nullable = false)
 |-- stuff: string (nullable = true)
 |-- ts: timestamp (nullable = false)
 |-- date: date (nullable = false)

dataOut.show(false)
+---+---------+-----------------------+----------+
|id |stuff    |ts                     |date      |
+---+---------+-----------------------+----------+
|1  |some data|2017-11-21 16:37:15.828|2017-11-21|
|2  |more data|2017-11-21 16:37:15.828|2017-11-21|
+---+---------+-----------------------+----------+

I would recommend these methods over casting or plain SQL.
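For string columns, to_date can also take a format pattern (a hedged note; this variant is not shown in the answer above). In plain-Python terms, parsing a timestamp string and keeping only the calendar day looks like:

```python
from datetime import datetime

# Parse a timestamp string, then keep only the calendar day --
# analogous to Spark's to_date(col, fmt) applied to a string column.
s = "2017-11-21 16:37:15"
d = datetime.strptime(s, "%Y-%m-%d %H:%M:%S").date()
print(d)  # 2017-11-21
```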

Answer 3 (Score: 1)

For Spark 2.4+:

import org.apache.spark.sql.types.DateType
import spark.implicits._

val newDF = df.withColumn("dateColumn", $"timestampColumn".cast(DateType))

OR

import org.apache.spark.sql.functions.col

val newDF = df.withColumn("dateColumn", col("timestampColumn").cast(DateType))

Answer 4 (Score: 0)

The approach that worked best for me, tried and tested:

df_join_result = df_join_result.withColumn('order_date', df_join_result['order_date'].cast('date'))