PySpark timestamp with time zone

Asked: 2020-06-30 14:36:04

Tags: pyspark apache-spark-sql

I am trying to extract a value from a table using PySpark, and I need the value in the following format: 2020-06-17T15:08:24z

existingMaxModifiedDate = spark.sql('select max(lastModDt) as lastModDate from db.tbl')

jobMetadata = existingMaxModifiedDate.withColumn("maxDate", date_format(to_timestamp(existingMaxModifiedDate.lastModDate, "yyyy-mm-dd HH:MM:SS.SSS"), "yyyy-mm-dd HH:MM:SS.SSS"))

However, I keep getting null in the created column "maxDate". Thanks.

1 Answer:

Answer 0 (score: 1)

Perhaps this is useful. Note that in date/time patterns `MM` means month and `mm` means minutes, so `"yyyy-mm-dd HH:MM:SS.SSS"` swaps the two (and `SS` is milliseconds, not seconds), which is why the parse comes back null. With a correct pattern you can format the timestamp like this:

  val timeDF = spark.sql(
      """
        |select current_timestamp() as time1,
        | translate(date_format(current_timestamp(), 'yyyy-MM-dd HH:mm:ssZ') ,' ', 'T') as time2,
        | translate(date_format(current_timestamp(), 'yyyy-MM-dd#HH:mm:ss$') ,'#$', 'Tz') as time3
      """.stripMargin)
    timeDF.show(false)
    timeDF.printSchema()

    /**
      * +-----------------------+------------------------+--------------------+
      * |time1                  |time2                   |time3               |
      * +-----------------------+------------------------+--------------------+
      * |2020-06-30 21:22:04.541|2020-06-30T21:22:04+0530|2020-06-30T21:22:04z|
      * +-----------------------+------------------------+--------------------+
      *
      * root
      * |-- time1: timestamp (nullable = false)
      * |-- time2: string (nullable = false)
      * |-- time3: string (nullable = false) 
      */