How to convert a timestamp to GMT in Hive

Time: 2017-02-13 21:03:02

Tags: mysql hadoop apache-spark hive impala

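I have a timestamp column in my table, and I derive a column named dt_skey from it. For clarity, let's assume the timestamp column is named time_column. A time_column value looks like 2017-02-05 03:33:50, and the corresponding dt_skey value looks like 20170205033350; it is the same value with the separators removed.

My question is: time_column is in the EST time zone, and I want to convert it to GMT while deriving dt_skey from it. The reason is that when we query through Impala, the timestamp is converted to GMT, whereas dt_skey is not converted because it is an int. I ingest through Hive, and when we query through Hive the timestamp and dt_skey columns are in sync. For reporting and for end users we use Impala, so I want to change the dt_skey column so that both columns are in sync when users look at the data in Impala.

Below is the SQL used to derive the dt_skey column from the timestamp column: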

cast(substr(regexp_replace(cast(time_column as string), '-',''),1,8) as int)as dt_skey

The query above converts 2017-02-02 07:32:51 to 20170202.

Please help me produce dt_skey in GMT. A Spark solution is also welcome.

4 answers:

Answer 0 (score: 1)

In Spark:

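from pyspark.sql import SparkSession, functions as f

spark = SparkSession.builder.getOrCreate()  # reuse the active session if there is one
rdd = spark.sparkContext.parallelize([('2017-02-05 03:33:50',)])
df = spark.createDataFrame(rdd, ['EST'])
# Treat the EST column as Eastern Standard Time and shift it to UTC/GMT
df = df.withColumn('GMT', f.to_utc_timestamp(df['EST'], 'EST'))
# Render the GMT timestamp as a 14-digit yyyyMMddHHmmss key
res = df.withColumn('YouWanna', f.date_format(df['GMT'], 'yyyyMMddHHmmss'))
res.show(truncate=False)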

+-------------------+---------------------+--------------+
|EST                |GMT                  |YouWanna      |
+-------------------+---------------------+--------------+
|2017-02-05 03:33:50|2017-02-05 08:33:50.0|20170205083350|
+-------------------+---------------------+--------------+

Or in Hive:

select date_format(to_utc_timestamp('2017-02-05 03:33:50','EST'), 'yyyyMMddHHmmss') from dual

Is this what you mean?
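To sanity-check the expected value without a cluster, the same conversion can be sketched in plain Python. This is only an illustration and not part of the answer: it assumes 'EST' means a fixed UTC-5 offset with no daylight saving, and the helper name est_to_gmt_key is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 'EST' is treated as a fixed UTC-5 offset (no daylight saving).
EST = timezone(timedelta(hours=-5), "EST")

def est_to_gmt_key(ts: str) -> str:
    """Read a 'yyyy-MM-dd HH:mm:ss' string as EST and return a yyyyMMddHHmmss key in GMT."""
    local = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=EST)
    return local.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")

print(est_to_gmt_key("2017-02-05 03:33:50"))  # 20170205083350
```

If the source data observes daylight saving time, a fixed offset would be off by an hour for part of the year, so a region-based zone would be the safer choice there.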

Answer 1 (score: 0)

You can simply add 0 to the field, for example:

SELECT datetimefield+0;

SELECT CONVERT_TZ('2017-02-02 07:32:51','EST','GMT');

If CONVERT_TZ returns NULL, you can install the time zone tables like this:

mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root -p mysql

Sample

mysql> SELECT CONVERT_TZ('2017-02-02 07:32:51','EST','GMT');
+-----------------------------------------------+
| CONVERT_TZ('2017-02-02 07:32:51','EST','GMT') |
+-----------------------------------------------+
| 2017-02-02 12:32:51                           |
+-----------------------------------------------+
1 row in set (0,00 sec)

mysql>
mysql> SELECT DATE(TIMESTAMP('2017-02-02 07:32:51'))+0;
+------------------------------------------+
| DATE(TIMESTAMP('2017-02-02 07:32:51'))+0 |
+------------------------------------------+
|                                 20170202 |
+------------------------------------------+
1 row in set (0,00 sec)

mysql> select id, mydate, date(mydate), date(mydate)+0 from df;
+----+---------------------+--------------+----------------+
| id | mydate              | date(mydate) | date(mydate)+0 |
+----+---------------------+--------------+----------------+
|  1 | 2017-02-05 03:33:50 | 2017-02-05   |       20170205 |
+----+---------------------+--------------+----------------+
1 row in set (0,00 sec)

mysql>

mysql> SELECT TIMESTAMP('2017-02-05 03:33:50')+0;
+------------------------------------+
| TIMESTAMP('2017-02-05 03:33:50')+0 |
+------------------------------------+
|                     20170205033350 |
+------------------------------------+
1 row in set (0,00 sec)

mysql>
mysql> select id, mydate, mydate+0 from df;
+----+---------------------+----------------+
| id | mydate              | mydate+0       |
+----+---------------------+----------------+
|  1 | 2017-02-05 03:33:50 | 20170205033350 |
+----+---------------------+----------------+
1 row in set (0,00 sec)

mysql>

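Answer 2 (score: 0)

Assuming you need a Hive query, this is how I convert a Hive TimeStamp column (which uses the current system time zone) to an Impala TimeStamp (which uses UTC; UTC is the same as GMT, except that GMT is deprecated).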

CREATE TEMPORARY MACRO to_impala_timestamp(ts TIMESTAMP)
  CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(ts) +CAST(CAST(PRINTF('%tz', ts) AS FLOAT)*36.0 AS INT)) AS TIMESTAMP)
;
--## WARNING - do not use MACROs if your Hive version is below V1.3 (Apache, Horton)
--## or below V1.1-CDH5.7.3, V1.1-CDH5.8.3, V1.1-CDH5.9.0 (Cloudera)
--## cf. "HIVE-11432 Hive macro give same result for different arguments"

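PRINTF('%tz', ts) extracts the time zone offset and accounts for daylight saving time dynamically, assuming the timestamps you are processing are in the system time zone used by your Hadoop cluster. If they are in a different TZ, you will have to adapt the macro accordingly.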
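The arithmetic inside the macro can be traced in plain Python. This is a sketch of the offset calculation only, with hypothetical helper names. It also suggests that the multiply-by-36 shortcut is exact only for whole-hour offsets: an offset such as "+0530" would give 19080 seconds rather than 19800.

```python
def macro_offset_seconds(tz_offset: str) -> int:
    """Mimic CAST(CAST(offset AS FLOAT) * 36.0 AS INT) for an offset such as '+0200'."""
    return int(float(tz_offset) * 36.0)

def exact_offset_seconds(tz_offset: str) -> int:
    """Parse '+HHMM' or '-HHMM' into seconds, handling the minutes part separately."""
    sign = -1 if tz_offset.startswith("-") else 1
    digits = tz_offset.lstrip("+-")
    return sign * (int(digits[:2]) * 3600 + int(digits[2:]) * 60)

# Compare the shortcut with the exact value for a few offsets.
for offset in ("+0200", "+0100", "-0500", "+0530"):
    print(offset, macro_offset_seconds(offset), exact_offset_seconds(offset))
```

For clusters in whole-hour zones such as CET/CEST the two functions agree, which matches the results reported in this answer.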

You can test it with this query:

CREATE TABLE test_tz
STORED AS Parquet
AS
SELECT CAST(ts AS STRING) AS initial_ts_as_string
  , printf('%1$tz %1$tZ', ts) AS tzone_offset_and_code
  , ts AS ts_for_hive
  , to_impala_timestamp(ts) AS ts_for_impala
FROM ...

Our cluster uses Central European Time; here is how the results show up in Hive...

+--------------------------+--------------------+-----------------------------+-------------------------+
|  initial_ts_as_string    | tz_offset_and_code | ts_for_hive                 | ts_for_impala           |
+--------------------------+--------------------+-----------------------------+-------------------------+
| 2015-09-13 11:32:30.627  | +0200 CEST         | 2015-09-13 11:32:30.627     | 2015-09-13 13:32:30.0   |
| 2015-12-10 12:27:01.282  | +0100 CET          | 2015-12-10 12:27:01.282     | 2015-12-10 13:27:01.0   |
| 2016-05-17 15:49:06.386  | +0200 CEST         | 2016-05-17 15:49:06.386     | 2016-05-17 17:49:06.0   |

...and then in Impala...

+-------------------------+--------------------+-------------------------------+---------------------+
|  initial_ts_as_string   | tz_offset_and_code | ts_for_hive                   | ts_for_impala       |
+-------------------------+--------------------+-------------------------------+---------------------+
| 2015-09-13 11:32:30.627 | +0200 CEST         | 2015-09-13 09:32:30.627000000 | 2015-09-13 11:32:30 |
| 2015-12-10 12:27:01.282 | +0100 CET          | 2015-12-10 11:27:01.282000000 | 2015-12-10 12:27:01 |
| 2016-05-17 15:49:06.386 | +0200 CEST         | 2016-05-17 13:49:06.386000000 | 2016-05-17 15:49:06 |

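Note that the milliseconds are lost when the conversion runs; they can be restored with an extra trick, but that is usually beside the point.

Side note: to format a TimeStamp (or a Date, a Float, or anything else) as a String, the good old Java PRINTF() function is more practical than using the default format plus REGEXP_***() functions...

Answer 3 (score: 0)

Thanks for all the solutions.

Each answer here is a partial solution. Drawing on them, I tried the following syntax and it worked.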

cast(substr(regexp_replace(to_utc_timestamp(timestamp_column, 'EST') ,'-',''),1,8) as int) as dt_skey

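To explain the syntax above, this is what my timestamp column looks like (yyyy-MM-dd HH:mm:ss): "2017-02-16 12:20:21"

After running the syntax above, my output looks like '20170216', which is 'yyyyMMdd'. regexp_replace removes the hyphens and substr keeps the first eight characters, so only yyyyMMdd remains. to_utc_timestamp(timestamp_column, 'EST') converts the timestamp column to the UTC time zone.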
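The effect on the date key is easiest to see with a timestamp late in the evening, where the UTC date differs from the EST date. The following is a rough Python equivalent of the expression above, assuming 'EST' is a fixed UTC-5 offset; dt_skey_utc is a hypothetical helper, not a Hive function.

```python
import re
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5))  # assumption: fixed offset, no daylight saving

def dt_skey_utc(ts: str) -> int:
    """Shift an EST 'yyyy-MM-dd HH:mm:ss' string to UTC, then build the yyyyMMdd integer key."""
    local = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=EST)
    as_string = local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    # Same steps as the SQL: drop the hyphens, keep the first 8 characters, cast to int.
    return int(re.sub("-", "", as_string)[:8])

print(dt_skey_utc("2017-02-16 12:20:21"))  # 20170216
print(dt_skey_utc("2017-02-16 22:20:21"))  # 20170217
```

The second call shows why the conversion has to happen before the key is derived: 22:20:21 in EST is already 03:20:21 on the next day in UTC.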