Hive: INSERT OVERWRITE TABLE from a non-partitioned external table into a new partitioned table fails

Date: 2016-08-19 07:49:05

Tags: hadoop hive hdfs hiveql

Summary: here is what I did.

Raw data -> filter with a SELECT and save the result in HDFS -> create an external table over the files saved in HDFS -> populate an empty partitioned table from the external table.

Looking at the exception, it seems to be something about the OUTPUT format between the two tables.

Details

1) I have a table "table_log" (in database A) containing a large amount of data, with the following structure (3 partition columns):

CREATE TABLE `table_log`(
  `e_id` string, 
  `member_id` string, 
  .
  .
PARTITIONED BY ( 
  `dt` string, 
  `service_type` string, 
  `event_type` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\u0001' 
  COLLECTION ITEMS TERMINATED BY '\u0002' 
  MAP KEYS TERMINATED BY '\u0003' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

2) I filtered the data by (dt, service_type, event_type) and saved the result in HDFS as follows:

INSERT OVERWRITE DIRECTORY '/user/atscale/filterd-ratlog'
SELECT * FROM table_log
WHERE dt >= '2016-05-01' AND dt <= '2016-05-31'
  AND service_type = 'xxxx_jp' AND event_type = 'vv';
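One thing worth noting (my assumption from Hive's documented defaults, not something stated in the question): INSERT OVERWRITE DIRECTORY without a ROW FORMAT or STORED AS clause writes plain '\u0001'-delimited text files. If the goal is to keep the data in ORC end to end, recent Hive versions (0.11+) let the export format be stated explicitly, which would look roughly like:

```sql
-- Hypothetical variant of step 2: write the filtered export as ORC
-- so that a downstream ORC table could read it directly.
INSERT OVERWRITE DIRECTORY '/user/atscale/filterd-ratlog'
STORED AS ORC
SELECT * FROM table_log
WHERE dt >= '2016-05-01' AND dt <= '2016-05-31'
  AND service_type = 'xxxx_jp' AND event_type = 'vv';
```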

3) Then I created an external table (table_log_filtered_ext) over the result above (in database B). Note that this table has no partitions.

DROP TABLE IF EXISTS table_log_filtered_ext;
CREATE EXTERNAL TABLE `table_log_filtered_ext`(
  `e_id` string, 
  `member_id` string, 
  .
  .
  dt string,
  service_type string,
  event_type string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\u0001' 
  COLLECTION ITEMS TERMINATED BY '\u0002' 
  MAP KEYS TERMINATED BY '\u0003'
LOCATION '/user/atscale/filterd-ratlog'
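When debugging a format mismatch like the one below, the table metadata is the quickest place to look. A sketch using the standard Hive command:

```sql
-- Shows the SerDe, InputFormat and OutputFormat actually recorded for the table.
-- With no STORED AS clause, an external table like this defaults to TEXTFILE
-- (LazySimpleSerDe), i.e. plain delimited text.
DESCRIBE FORMATTED table_log_filtered_ext;
```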

4) I created another new table (table_log_filtered) with a structure similar to "table_log" (3 partition columns):

CREATE TABLE `table_log_filtered` (
  `e_id` string, 
  `member_id` string, 
  .
  .
PARTITIONED BY ( 
  `dt` string, 
  `service_type` string, 
  `event_type` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\u0001' 
  COLLECTION ITEMS TERMINATED BY '\u0002' 
  MAP KEYS TERMINATED BY '\u0003' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

5) Now I want to populate "table_log_filtered" with the data from the external table "table_log_filtered_ext" (into the 3 partitions, as in "table_log"):

SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.execution.engine=tez; 

INSERT OVERWRITE TABLE table_log_filtered PARTITION(dt, service_type, event_type) 
SELECT * FROM table_log_filtered_ext;

But I get this java.lang.ClassCastException. Looking at the exception, it seems there is something about the OUTPUT format between the two tables. Any tips?

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":
.
.
.
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
      at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
      at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0
      at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
      at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
      ... 16 more
    Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
      at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:81)
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
      at org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
      at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
      at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
      ... 17 more

1 Answer:

Answer 0 (score: 0)

In case anyone else runs into this: the fix, as @Samson Scharfrichter pointed out, was to specify STORED AS ORC for table_log_filtered:

CREATE TABLE `table_log_filtered` (
  `e_id` string, 
  `member_id` string, 
  .
  .
PARTITIONED BY ( 
  `dt` string, 
  `service_type` string, 
  `event_type` string)
STORED AS ORC
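For context (my reading of the Hive DDL, not part of the original answer): `STORED AS ORC` is shorthand that also sets the matching SerDe. The original DDL combined `ROW FORMAT DELIMITED` (which implies LazySimpleSerDe, producing Text rows) with the ORC output format, which is exactly the `Text cannot be cast to ... OrcSerde$OrcSerdeRow` mismatch in the stack trace. The shorthand expands roughly to:

```sql
-- Equivalent explicit storage clause for STORED AS ORC:
-- the SerDe and the file formats must agree, which ROW FORMAT DELIMITED broke.
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
```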