Hive query fails with a heap space error

Time: 2015-03-17 16:16:31

Tags: java cassandra hive

Below is the Hive query that is being run:

INSERT INTO TABLE temp.table_output
SELECT /*+ STREAMTABLE(tableB) */ c.column1 as client, a.column2 as testData, 
    CASE WHEN ca.updated_date IS NULL OR ca.updated_date = 'null' THEN null ELSE CONCAT(ca.updated_date, '+0000') END as update
    FROM temp.tableA as a 
    INNER JOIN default.tableB as ca ON a.column5=ca.column2
    INNER JOIN default.tableC as c ON ca.column3=c.column1 WHERE a.name='test';

TableB has 2.4 billion rows (140 GB); TableA and TableC have 200 million records.

The cluster consists of 3 Cassandra data nodes and 3 Analytics nodes (Hive on Cassandra), each with 130 GB of memory.

TableA, TableB, and TableC are Hive internal (managed) tables.

The Hive cluster heap size is 12 GB.

Can anyone tell me why I am running into heap problems when I run this Hive query, and why the job cannot complete? It is the only job running on the Hive server.

The task fails with the following error:

Caused by: java.io.IOException: Read failed from file: cfs://172.31.x.x/tmp/hive-root/hive_2015-03-17_00-27-25_132_17376615815827139-1/-mr-10002/000049_0
    at com.datastax.bdp.hadoop.cfs.CassandraInputStream.read(CassandraInputStream.java:178)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
    ... 16 more

Caused by: java.io.IOException: org.apache.thrift.TApplicationException: Internal error processing get_remote_cfs_sblock
    at com.datastax.bdp.hadoop.cfs.CassandraFileSystemThriftStore.retrieveSubBlock(CassandraFileSystemThriftStore.java:537)
    at com.datastax.bdp.hadoop.cfs.CassandraSubBlockInputStream.subBlockSeekTo(CassandraSubBlockInputStream.java:145)
    at com.datastax.bdp.hadoop.cfs.CassandraSubBlockInputStream.read(CassandraSubBlockInputStream.java:95)
    at com.datastax.bdp.hadoop.cfs.CassandraInputStream.read(CassandraInputStream.java:159)
    ... 25 more

Caused by: org.apache.thrift.TApplicationException: Internal error processing get_remote_cfs_sblock
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
    at org.apache.cassandra.thrift.Dse$Client.recv_get_remote_cfs_sblock(Dse.java:271)
    at org.apache.cassandra.thrift.Dse$Client.get_remote_cfs_sblock(Dse.java:254)
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.datastax.bdp.util.CassandraProxyClient.invokeDseClient(CassandraProxyClient.java:655)
    at com.datastax.bdp.util.CassandraProxyClient.invoke(CassandraProxyClient.java:631)
    at com.sun.proxy.$Proxy5.get_remote_cfs_sblock(Unknown Source)
    at com.datastax.bdp.hadoop.cfs.CassandraFileSystemThriftStore.retrieveSubBlock(CassandraFileSystemThriftStore.java:515)
    ... 28 more

Hive.log

2015-03-17 23:10:39,576 ERROR exec.Task (SessionState.java:printError(419)) - Examining task ID: task_201503171816_0036_r_000023 (and more) from job job_201503171816_0036
2015-03-17 23:10:39,579 ERROR exec.Task (SessionState.java:printError(419)) - Examining task ID: task_201503171816_0036_r_000052 (and more) from job job_201503171816_0036
2015-03-17 23:10:39,582 ERROR exec.Task (SessionState.java:printError(419)) - Examining task ID: task_201503171816_0036_m_000207 (and more) from job job_201503171816_0036
2015-03-17 23:10:39,585 ERROR exec.Task (SessionState.java:printError(419)) - Examining task ID: task_201503171816_0036_r_000087 (and more) from job job_201503171816_0036
2015-03-17 23:10:39,588 ERROR exec.Task (SessionState.java:printError(419)) - Examining task ID: task_201503171816_0036_m_000223 (and more) from job job_201503171816_0036
2015-03-17 23:10:39,591 ERROR exec.Task (SessionState.java:printError(419)) - Examining task ID: task_201503171816_0036_m_000045 (and more) from job job_201503171816_0036
2015-03-17 23:10:39,594 ERROR exec.Task (SessionState.java:printError(419)) - Examining task ID: task_201503171816_0036_m_000235 (and more) from job job_201503171816_0036
2015-03-17 23:10:39,597 ERROR exec.Task (SessionState.java:printError(419)) - Examining task ID: task_201503171816_0036_m_002140 (and more) from job job_201503171816_0036
2015-03-17 23:10:39,761 ERROR exec.Task (SessionState.java:printError(419)) - 
Task with the most failures(4): 
-----
Task ID:
  task_201503171816_0036_m_000036

URL:
  http://sjvtncasl064.mcafee.int:50030/taskdetails.jsp?jobid=job_201503171816_0036&tipid=task_201503171816_0036_m_000036
-----
Diagnostic Messages for this Task:
Error: Java heap space

2015-03-17 23:10:39,777 ERROR ql.Driver (SessionState.java:printError(419)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
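
Note that the "Error: Java heap space" above comes from an individual map task JVM, whose heap is configured separately from the 12 GB server heap. For reference only (a sketch assuming a Hadoop 1.x / DSE MRv1 setup; the -Xmx value is an illustrative placeholder, not a tuned recommendation), the per-task heap can be overridden from a Hive session:

    -- Heap for MRv1 child task JVMs (maps and reduces), set per job.
    -- 2048m is a placeholder; size it against node RAM and slot count.
    SET mapred.child.java.opts=-Xmx2048m;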

1 Answer:

Answer 0 (score: 0)

Most of the time, Hadoop errors on the tracker side are not very descriptive, e.g. "there was a problem retrieving data from one of the nodes". To find out what is actually going on, you need to collect system.log plus the Hive and Hadoop task logs from every node, especially from the ones that did not return data in time, and look at what errored around the time of the failure. In Ops Center you can actually click on the Hive job while it is in progress, watch what is happening on each node, and find the error that broke the job.

Here are some links that I have found very helpful. Some of them are for older versions of DSE, but they still provide a good starting point for tuning Hadoop operations and memory management; a condensed sketch of the kind of settings they cover follows the links.

http://www.datastax.com/dev/blog/tuning-dse-hadoop-map-reduce

http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/ana/anaHivTune.html

https://support.datastax.com/entries/23459322-Tuning-memory-for-Hadoop-tasks

https://support.datastax.com/entries/23472546-Specifying-the-number-of-concurrent-tasks-per-node
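
In the spirit of those links, a condensed, hedged sketch (the property names are standard Hadoop 1.x / Hive ones, but every value below is an illustrative placeholder, not a recommendation for this cluster):

    -- Job-level knobs that can be set per Hive session:
    SET mapred.child.java.opts=-Xmx2048m;                -- heap per map/reduce task JVM
    SET io.sort.mb=512;                                  -- map-side sort buffer; must fit inside the task heap
    SET hive.exec.reducers.bytes.per.reducer=256000000;  -- lower value => more, smaller reducers
    -- The number of concurrent task slots per node
    -- (mapred.tasktracker.map.tasks.maximum / mapred.tasktracker.reduce.tasks.maximum)
    -- is a cluster-side mapred-site.xml setting and cannot be changed from a session.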

You may also want to read this article. Sometimes a timeout can be caused by a major garbage collection.
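
If major GC is the suspect, one way to confirm it (again a sketch, assuming MRv1 child tasks; the flags are standard JVM options) is to enable verbose GC logging in the task JVMs and look for long pauses around the timeout:

    -- Append GC logging flags to the child JVM options so collections
    -- show up in the task logs alongside the failure timestamps.
    SET mapred.child.java.opts=-Xmx2048m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps;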

HTH