TaskTrackers silently disappearing from the JobTracker pool

Time: 2013-03-06 04:12:52

Tags: hadoop mapreduce

We're having a problem with our Hadoop cluster. The JobTracker has silently dropped 10 of our roughly 70 nodes from its UI.

Although these nodes used to run jobs, the JobTracker no longer lists them at all: they don't appear under Nodes, Blacklisted Nodes, Graylisted Nodes or Excluded Nodes.

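The TaskTracker process is still running on those hosts. I've verified that there is network connectivity between the JobTracker and the missing nodes, and that I can ssh between them.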
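A working ssh session only shows that port 22 is reachable. In MRv1 the heartbeats go from each TaskTracker to the JobTracker's RPC port, so a port-level check run from one of the missing nodes is a closer test of the path that matters. A minimal sketch, assuming the host and port are taken from your mapred.job.tracker setting; the defaults in main (localhost, 8021) are placeholders only:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port can be opened within timeoutMs.
    // Refused connections, timeouts and unresolvable hosts all surface as IOException.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults (assumption): pass the JobTracker host and the
        // port from mapred.job.tracker as arguments instead.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8021;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

Usage: `java PortCheck jobtracker-host 8021`, run on one of the nodes that disappeared.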

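In the logs I see that we've recently started getting a lot of failures such as: https://issues.apache.org/jira/browse/MAPREDUCE-5

There is a correlation between the Jetty /mapOutput errors and the TaskTrackers stopping.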

Does anyone know what could cause a TaskTracker to fail silently instead of being put on the blacklisted nodes list?

I dumped the TaskTracker's threads with jstack.
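If attaching jstack is awkward, roughly the same information (thread names, states, frames) can be collected from inside a JVM with Thread.getAllStackTraces(). This is a minimal sketch; it only sees the JVM it runs in, so it would have to be hooked into the TaskTracker somehow to help here. Two differences from the dump below: the numbers printed are Java thread ids rather than the native ids jstack -F appears to print, and getState() reports sleeping or parked threads as TIMED_WAITING or WAITING, whereas the dump below labels all of them BLOCKED.

```java
import java.util.Map;

public class ThreadDump {
    // Builds a jstack-like text dump of every live thread in the current JVM.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Java thread id, name and java.lang.Thread.State (not the native id).
            sb.append("Thread ").append(t.getId())
              .append(" \"").append(t.getName()).append("\"")
              .append(" (state = ").append(t.getState()).append(")\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append(" - ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```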

It looks like the TaskTracker is trying to shut down but is waiting on something.

Deadlock Detection:

No deadlocks found.
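The same check can be run in-process with ThreadMXBean.findDeadlockedThreads(). It only reports cycles of threads waiting to acquire monitors or java.util.concurrent locks held by each other, so "No deadlocks found" does not rule out a hang: a thread sleep-polling for a condition (like Thread 14005, sitting in Thread.sleep) or a thread waiting in Thread.join (like Thread 18644) is not part of such a cycle. A minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    // Returns the ids of threads deadlocked on monitors or ownable synchronizers,
    // or an empty array when there are none (the MXBean itself returns null then).
    static long[] deadlockedThreadIds() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        long[] ids = deadlockedThreadIds();
        if (ids.length == 0) {
            System.out.println("No deadlocks found.");
            return;
        }
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread exited after detection
            }
            System.out.println(info.getThreadName() + " waiting on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```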

Thread 14005: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.ipc.Client.stop() @bci=105, line=973 (Compiled frame)
 - org.apache.hadoop.ipc.RPC$ClientCache.stopClient(org.apache.hadoop.ipc.Client) @bci=47, line=191 (Interpreted frame)
 - org.apache.hadoop.ipc.RPC$ClientCache.access$500(org.apache.hadoop.ipc.RPC$ClientCache, org.apache.hadoop.ipc.Client) @bci=2, line=140 (Interpreted frame)
 - org.apache.hadoop.ipc.RPC$Invoker.close() @bci=19, line=238 (Interpreted frame)
 - org.apache.hadoop.ipc.RPC$Invoker.access$600(org.apache.hadoop.ipc.RPC$Invoker) @bci=1, line=203 (Interpreted frame)
 - org.apache.hadoop.ipc.RPC.stopProxy(org.apache.hadoop.ipc.VersionedProtocol) @bci=11, line=439 (Interpreted frame)
 - org.apache.hadoop.hdfs.DFSClient.close() @bci=34, line=283 (Interpreted frame)
 - org.apache.hadoop.hdfs.DistributedFileSystem.close() @bci=8, line=328 (Interpreted frame)
 - org.apache.hadoop.fs.FileSystem$Cache.closeAll() @bci=78, line=1446 (Interpreted frame)
 - org.apache.hadoop.fs.FileSystem.closeAll() @bci=40, line=277 (Interpreted frame)
 - org.apache.hadoop.fs.FileSystem$ClientFinalizer.run() @bci=0, line=260 (Interpreted frame)

Thread 18731: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=156 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=1987 (Compiled frame)
 - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=399 (Compiled frame)
 - org.apache.hadoop.mapred.TaskTracker$1.run() @bci=7, line=434 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

Thread 18730: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=156 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=1987 (Compiled frame)
 - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=399 (Compiled frame)
 - org.apache.hadoop.mapreduce.server.tasktracker.userlogs.UserLogManager.monitor() @bci=4, line=131 (Interpreted frame)
 - org.apache.hadoop.mapreduce.server.tasktracker.userlogs.UserLogManager$1.run() @bci=4, line=66 (Compiled frame)

Thread 18729: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.mapred.UserLogCleaner.run() @bci=4, line=93 (Interpreted frame)

Thread 18728: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.TimerThread.mainLoop() @bci=201, line=509 (Compiled frame)
 - java.util.TimerThread.run() @bci=1, line=462 (Interpreted frame)

Thread 18724: (state = IN_NATIVE)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=210 (Compiled frame)
 - sun.nio.ch.EPollSelectorImpl.doSelect(long) @bci=28, line=65 (Compiled frame)
 - sun.nio.ch.SelectorImpl.lockAndDoSelect(long) @bci=37, line=69 (Compiled frame)
 - sun.nio.ch.SelectorImpl.select(long) @bci=30, line=80 (Compiled frame)
 - org.mortbay.io.nio.SelectorManager$SelectSet.doSelect() @bci=615, line=457 (Compiled frame)
 - org.mortbay.io.nio.SelectorManager.doSelect(int) @bci=24, line=190 (Compiled frame)
 - org.mortbay.jetty.nio.SelectChannelConnector.accept(int) @bci=5, line=124 (Compiled frame)
 - org.mortbay.jetty.AbstractConnector$Acceptor.run() @bci=151, line=706 (Compiled frame)
 - org.mortbay.thread.QueuedThreadPool$PoolThread.run() @bci=25, line=520 (Interpreted frame)

Thread 18671: (state = IN_NATIVE)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=210 (Compiled frame)
 - sun.nio.ch.EPollSelectorImpl.doSelect(long) @bci=28, line=65 (Compiled frame)
 - sun.nio.ch.SelectorImpl.lockAndDoSelect(long) @bci=37, line=69 (Compiled frame)
 - sun.nio.ch.SelectorImpl.select(long) @bci=30, line=80 (Compiled frame)
 - sun.nio.ch.SelectorImpl.select() @bci=2, line=84 (Compiled frame)
 - org.apache.hadoop.ipc.Server$Listener$Reader.run() @bci=33, line=333 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

Thread 18667: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=156 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=1987 (Compiled frame)
 - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=399 (Compiled frame)
 - org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run() @bci=47, line=130 (Compiled frame)

Thread 18661: (state = BLOCKED)

Thread 18660: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=118 (Compiled frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=134 (Compiled frame)
 - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159 (Compiled frame)

Thread 18659: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.Object.wait() @bci=2, line=485 (Compiled frame)
 - java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=116 (Compiled frame)

Thread 18644: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.Thread.join(long) @bci=38, line=1186 (Compiled frame)
 - java.lang.Thread.join() @bci=2, line=1239 (Interpreted frame)
 - java.lang.ApplicationShutdownHooks.runHooks() @bci=87, line=79 (Interpreted frame)
 - java.lang.ApplicationShutdownHooks$1.run() @bci=0, line=24 (Interpreted frame)
 - java.lang.Shutdown.runHooks() @bci=23, line=79 (Interpreted frame)
 - java.lang.Shutdown.sequence() @bci=26, line=123 (Interpreted frame)
 - java.lang.Shutdown.exit(int) @bci=96, line=168 (Interpreted frame)
 - java.lang.Runtime.exit(int) @bci=14, line=90 (Interpreted frame)
 - java.lang.System.exit(int) @bci=4, line=904 (Interpreted frame)
 - org.apache.hadoop.mapred.TaskTracker.main(java.lang.String[]) @bci=114, line=3722 (Interpreted frame)
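One plausible reading of "trying to shut down but waiting on something": Thread 18644 is TaskTracker.main calling System.exit, which runs the JVM shutdown hooks and then waits in Thread.join for each of them to finish. Thread 14005 is one of those hooks (FileSystem$ClientFinalizer closing the cached DFS client), sleeping inside ipc.Client.stop(). If Client.stop() sleep-polls until all of its connections are gone, as the Thread.sleep frame suggests, and one connection never closes, System.exit never returns and the process stays alive after the TaskTracker has stopped working, which would be consistent with what we see. The sketch below models only that pattern; the connection counter and the hook body are hypothetical stand-ins, not Hadoop code:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownHangSketch {
    // Hypothetical stand-in for ipc.Client's table of open connections.
    static final AtomicInteger openConnections = new AtomicInteger();

    // Hypothetical stand-in for the FileSystem$ClientFinalizer hook: it cannot
    // finish until every connection has gone away (assumed Client.stop() behaviour).
    static Thread newHook() {
        Thread hook = new Thread(() -> {
            while (openConnections.get() > 0) {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    // Keep waiting even if interrupted, so only the counter ends the loop.
                }
            }
        }, "client-finalizer-sketch");
        hook.setDaemon(true); // only so this demo's JVM can still exit
        return hook;
    }

    // Mimics ApplicationShutdownHooks.runHooks(): start the hook, then join it.
    // System.exit joins without a timeout; a timeout is used here so the demo returns.
    static boolean hookFinishesWithin(int connections, long timeoutMs) throws InterruptedException {
        openConnections.set(connections);
        Thread hook = newHook();
        hook.start();
        hook.join(timeoutMs);
        boolean finished = !hook.isAlive();
        openConnections.set(0); // release any hook thread that is still looping
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("no open connections, hook finished: " + hookFinishesWithin(0, 1000));
        System.out.println("one stuck connection, hook finished: " + hookFinishesWithin(1, 500));
    }
}
```

With an unbounded join, as in the real shutdown sequence, the second case would simply never return.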
