Hadoop 2.6.4 MR job freezes immediately

Time: 2016-03-21 10:50:41

Tags: hadoop amazon-ec2 mapreduce hdfs yarn

Hadoop 2.6.4: 1 master + 2 slaves on AWS EC2

master: namenode, secondary namenode, resource manager

slave: datanode, node manager

When I run a test MR job (wordcount), it freezes immediately:

hduser@ip-172-31-4-108:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /data/shakespeare /data/out1
16/03/21 10:45:19 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-4-108/172.31.4.108:8032
16/03/21 10:45:21 INFO input.FileInputFormat: Total input paths to process : 5
16/03/21 10:45:21 INFO mapreduce.JobSubmitter: number of splits:5
16/03/21 10:45:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1458556970596_0001
16/03/21 10:45:22 INFO impl.YarnClientImpl: Submitted application application_1458556970596_0001
16/03/21 10:45:22 INFO mapreduce.Job: The url to track the job: http://ip-172-31-4-108:8088/proxy/application_1458556970596_0001/
16/03/21 10:45:22 INFO mapreduce.Job: Running job: job_1458556970596_0001

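After running start-dfs.sh and start-yarn.sh on the master, all daemons are running on their respective EC2 instances (checked with the jps command).

ResourceManager log when the MR job is started: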

2016-03-21 10:45:20,152 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 1
2016-03-21 10:45:22,784 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 1 submitted by user hduser
2016-03-21 10:45:22,785 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1458556970596_0001
2016-03-21 10:45:22,787 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser   IP=172.31.4.108 OPERATION=Submit Application Request    TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1458556970596_0001
2016-03-21 10:45:22,788 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1458556970596_0001 State change from NEW to NEW_SAVING
2016-03-21 10:45:22,805 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1458556970596_0001
2016-03-21 10:45:22,807 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1458556970596_0001 State change from NEW_SAVING to SUBMITTED
2016-03-21 10:45:22,809 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application added - appId: application_1458556970596_0001 user: hduser leaf-queue of parent: root #applications: 1
2016-03-21 10:45:22,810 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Accepted application application_1458556970596_0001 from user: hduser, in queue: default
2016-03-21 10:45:22,825 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1458556970596_0001 State change from SUBMITTED to ACCEPTED
2016-03-21 10:45:22,866 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1458556970596_0001_000001
2016-03-21 10:45:22,867 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1458556970596_0001_000001 State change from NEW to SUBMITTED
2016-03-21 10:45:22,896 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: maximum-am-resource-percent is insufficient to start a single application in queue, it is likely set too low. skipping enforcement to allow at least one application to start
2016-03-21 10:45:22,896 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: maximum-am-resource-percent is insufficient to start a single application in queue for user, it is likely set too low. skipping enforcement to allow at least one application to start
2016-03-21 10:45:22,897 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application application_1458556970596_0001 from user: hduser activated in queue: default
2016-03-21 10:45:22,898 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application added - appId: application_1458556970596_0001 user: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@1d51055, leaf-queue: default #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1
2016-03-21 10:45:22,898 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Added Application Attempt appattempt_1458556970596_0001_000001 to scheduler from user hduser in queue default
2016-03-21 10:45:22,900 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1458556970596_0001_000001 State change from SUBMITTED to SCHEDULED
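
The ResourceManager log above stops with the application attempt in the SCHEDULED state, which in YARN means the request for the ApplicationMaster container has been handed to the scheduler and no container has been allocated yet. A minimal sketch for pulling the latest state of each application and attempt out of such a log follows; the function name `last_states` and the regular expression are illustrative assumptions, not part of Hadoop, and the sample lines are copied from the log above.

```python
import re

# Matches the "<id> State change from <OLD> to <NEW>" fragment that
# RMAppImpl and RMAppAttemptImpl write to the ResourceManager log.
STATE_CHANGE = re.compile(r"(\S+) State change from (\w+) to (\w+)")

def last_states(log_lines):
    """Return {entity_id: latest_state} for every app / attempt in the log."""
    latest = {}
    for line in log_lines:
        match = STATE_CHANGE.search(line)
        if match:
            entity, _old_state, new_state = match.groups()
            latest[entity] = new_state  # later lines overwrite earlier ones
    return latest

# Three lines taken verbatim from the ResourceManager log in this post.
SAMPLE = [
    "2016-03-21 10:45:22,825 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1458556970596_0001 State change from SUBMITTED to ACCEPTED",
    "2016-03-21 10:45:22,867 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1458556970596_0001_000001 State change from NEW to SUBMITTED",
    "2016-03-21 10:45:22,900 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1458556970596_0001_000001 State change from SUBMITTED to SCHEDULED",
]

for entity, state in last_states(SAMPLE).items():
    print(entity, state)
```

On a live cluster, `yarn node -list` and `yarn application -status application_1458556970596_0001` are the usual commands for checking whether the NodeManagers have registered and where the application is stuck.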

NameNode log when the MR job is started:

2016-03-21 10:45:03,746 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2016-03-21 10:45:03,746 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2016-03-21 10:45:20,613 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 3 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 7 
2016-03-21 10:45:20,760 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.jar. BP-1804768821-172.31.4.108-1458553823105 blk_1073741834_1010{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]}
2016-03-21 10:45:21,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* checkFileProgress: blk_1073741834_1010{blockUCState=COMMITTED, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} has not reached minimal replication 1
2016-03-21 10:45:21,292 INFO org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream: Nothing to flush
2016-03-21 10:45:21,297 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.13.117:50010 is added to blk_1073741834_1010{blockUCState=COMMITTED, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} size 270356
2016-03-21 10:45:21,297 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.14.198:50010 is added to blk_1073741834_1010 size 270356
2016-03-21 10:45:21,706 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.jar is closed by DFSClient_NONMAPREDUCE_-18612056_1
2016-03-21 10:45:21,714 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.jar
2016-03-21 10:45:21,812 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.split
2016-03-21 10:45:21,823 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.split. BP-1804768821-172.31.4.108-1458553823105 blk_1073741835_1011{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW], ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW]]}
2016-03-21 10:45:21,849 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.13.117:50010 is added to blk_1073741835_1011{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW], ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW]]} size 0
2016-03-21 10:45:21,853 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.14.198:50010 is added to blk_1073741835_1011{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW], ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW]]} size 0
2016-03-21 10:45:21,855 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.split is closed by DFSClient_NONMAPREDUCE_-18612056_1
2016-03-21 10:45:21,865 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.splitmetainfo. BP-1804768821-172.31.4.108-1458553823105 blk_1073741836_1012{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]}
2016-03-21 10:45:21,876 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.14.198:50010 is added to blk_1073741836_1012{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} size 0
2016-03-21 10:45:21,877 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.13.117:50010 is added to blk_1073741836_1012{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} size 0
2016-03-21 10:45:21,880 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.splitmetainfo is closed by DFSClient_NONMAPREDUCE_-18612056_1
2016-03-21 10:45:22,277 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.xml. BP-1804768821-172.31.4.108-1458553823105 blk_1073741837_1013{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]}
2016-03-21 10:45:22,327 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.14.198:50010 is added to blk_1073741837_1013{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} size 0
2016-03-21 10:45:22,328 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.13.117:50010 is added to blk_1073741837_1013{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5c350bcc-f752-43cd-80c1-80f68e2db73e:NORMAL:172.31.13.117:50010|RBW], ReplicaUnderConstruction[[DISK]DS-a1e2988f-2ef7-4005-8129-0ca18c95b2cb:NORMAL:172.31.14.198:50010|RBW]]} size 0
2016-03-21 10:45:22,332 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/hduser/.staging/job_1458556970596_0001/job.xml is closed by DFSClient_NONMAPREDUCE_-18612056_1
2016-03-21 10:45:33,746 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2016-03-21 10:45:33,747 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2016-03-21 10:46:03,748 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2016-03-21 10:46:03,748 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2016-03-21 10:46:33,748 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2016-03-21 10:46:33,749 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2016-03-21 10:47:03,749 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2016-03-21 10:47:03,750 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).

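Any ideas? Thanks in advance for your help!

The contents of the *-site.xml files are below. Note: I have applied some sizing values to the properties, but I hit the EXACT SAME problem with a minimal configuration (mandatory properties only).

core-site.xml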

<configuration>
    <property><name>fs.defaultFS</name><value>hdfs://ip-172-31-4-108:8020</value></property>
</configuration>

hdfs-site.xml

<configuration>
    <property><name>dfs.replication</name><value>2</value></property>
    <property><name>dfs.namenode.name.dir</name><value>file:///xvda1/dfs/nn</value></property>
    <property><name>dfs.datanode.data.dir</name><value>file:///xvda1/dfs/dn</value></property>
</configuration>

mapred-site.xml

<configuration>
    <property><name>mapreduce.jobhistory.address</name><value>ip-172-31-4-108:10020</value></property>
    <property><name>mapreduce.jobhistory.webapp.address</name><value>ip-172-31-4-108:19888</value></property>
    <property><name>mapreduce.framework.name</name><value>yarn</value></property>
    <property><name>mapreduce.map.memory.mb</name><value>512</value></property>
    <property><name>mapreduce.reduce.memory.mb</name><value>1024</value></property>
    <property><name>mapreduce.map.java.opts</name><value>410</value></property>
    <property><name>mapreduce.reduce.java.opts</name><value>820</value></property>
</configuration>
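
One thing worth checking in the file above: `mapreduce.map.java.opts` and `mapreduce.reduce.java.opts` are passed to the task JVM as command-line options, so they normally look like `-Xmx410m` rather than a bare number such as `410`. The sketch below flags any `*opts` property whose value does not start with `-`. The helper names `read_properties` and `non_flag_opts` are hypothetical, and whether this is related to the freeze is not established by the post.

```python
import xml.etree.ElementTree as ET

# Relevant properties copied from the mapred-site.xml in this post.
MAPRED_SITE = """<configuration>
    <property><name>mapreduce.map.memory.mb</name><value>512</value></property>
    <property><name>mapreduce.reduce.memory.mb</name><value>1024</value></property>
    <property><name>mapreduce.map.java.opts</name><value>410</value></property>
    <property><name>mapreduce.reduce.java.opts</name><value>820</value></property>
</configuration>"""

def read_properties(xml_text):
    """Parse a Hadoop *-site.xml string into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }

def non_flag_opts(props):
    """Return the names of *opts properties whose value is not a JVM flag."""
    return sorted(
        name for name, value in props.items()
        if name.endswith("opts") and not value.startswith("-")
    )

print(non_flag_opts(read_properties(MAPRED_SITE)))
```

The same pattern applies to `yarn.app.mapreduce.am.command-opts`, which is set to `820` further down.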

yarn-site.xml

<configuration>
    <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
    <property><name>yarn.resourcemanager.hostname</name><value>ip-172-31-4-108</value></property>
    <property><name>yarn.nodemanager.local-dirs</name><value>file:///xvda1/nodemgr/local</value></property>
    <property><name>yarn.nodemanager.log-dirs</name><value>/var/log/hadoop-yarn/containers</value></property>
    <property><name>yarn.nodemanager.remote-app-log-dir</name><value>/var/log/hadoop-yarn/apps</value></property>
    <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
    <property><name>yarn.app.mapreduce.am.resource.mb</name><value>1024</value></property>
    <property><name>yarn.app.mapreduce.am.command-opts</name><value>820</value></property>
    <property><name>yarn.nodemanager.resource.memory-mb</name><value>6291456</value></property>
    <property><name>yarn.scheduler.minimum_allocation-mb</name><value>524288</value></property>
    <property><name>yarn.scheduler.maximum_allocation-mb</name><value>6291456</value></property>
</configuration>
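
Two details in the file above may deserve a second look. The names documented by YARN are `yarn.scheduler.minimum-allocation-mb` and `yarn.scheduler.maximum-allocation-mb` (hyphens throughout), and the `-mb` suffix means the values are read as megabytes. The sketch below reports names that differ from the documented ones only by an underscore and converts every `-mb` value to GiB so its magnitude is easy to judge. The function `review` is a hypothetical helper, and the guess that the numbers were meant as kilobytes is an assumption the post does not confirm.

```python
import xml.etree.ElementTree as ET

# Memory-related properties copied from the yarn-site.xml in this post.
YARN_SITE = """<configuration>
    <property><name>yarn.nodemanager.resource.memory-mb</name><value>6291456</value></property>
    <property><name>yarn.scheduler.minimum_allocation-mb</name><value>524288</value></property>
    <property><name>yarn.scheduler.maximum_allocation-mb</name><value>6291456</value></property>
</configuration>"""

# Property names as documented in yarn-default.xml.
DOCUMENTED_NAMES = {
    "yarn.scheduler.minimum-allocation-mb",
    "yarn.scheduler.maximum-allocation-mb",
}

def review(xml_text):
    """Return (misspelled_names, {name: size_in_GiB}) for a yarn-site.xml string."""
    misspelled, sizes_gib = [], {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        # A name that only matches a documented one after swapping "_" for "-".
        if name not in DOCUMENTED_NAMES and name.replace("_", "-") in DOCUMENTED_NAMES:
            misspelled.append(name)
        # Values of "-mb" properties are megabytes; show them in GiB.
        if name.endswith("-mb") and value.isdigit():
            sizes_gib[name] = int(value) / 1024
    return misspelled, sizes_gib

misspelled, sizes_gib = review(YARN_SITE)
print("names to double-check:", misspelled)
for name, gib in sizes_gib.items():
    print(f"{name}: {gib:.0f} GiB")
```

Read as megabytes, 6291456 is about 6 TiB per NodeManager and 524288 is 512 GiB per container, which is far more than an EC2 instance is likely to have.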

0 answers:

No answers yet