Map/Reduce tasks failing extensively. Task Id: attempt_*_*_000001_0, Status: FAILED

Time: 2017-11-02 08:18:23

Tags: mapreduce yarn hadoop2

I am new to Hadoop. My laptop has 32 GB of RAM and a 4-core Core i5 processor. I have created a multi-node (3 datanodes) Apache Hadoop 2.7.4 cluster using virtual machines, allocating 8 GB of RAM and 2 CPU cores to each datanode VM as well as to the ResourceManager VM. When I run the MapReduce example job from the namenode, the job fails almost every time because of failed map or reduce tasks.

  

I don't see any specific error in the logs, but I noticed that all map and reduce task attempts try to get a container on the same datanode; only after it fails a few times does the ApplicationMaster pick another node with an available container.

Is there a way to distribute the containers across the datanodes in a round-robin fashion?

Any help would be appreciated.

Output -

hduser@NameNode:/opt/hadoop/etc/hadoop$ hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar pi 2 4
Number of Maps  = 2
Samples per Map = 4
Wrote input for Map #0
Wrote input for Map #1
Starting Job
.....
17/11/02 12:53:33 INFO mapreduce.Job: Running job: job_1509607315241_0001
17/11/02 12:53:40 INFO mapreduce.Job: Job job_1509607315241_0001 running in uber mode : false
17/11/02 12:53:40 INFO mapreduce.Job:  map 0% reduce 0%
17/11/02 12:53:55 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_m_000001_0, Status : FAILED
17/11/02 12:53:55 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_m_000000_0, Status : FAILED
17/11/02 12:54:01 INFO mapreduce.Job:  map 50% reduce 0%
17/11/02 12:54:09 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_m_000001_1, Status : FAILED
17/11/02 12:54:14 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_r_000000_0, Status : FAILED
17/11/02 12:54:24 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_m_000001_2, Status : FAILED
17/11/02 12:54:30 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_r_000000_1, Status : FAILED
17/11/02 12:54:40 INFO mapreduce.Job:  map 100% reduce 100%
17/11/02 12:54:44 INFO mapreduce.Job: Job job_1509607315241_0001 failed with state FAILED due to: Task failed task_1509607315241_0001_m_000001

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.10.109</value>
        <description> The hostname of the machine the resource manager runs on. </description>
  </property>
  <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>A list of auxiliary services run by the node manager. A service is implemented by the class defined by the property yarn.nodemanager.auxservices.servicename.class. By default, no auxiliary services are specified. </description>
  </property>
  <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        <description> </description>
  </property>
  <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>7096</value>
        <description>The amount of physical memory (in MB) that may be allocated to containers being run by the node manager         </description>
  </property>
  <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>6196</value>
        <description>The RM can only allocate memory to containers in increments of "yarn.scheduler.minimum-allocation-mb", not exceeding "yarn.scheduler.maximum-allocation-mb", and it should not be more than the total memory allocated to the node.</description>
  </property>
  <property>
        <name>yarn.nodemanager.delete.debug-delay-sec</name>
        <value>6000</value>
  </property>
  <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
        <description>The RM can only allocate memory to containers in increments of "yarn.scheduler.minimum-allocation-mb", not exceeding "yarn.scheduler.maximum-allocation-mb", and it should not be more than the total memory allocated to the node.</description>
  </property>

  <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>2048</value>
  </property>
  <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx2048m</value>
  </property>
  <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
  </property>
  <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
        <description>The number of CPU cores that may be allocated to containers being run by the node manager.</description>
  </property>
  <property>
        <name>yarn.resourcemanager.bind-host</name>
        <value>192.168.10.109</value>
        <description> The address the resource manager’s RPC and HTTP servers will bind to.</description>
  </property>
  <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.10.109:8032</value>
        <description>The hostname and port that the resource manager’s RPC server runs on. </description>
  </property>
  <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.10.109:8033</value>
        <description>The resource manager’s admin RPC server address and port. This is used by the admin client (invoked with yarn rmadmin, typically run outside the cluster) to communicate with the resource manager. </description>
  </property>
  <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.10.109:8030</value>
        <description>The resource manager scheduler’s RPC server address and port. This is used by (in-cluster) application masters to communicate with the resource manager.</description>
  </property>
  <property>
        <name>yarn.resourcemanager.resourcetracker.address</name>
        <value>192.168.10.109:8031</value>
        <description>The resource manager resource tracker’s RPC server address and port. This is used by (in-cluster) node managers to communicate with the resource manager. </description>
  </property>
  <property>
        <name>yarn.nodemanager.hostname</name>
        <value>0.0.0.0</value>
        <description>The hostname of the machine the node manager runs on. </description>
  </property>
  <property>
        <name>yarn.nodemanager.bind-host</name>
        <value>0.0.0.0</value>
        <description>The address the node manager’s RPC and HTTP servers will bind to. </description>
  </property>
  <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/opt/hadoop/hdfs/yarn</value>
        <description>A list of directories where nodemanagers allow containers to store intermediate data. The data is cleared out when the application ends.</description>
  </property>
  <property>
        <name>yarn.nodemanager.address</name>
        <value>0.0.0.0:8050</value>
        <description>The node manager’s RPC server address and port. This is used by (in-cluster) application masters to communicate with node managers.</description>
  </property>
  <property>
        <name>yarn.nodemanager.localizer.address</name>
        <value>0.0.0.0:8040</value>
        <description>The node manager localizer’s RPC server address and port. </description>
  </property>
  <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.10.109:8088</value>
        <description> The resource manager’s HTTP server address and port.</description>
  </property>
  <property>
        <name>yarn.nodemanager.webapp.address</name>
        <value>0.0.0.0:8042</value>
        <description>The node manager’s HTTP server address and port. </description>
  </property>
  <property>
        <name>yarn.web-proxy.address</name>
        <value>192.168.10.109:9046</value>
        <description>The web app proxy server’s HTTP server address and port. If not set (the default), the web app proxy server runs inside the resource manager process. MapReduce ApplicationMaster REST APIs are accessed through the Web Application Proxy server, an optional service in YARN. An administrator can configure the service to run on a particular host or on the ResourceManager itself (stand-alone mode). If the proxy server is not configured, it runs as part of the ResourceManager service and, by default, REST calls go through the ResourceManager web address on port 8088. </description>
  </property>

</configuration>
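
As a side note (my addition, not part of the original question), the standard YARN CLI can be used to confirm what memory and vcores each NodeManager actually registered with, which is useful for cross-checking the yarn-site.xml values above. The node ID below is an assumption based on a datanode named DN1 and the 8050 port configured in yarn.nodemanager.address:

yarn node -list -all          # list registered NodeManagers and their state
yarn node -status DN1:8050    # show the memory/vcores that one node reported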

mapred-site.xml

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
                <description>Default framework to run.</description>
        </property>
      <!--  <property>
                <name>mapreduce.jobtracker.address</name>
                <value>localhost:54311</value>
                <description>MapReduce job tracker runs at this host and port.</description>
        </property> -->
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>192.168.10.109:19888</value>
                <description>The MapReduce job history server’s address and port.</description>
        </property>
        <property>
                <name>mapreduce.shuffle.port</name>
                <value>13562</value>
                <description>The shuffle handler’s HTTP port number. This is used for serving map outputs, and is not a user-accessible web UI.</description>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>192.168.10.109:10020</value>
                <description>The job history server’s RPC server address and port. This is used by the client (typically outside the cluster) to query job history.</description>
        </property>
        <property>
                <name>mapreduce.jobhistory.bind-host</name>
                <value>192.168.10.109</value>
                <description>Setting all of these values to 0.0.0.0 as in the example above will cause the MapReduce daemons to listen on all addresses and interfaces of the hosts in the cluster.</description>
        </property>
        <property>
                <name>mapreduce.job.userhistorylocation</name>
                <value>/opt/hadoop/hdfs/mrjobhistory</value>
                <description>User can specify a location to store the history files of a particular job. If nothing is specified, the logs are stored in output directory. The files are stored in "_logs/history/" in the directory. User can stop logging by giving the value "none".</description>
        </property>
        <property>
                <name>mapreduce.jobhistory.intermediate-done-dir</name>
                <value>/opt/hadoop/hdfs/mrjobhistory/tmp</value>
                <description>Directory where history files are written by MapReduce jobs.</description>
        </property>
        <property>
                <name>mapreduce.jobhistory.done-dir</name>
                <value>/opt/hadoop/hdfs/mrjobhistory/done</value>
                <description>Directory where history files are managed by the MR JobHistory Server.</description>
        </property>

        <property>
                <name>mapreduce.map.memory.mb</name>
                <value>2048</value>
        </property>

        <property>
                <name>mapreduce.reduce.memory.mb</name>
                <value>3072</value>
        </property>

        <property>
                <name>mapreduce.map.cpu.vcores</name>
                <value>1</value>
                <description> The number of virtual cores to request from the scheduler for each map task.</description>
        </property>

        <property>
                <name>mapreduce.reduce.cpu.vcores</name>
                <value>1</value>
                <description> The number of virtual cores to request from the scheduler for each reduce task.</description>
        </property>

        <property>
                <name>mapreduce.task.timeout</name>
                <value>1800000</value>
        </property>

        <property>
                <name>mapreduce.map.java.opts</name>
                <value>-Xmx1555m</value>
        </property>

        <property>
                <name>mapreduce.reduce.java.opts</name>
                <value>-Xmx2048m</value>
        </property>

        <property>
                <name>mapreduce.job.running.map.limit</name>
                <value>2</value>
                <description> The maximum number of simultaneous map tasks per job. There is no limit if this value is 0 or negative.</description>
        </property>

        <property>
                <name>mapreduce.job.running.reduce.limit</name>
                <value>1</value>
                <description> The maximum number of simultaneous reduce tasks per job. There is no limit if this value is 0 or negative.</description>
        </property>

        <property>
                <name>mapreduce.reduce.shuffle.connect.timeout</name>
                <value>1800000</value>
                <description>Expert: The maximum amount of time (in milli seconds) reduce task spends in trying to connect to a tasktracker for getting map output.</description>
        </property>

        <property>
                <name>mapreduce.reduce.shuffle.read.timeout</name>
                <value>1800000</value>
                <description>Expert: The maximum amount of time (in milli seconds) reduce task waits for map output data to be available for reading after obtaining connection.</description>
        </property>
<!--
        <property>
                <name>mapreduce.job.reducer.preempt.delay.sec</name>
                <value>300</value>
                <description> The threshold (in seconds) after which an unsatisfied mapper request triggers reducer preemption when there is no anticipated headroom. If set to 0 or a negative value, the reducer is preempted as soon as lack of headroom is detected. Default is 0.</description>
        </property>

        <property>
                <name>mapreduce.job.reducer.unconditional-preempt.delay.sec</name>
                <value>400</value>
                <description> The threshold (in seconds) after which an unsatisfied mapper request triggers a forced reducer preemption irrespective of the anticipated headroom. By default, it is set to 5 mins. Setting it to 0 leads to immediate reducer preemption. Setting to -1 disables this preemption altogether.</description>
        </property>
-->
</configuration>
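
For context, a quick back-of-the-envelope check of the container sizes configured above (my own arithmetic, not something taken from the logs):

yarn.nodemanager.resource.memory-mb = 7096 MB   (per-node NodeManager budget)
yarn.app.mapreduce.am.resource.mb   = 2048 MB   (MR ApplicationMaster container)
mapreduce.map.memory.mb             = 2048 MB   (each map container)
mapreduce.reduce.memory.mb          = 3072 MB   (each reduce container)

AM + 1 map + 1 reduce = 2048 + 2048 + 3072 = 7168 MB, which is more than the 7096 MB a single NodeManager offers, so those three containers can never run on the same node at the same time.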

2 Answers:

Answer 0 (score: 0)

The problem was with the /etc/hosts file on the datanodes. We have to watch out for a hostname that points to its loopback address. I tracked this error down from the following line in the logs -

INFO [main] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated MRClientService at DN1/127.0.1.1:54483

Before the fix:

127.0.1.1 DN1
192.168.10.104 dn1

After the fix:

# 127.0.1.1 DN1
192.168.10.104 DN1 
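
A quick way to double-check the resolution on each datanode (my addition, assuming a standard Linux environment):

hostname            # should print DN1
getent hosts DN1    # should return 192.168.10.104, not 127.0.1.1
hostname -i         # should also resolve to the 192.168.10.x address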

Answer 1 (score: 0)

I would suggest adding the following properties to mapred-site.xml:

<property>
    <name>mapreduce.map.maxattempts</name>
    <value>20</value>
</property>

<property>
    <name>mapreduce.reduce.maxattempts</name>
    <value>20</value>
</property>
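
If you would rather test this without editing mapred-site.xml on every node, the pi example runs through ToolRunner, so the same settings should also be accepted as per-job -D overrides (a sketch, not taken from the original answer):

hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar pi \
    -D mapreduce.map.maxattempts=20 \
    -D mapreduce.reduce.maxattempts=20 \
    2 4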