Map 100% reduce 100%, but the job failed

Date: 2017-04-30 04:31:16

Tags: python hadoop hadoop2 hadoop-streaming

I'm new to Hadoop, and I'm running a MapReduce job to compute the revenue of different stores. The mapper and reducer programs run perfectly on their own, and I have double-checked the input files and the target paths.

When I run the MapReduce command:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce1/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar \
  -mapper mapper.py \
  -reducer reducer.py \
  -input /home/anwarvic \
  -output /joboutput

it produces the following output:

17/04/30 05:48:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/30 05:48:14 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob7598928362555913238.jar tmpDir=null
17/04/30 05:48:15 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
17/04/30 05:48:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/30 05:48:21 INFO mapred.FileInputFormat: Total input paths to process : 5
17/04/30 05:48:21 INFO net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010
17/04/30 05:48:24 INFO mapreduce.JobSubmitter: number of splits:6
17/04/30 05:48:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1493523215757_0002
17/04/30 05:48:27 INFO impl.YarnClientImpl: Submitted application application_1493523215757_0002
17/04/30 05:48:28 INFO mapreduce.Job: The url to track the job: http://anwar-computer:8088/proxy/application_1493523215757_0002/
17/04/30 05:48:28 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
17/04/30 05:48:28 INFO streaming.StreamJob: Running job: job_1493523215757_0002
17/04/30 05:48:28 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/30 05:48:29 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/30 05:49:08 INFO streaming.StreamJob:  map 17%  reduce 0%
17/04/30 05:49:10 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/30 05:49:41 INFO streaming.StreamJob:  map 17%  reduce 0%
17/04/30 05:49:42 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/30 05:49:43 INFO streaming.StreamJob:  map 17%  reduce 0%
17/04/30 05:49:45 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/30 05:50:07 INFO streaming.StreamJob:  map 17%  reduce 0%
17/04/30 05:50:08 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/30 05:50:37 INFO streaming.StreamJob:  map 100%  reduce 100%
17/04/30 05:50:41 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/30 05:50:41 ERROR streaming.StreamJob: Job not successful. Error: Task failed task_1493523215757_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

17/04/30 05:50:41 INFO streaming.StreamJob: killJob...
17/04/30 05:50:41 INFO impl.YarnClientImpl: Killed application application_1493523215757_0002
Streaming Command Failed!

The output basically says the job was not successful, even though both the map and reduce phases show 100% complete.

Following the advice in this answer, I added a shebang line to the top of both the mapper.py and reducer.py files:

#!/usr/bin/env python
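(For reference, a minimal streaming mapper/reducer pair for this kind of per-store revenue count might look like the sketch below. The real mapper.py and reducer.py are not shown in the question, so the tab-separated record layout and field positions here are assumptions.)

```python
#!/usr/bin/env python
# Hypothetical sketch -- assumes tab-separated records with the store name
# in the third field and the cost in the fifth.
import sys

def map_record(line):
    """Emit 'store<TAB>cost' for a well-formed record, None otherwise."""
    parts = line.strip().split('\t')
    if len(parts) == 6:
        return '%s\t%s' % (parts[2], parts[4])
    return None  # skip malformed rows instead of crashing the whole task

def reduce_lines(lines):
    """Sum cost per store; assumes input sorted by key, as Hadoop guarantees."""
    totals = {}
    for line in lines:
        store, cost = line.strip().split('\t')
        totals[store] = totals.get(store, 0.0) + float(cost)
    return totals

if __name__ == '__main__':
    for line in sys.stdin:
        out = map_record(line)
        if out is not None:
            print(out)
```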

By the way, this answer didn't work for me either!

I've been stuck on this problem for about 20 hours now, so any help would be greatly appreciated.

1 Answer:

Answer 0 (score: 0)

I suggest the following steps:

  1. Visit your job tracking URL: http://anwar-computer:8088/proxy/application_1493523215757_0002/
  2. Go to the failed mapper (your log shows that a mapper failed: Job failed as tasks failed. failedMaps:1 failedReduces:0). There you can see the exception trace.
  3. For more detailed logs, follow the logs link on the failed mapper's page.
  4. Analyze the logs; you will most likely find the root cause there.

    Possible root causes:

    1. Your data may be malformed or not in the format the mapper expects.
    2. Another cause could be the data size versus the memory available on the node. The data may be compressed, and when it is decompressed during the map step it may exceed the available memory.
    3. I suspect point 1, because the framework tried to run the mapper several times and it failed on each attempt:

      17/04/30 05:48:29 INFO streaming.StreamJob:  map 0%  reduce 0%
      17/04/30 05:49:08 INFO streaming.StreamJob:  map 17%  reduce 0%
      17/04/30 05:49:10 INFO streaming.StreamJob:  map 0%  reduce 0%
      17/04/30 05:49:41 INFO streaming.StreamJob:  map 17%  reduce 0%
      17/04/30 05:49:42 INFO streaming.StreamJob:  map 0%  reduce 0%
      17/04/30 05:49:43 INFO streaming.StreamJob:  map 17%  reduce 0%
      17/04/30 05:49:45 INFO streaming.StreamJob:  map 0%  reduce 0%
      17/04/30 05:50:07 INFO streaming.StreamJob:  map 17%  reduce 0%
      17/04/30 05:50:08 INFO streaming.StreamJob:  map 0%  reduce 0%
      

      Also, you can add more logging inside the mapper to get more detail.
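      The kind of mapper-side logging meant here can be sketched as follows (the counter group and field layout are made-up examples; anything written to stderr ends up in the failed attempt's task logs, and Hadoop Streaming interprets the reporter:counter stderr pattern as a counter update):

```python
# Sketch of defensive logging inside a streaming mapper (hypothetical names;
# the record layout is assumed to be tab-separated with six fields).
import sys

def safe_map(line):
    parts = line.strip().split('\t')
    if len(parts) != 6:
        # stderr output is captured in the task attempt's logs on the tracker UI
        sys.stderr.write('skipping malformed record: %r\n' % line[:80])
        # Hadoop Streaming treats this stderr pattern as a counter increment
        sys.stderr.write('reporter:counter:RevenueJob,MalformedRecords,1\n')
        return None
    return '%s\t%s' % (parts[2], parts[4])

if __name__ == '__main__':
    for raw in sys.stdin:
        mapped = safe_map(raw)
        if mapped is not None:
            print(mapped)
```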

      Alternatively, you can enable more verbose logging (add the --loglevel DEBUG argument to the hadoop command), e.g.:

      hadoop \
        --loglevel DEBUG \
        jar /usr/local/hadoop/share/hadoop/mapreduce1/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar \
        -mapper mapper.py \
        -reducer reducer.py \
        -input /home/anwarvic \
        -output /joboutput
      

      Reference: https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/CommandsManual.html
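      One more thing worth ruling out, since the exception trace is not shown: with Hadoop Streaming, mapper and reducer scripts are usually not present on the worker nodes unless they are shipped with the job, and a missing or non-executable script makes every map attempt fail immediately. A sketch of the adjusted command, assuming mapper.py and reducer.py sit in the current directory:

```shell
# -file ships each script into the job's working directory on every node;
# the scripts must also be executable for the shebang line to take effect.
chmod +x mapper.py reducer.py
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce1/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar \
  -file mapper.py -mapper mapper.py \
  -file reducer.py -reducer reducer.py \
  -input /home/anwarvic \
  -output /joboutput
```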
