Debugging why a Hadoop job fails for different inputs

Time: 2017-10-13 03:14:29

Tags: hadoop mapreduce

I am trying to run a Hadoop job. When I specify the input as 28 repetitions of my toy data, everything runs fine; however, when I bump it up to 29 repetitions, the whole thing crashes.

My thinking is that there is nothing wrong with the logic of the code itself, since it works for 28 repetitions but not for 29.

Here is what the input data looks like with 2 repetitions ("repetitions" is not to be confused with the number of input files; it refers to the number of times the long numeric string at the front appears):

0101
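For reference, a minimal sketch of how such an input line could be assembled. This is an assumption on my part: that each "repetition" is the 33-byte PUSH32 chunk visible in the example command further down, repeated N times and followed by the trailing 0101; the helper class and method names here are hypothetical.

// Hypothetical helper (not part of the job) that builds an input line with N "repetitions".
public class InputBuilder {
    public static String buildLine(int repetitions) {
        // PUSH32 0x...02, the chunk seen in the example evm command (assumption)
        String chunk = "7f0000000000000000000000000000000000000000000000000000000000000002";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repetitions; i++) {
            sb.append(chunk);
        }
        sb.append("0101"); // trailing opcodes, as in the example input
        return sb.toString();
    }
}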

The command that the job runs, together with its output, looks like this:

> evm --debug --code 7f00000000000000000000000000000000000000000000000000000000000000027f00000000000000000000000000000000000000000000000000000000000000027f00000000000000000000000000000000000000000000000000000000000000020101 run
opAdd      3903
opAdd      425
opStop       15
#### TRACE ####
PUSH32          pc=00000000 gas=9999999997 cost=3

PUSH32          pc=00000033 gas=9999999994 cost=3
Stack:
00000000  0000000000000000000000000000000000000000000000000000000000000002

PUSH32          pc=00000066 gas=9999999991 cost=3
Stack:
00000000  0000000000000000000000000000000000000000000000000000000000000002
00000001  0000000000000000000000000000000000000000000000000000000000000002

ADD             pc=00000099 gas=9999999988 cost=3
Stack:
00000000  0000000000000000000000000000000000000000000000000000000000000002
00000001  0000000000000000000000000000000000000000000000000000000000000002
00000002  0000000000000000000000000000000000000000000000000000000000000002

ADD             pc=00000100 gas=9999999985 cost=3
Stack:
00000000  0000000000000000000000000000000000000000000000000000000000000004
00000001  0000000000000000000000000000000000000000000000000000000000000002

STOP            pc=00000101 gas=9999999985 cost=0
Stack:
00000000  0000000000000000000000000000000000000000000000000000000000000006

#### LOGS ####
0x%  
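The lines the mapper actually cares about are the per-opcode timing lines at the top of this output (e.g. opAdd 3903); the regex in the job code further down extracts the numeric third group from them. A self-contained sketch of that parsing step, run against lines copied from the trace above:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceLineParser {
    public static void main(String[] args) {
        // Same pattern as in the mapper: an opcode name, whitespace, then a number.
        Pattern pattern = Pattern.compile("([A-Za-z]+)([ \t]+)(\\d+)");
        String[] sampleLines = {"opAdd      3903", "opAdd      425", "opStop       15"};
        for (String line : sampleLines) {
            Matcher matcher = pattern.matcher(line);
            while (matcher.find()) {
                // group(3) is the duration value that the mapper writes as its output value
                System.out.println(matcher.group(1) + " -> " + Integer.parseInt(matcher.group(3)));
            }
        }
    }
}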

The code for the job being run is the following:

import java.io.*;
import java.util.ArrayList;
import java.io.IOException;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.log4j.Logger;

public class ExecutionTimeTracker {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    private final Logger LOG = org.apache.log4j.Logger.getLogger(this.getClass());

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());

      FileSplit fileSplit = (FileSplit)context.getInputSplit();
      String filename = fileSplit.getPath().getName();

      //to write the file name as key
      Text text = new Text();
      text.set(filename);
      LOG.warn("fileName: " + filename);

      try {
          // command execution
          Runtime rt = Runtime.getRuntime();
          String evmDir = "/home/ubuntu/go/src/github.com/ethereum/go-ethereum/build/bin/evm";
          String command = evmDir + " --debug --code " + value.toString() + " run";
          Process proc = Runtime.getRuntime().exec(command);
          BufferedReader stdInput = new BufferedReader(new InputStreamReader(proc.getInputStream()));

          // output data struct
          ArrayList<String> consoleOutput = new ArrayList<String>();
          String s = null;
          while ((s = stdInput.readLine()) != null) {
              consoleOutput.add(s);
          }
          for (String p : consoleOutput) {
              Pattern pattern = Pattern.compile("([A-Za-z]+)([ \t]+)(\\d+)");
              Matcher matcher = pattern.matcher(p);
              while (matcher.find()) {
                  String groupThree = matcher.group(3);
                  IntWritable writeValue = new IntWritable(Integer.parseInt(groupThree));
                  context.write(text, writeValue);
              }
          }
          // close to prevent memory leak
          stdInput.close();
      } catch (IOException e) {
          LOG.warn("Exception Encountered!");
          LOG.warn(e);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "ExecutionTimeTracker");
    job.setJarByClass(ExecutionTimeTracker.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
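One detail I have not verified (so treat this as an assumption): the mapper above only reads the child process's stdout, and I do not know for certain whether evm --debug writes its trace to stdout or stderr. If the trace goes to stderr and grows large enough to fill the pipe buffer, the external process could block and the map task would appear to hang. Below is a hypothetical sketch of the same command-execution step using ProcessBuilder with the two streams merged; the class and method names are made up for illustration and this is not what the job currently does.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class MergedStreamExec {
    public static ArrayList<String> run(String evmDir, String code)
            throws IOException, InterruptedException {
        // Redirect stderr into stdout so neither pipe can fill up and block the child.
        ProcessBuilder pb = new ProcessBuilder(evmDir, "--debug", "--code", code, "run");
        pb.redirectErrorStream(true);
        Process proc = pb.start();

        ArrayList<String> consoleOutput = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(proc.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            consoleOutput.add(line);
        }
        reader.close();
        proc.waitFor(); // wait for the external command to finish before returning
        return consoleOutput;
    }
}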

The complete logs of the slave node executing this task can be found here.

The output of the successful job can be found below:

17/10/13 02:17:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/10/13 02:17:11 INFO client.RMProxy: Connecting to ResourceManager at master/172.31.46.70:8032
17/10/13 02:17:11 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/10/13 02:17:11 INFO input.FileInputFormat: Total input files to process : 1
17/10/13 02:17:12 INFO mapreduce.JobSubmitter: number of splits:1
17/10/13 02:17:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1507833515636_0006
17/10/13 02:17:12 INFO impl.YarnClientImpl: Submitted application application_1507833515636_0006
17/10/13 02:17:12 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1507833515636_0006/
17/10/13 02:17:12 INFO mapreduce.Job: Running job: job_1507833515636_0006
17/10/13 02:17:22 INFO mapreduce.Job: Job job_1507833515636_0006 running in uber mode : true
17/10/13 02:17:22 INFO mapreduce.Job:  map 100% reduce 0%
17/10/13 02:17:25 INFO mapreduce.Job:  map 100% reduce 100%
17/10/13 02:17:26 INFO mapreduce.Job: Job job_1507833515636_0006 completed successfully
17/10/13 02:17:26 INFO mapreduce.Job: Counters: 52
    File System Counters
        FILE: Number of bytes read=64
        FILE: Number of bytes written=112
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=4042
        HDFS: Number of bytes written=295682
        HDFS: Number of read operations=35
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=501
        Total time spent by all reduces in occupied slots (ms)=2415
        TOTAL_LAUNCHED_UBERTASKS=2
        NUM_UBER_SUBMAPS=1
        NUM_UBER_SUBREDUCES=1
        Total time spent by all map tasks (ms)=501
        Total time spent by all reduce tasks (ms)=2415
        Total vcore-milliseconds taken by all map tasks=501
        Total vcore-milliseconds taken by all reduce tasks=2415
        Total megabyte-milliseconds taken by all map tasks=513024
        Total megabyte-milliseconds taken by all reduce tasks=2472960
    Map-Reduce Framework
        Map input records=1
        Map output records=56
        Map output bytes=448
        Map output materialized bytes=16
        Input split bytes=96
        Combine input records=56
        Combine output records=1
        Reduce input groups=1
        Reduce shuffle bytes=16
        Reduce input records=1
        Reduce output records=1
        Spilled Records=2
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=42
        CPU time spent (ms)=2060
        Physical memory (bytes) snapshot=971251712
        Virtual memory (bytes) snapshot=5902385152
        Total committed heap usage (bytes)=745537536
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=1903
    File Output Format Counters 
        Bytes Written=10

The complete error log recorded by the slave node executing the task can be found here.

Since these jobs are running in uber mode, that should rule out many of the potential causes of this problem. So far, however, I have not been able to pin down the specific issue. Open to any suggestions and insights! :)

Perhaps it is related to the memory bounds of each container?
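If that turns out to be the case, these are the per-task memory properties I would expect to tune in mapred-site.xml (a sketch with placeholder values, not what my cluster currently uses):

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx3276m</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value>
</property>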

Below is the output of the unsuccessful job:

17/10/12 20:42:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/10/12 20:42:42 INFO client.RMProxy: Connecting to ResourceManager at master/xxx.xxx.xxx.xxx:8032
17/10/12 20:42:42 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/10/12 20:42:42 INFO input.FileInputFormat: Total input files to process : 1
17/10/12 20:42:43 INFO mapreduce.JobSubmitter: number of splits:1
17/10/12 20:42:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1507833515636_0005
17/10/12 20:42:44 INFO impl.YarnClientImpl: Submitted application application_1507833515636_0005
17/10/12 20:42:44 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1507833515636_0005/
17/10/12 20:42:44 INFO mapreduce.Job: Running job: job_1507833515636_0005
17/10/12 20:42:49 INFO mapreduce.Job: Job job_1507833515636_0005 running in uber mode : true
17/10/12 20:42:49 INFO mapreduce.Job:  map 0% reduce 0%
17/10/12 20:43:01 INFO mapreduce.Job:  map 67% reduce 0%
17/10/12 20:53:19 INFO mapreduce.Job:  map 100% reduce 100%
17/10/12 20:53:19 INFO mapreduce.Job: Job job_1507833515636_0005 failed with state FAILED due to: Task failed task_1507833515636_0005_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
17/10/12 20:53:19 INFO mapreduce.Job: Counters: 18
    Job Counters 
        Failed map tasks=1
        Killed reduce tasks=1
        Launched map tasks=1
        Launched reduce tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=629774
        Total time spent by all reduces in occupied slots (ms)=1
        TOTAL_LAUNCHED_UBERTASKS=1
        NUM_UBER_SUBMAPS=1
        Total time spent by all map tasks (ms)=629774
        Total time spent by all reduce tasks (ms)=1
        Total vcore-milliseconds taken by all map tasks=629774
        Total vcore-milliseconds taken by all reduce tasks=1
        Total megabyte-milliseconds taken by all map tasks=644888576
        Total megabyte-milliseconds taken by all reduce tasks=1024
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0

Below are my configuration files:

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>40960</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
        <description>Whether virtual memory limits will be enforced for containers</description>
    </property>
</configuration>

hdfs-site.xml

0 Answers

There are no answers yet.