Running the sort example on a Hadoop single-node cluster

Time: 2011-04-05 15:12:02

Tags: sorting ubuntu random hadoop

I am trying to run the sort example on a Hadoop single-node cluster. First, I start the daemons:

hadoop@ubuntu:/home/user/hadoop$ bin/start-all.sh

Then I run the randomwriter example to generate sequence files to use as input.

hadoop@ubuntu:/home/user/hadoop$ bin/hadoop jar hadoop-*-examples.jar randomwriter rand

Running 0 maps.

Job started: Thu Mar 31 18:21:51 EEST 2011 
11/03/31 18:21:52 INFO mapred.JobClient: Running job: job_201103311816_0001 
11/03/31 18:21:53 INFO mapred.JobClient:  map 0% reduce 0% 
11/03/31 18:22:01 INFO mapred.JobClient: Job complete: job_201103311816_0001 
11/03/31 18:22:01 INFO mapred.JobClient: Counters: 0 
Job ended: Thu Mar 31 18:22:01 EEST 2011 

The job took 9 seconds.

hadoop@ubuntu:/home/user/hadoop$ bin/hadoop jar hadoop-*-examples.jar sort rand rand-sort

Running on 1 nodes to sort from hdfs://localhost:54310/user/hadoop/rand into hdfs://localhost:54310/user/hadoop/rand-sort with 1 reduces.

Job started: Thu Mar 31 18:25:19 EEST 2011 
11/03/31 18:25:20 INFO mapred.FileInputFormat: Total input paths to process : 0 
11/03/31 18:25:20 INFO mapred.JobClient: Running job: job_201103311816_0002 
11/03/31 18:25:21 INFO mapred.JobClient:  map 0% reduce 0% 
11/03/31 18:25:32 INFO mapred.JobClient:  map 0% reduce 100% 
11/03/31 18:25:34 INFO mapred.JobClient: Job complete: job_201103311816_0002 
11/03/31 18:25:34 INFO mapred.JobClient: Counters: 9 
11/03/31 18:25:34 INFO mapred.JobClient:   Job Counters 
11/03/31 18:25:34 INFO mapred.JobClient:     Launched reduce tasks=1 
11/03/31 18:25:34 INFO mapred.JobClient:   FileSystemCounters 
11/03/31 18:25:34 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=96 
11/03/31 18:25:34 INFO mapred.JobClient:   Map-Reduce Framework 
11/03/31 18:25:34 INFO mapred.JobClient:     Reduce input groups=0 
11/03/31 18:25:34 INFO mapred.JobClient:     Combine output records=0 
11/03/31 18:25:34 INFO mapred.JobClient:     Reduce shuffle bytes=0 
11/03/31 18:25:34 INFO mapred.JobClient:     Reduce output records=0 
11/03/31 18:25:34 INFO mapred.JobClient:     Spilled Records=0 
11/03/31 18:25:34 INFO mapred.JobClient:     Combine input records=0 
11/03/31 18:25:34 INFO mapred.JobClient:     Reduce input records=0 
Job ended: Thu Mar 31 18:25:34 EEST 2011 

The job took 14 seconds.

hadoop@ubuntu:/home/user/hadoop$ bin/hadoop dfs -cat rand-sort/part-00000

SEQ# “org.apache.hadoop.io.BytesWritable” org.apache.hadoop.io.BytesWritablej“我和9#

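I am new to Hadoop. Is everything I did correct, or am I doing something wrong? My question is: how can I check whether the data generated by randomwriter and the output of the sort example are correct? Where can I look at them?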

2 answers:

Answer 0 (score: 1)

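The problem is that your tasktracker had not started yet when you tried to run the job; it does not come up immediately after start-all.sh. You can run bin/hadoop job -list-active-trackers to see whether the tasktracker is up; it may take a little while. No tasktracker = no node to run the writer maps on, which is why randomwriter reported "Running 0 maps."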
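The waiting step described above could be scripted roughly as follows. This is only a sketch: the function name wait_for_tasktracker, the HADOOP_CMD variable, and the retry defaults are made up for illustration, and it assumes that any non-empty line printed by -list-active-trackers means a tracker is active.

```shell
# Path to the hadoop launcher, as used in the question; override if yours differs.
HADOOP_CMD="${HADOOP_CMD:-bin/hadoop}"

# Hypothetical helper: poll until at least one tasktracker is reported.
# $1 = maximum number of polls (default 30), $2 = seconds between polls (default 2).
wait_for_tasktracker() {
  max_tries="${1:-30}"
  delay="${2:-2}"
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    # Assumption: a non-empty stdout line means an active tracker was listed.
    if "$HADOOP_CMD" job -list-active-trackers 2>/dev/null | grep -q .; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1   # gave up: no tracker appeared in time
}
```

It might then be used as: wait_for_tasktracker && bin/hadoop jar hadoop-*-examples.jar randomwriter rand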

Answer 1 (score: 0)

11/03/31 18:25:20 INFO mapred.FileInputFormat: Total input paths to process : 0 

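There is no input. You must give the job the path where it can find its input files. It seems RandomWriter had no input either; you have to provide input for every job, otherwise nothing will start.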

RandomWriter @ Hadoop Wiki
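One way to confirm whether the sort job has anything to read is to list the input directory before submitting it. The sketch below is an illustration only: has_part_files and HADOOP_CMD are hypothetical names, and it assumes randomwriter names its output files part-*, as the sort output in the question does.

```shell
# Path to the hadoop launcher, as used in the question; override if yours differs.
HADOOP_CMD="${HADOOP_CMD:-bin/hadoop}"

# Hypothetical helper: succeed if the given HDFS directory lists at least one part-* file.
# $1 = HDFS directory, e.g. rand
has_part_files() {
  # Assumption: job output files are named part-NNNNN.
  "$HADOOP_CMD" dfs -ls "$1" 2>/dev/null | grep -q 'part-'
}
```

For example: has_part_files rand || echo "rand is empty, rerun randomwriter". To look at the records themselves, bin/hadoop fs -text rand-sort/part-00000 should decode a sequence file into readable text, whereas dfs -cat prints the raw binary seen in the question.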