Hadoop wordcount example in R

Time: 2017-06-04 15:43:25

Tags: r hadoop hadoop-streaming

I installed hadoop-3.0.0-alpha2 and I am trying to run the MapReduce wordcount example. I created the mapper.R and reducer.R scripts, but when I try to run the job with

hadoop jar /home/rania/Downloads/hadoop-streaming-0.20.204.0.jar \
-file  /home/rania/Downloads/mapper.R  -mapper /home/rania/Downloads/mapper.R \
-file /home/rania/Downloads/reducer.R  -reducer /home/rania/Downloads/reducer.R \
-input /readme -output /RCount

I get the following:

2017-06-04 08:12:42,252 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-06-04 08:12:43,119 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
packageJobJar: [/home/rania/Downloads/mapper.R, /home/rania/Downloads/reducer.R] [] /tmp/streamjob5589642909909116910.jar tmpDir=null
2017-06-04 08:12:43,303 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2017-06-04 08:12:43,603 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2017-06-04 08:12:43,734 ERROR streaming.StreamJob: Error launching job , Output path already exists : Output directory hdfs://localhost:9000/RCount already exists
Streaming Job Failed!

What could be wrong? Thanks!

1 answer:

Answer 0 (score: 0)

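The ERROR line in your log gives the cause: the output directory hdfs://localhost:9000/RCount already exists, and the job will not write into an existing output directory. Try running the job with an output directory that does not yet exist on HDFS; a new directory will be created with whatever name you choose. If you want to reuse the same output directory name, /RCount, you must first delete that directory, along with the files in it, and then run the job again.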
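For the second option, the cleanup could look roughly like the sketch below. It assumes the `hdfs` command is on your PATH and that /RCount is the directory you want to reuse; the function name `clean_output` is made up for illustration and is not part of Hadoop.

```shell
#!/bin/sh
# clean_output is a hypothetical helper name, not a Hadoop command.
# It removes an HDFS output directory only if that directory exists.
clean_output() {
    out_dir=$1
    # `hdfs dfs -test -d PATH` exits with 0 when PATH is an existing directory.
    if hdfs dfs -test -d "$out_dir"; then
        # -r removes the directory recursively, including the old result files.
        hdfs dfs -rm -r "$out_dir"
    fi
}
```

Call `clean_output /RCount` and then submit the same `hadoop jar ...` command again. Note that this permanently deletes the previous results, so copy them elsewhere first if you still need them.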