Running multiple map tasks on single-node Hadoop MapReduce

Time: 2015-04-18 17:10:57

Tags: java python hadoop mapreduce

I am currently running Hadoop 2.6.0 on a single node with Ubuntu Server. I am trying to run a MapReduce program written in Python through the streaming option, using "hadoop-streaming-2.6.0.jar".

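I have four input files to map and reduce, but no matter how many settings I change, only one map process is ever started, and it uses 100% of a single core.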

Here is what I have tried so far.

My "mapred-site.xml" is set up as follows:

    <property>
            <name>mapred.job.tracker</name>
            <value>localhost:54311</value>
            <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
            </description>
    </property>
    <property>
            <name>mapreduce.job.maps</name>
            <value>8</value>
    </property>
    <property>
            <name>mapred.jobtracker.maxtasks.per.job</name>
            <value>200000</value>
    </property>
    <property>
            <name>mapred.tasktracker.map.tasks.maximum</name>
            <value>8</value>
    </property>

    <property>
            <name>mapred.tasktracker.reduce.tasks.maximum</name>
            <value>8</value>
    </property>
    <property>
            <name>mapred.running.map.limit</name>
            <value>-1</value>
    </property>
    <property>
            <name>mapreduce.job.reduces</name>
            <value>8</value>
    </property>
    </configuration>
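
Several of the property names above use the older `mapred.` prefix, and the job output further down shows Hadoop mapping deprecated names such as `mapred.map.tasks` onto `mapreduce.job.maps`. As a rough aid, the following Python sketch lists the properties in a config file and marks the ones with the old-style prefix. The `list_properties` helper and the inline sample are illustrative assumptions, and a `mapred.` prefix is only a hint that a name may be deprecated or ignored, not proof.

```python
import xml.etree.ElementTree as ET


def list_properties(xml_text):
    """Return (name, value, has_old_prefix) for each <property> element.

    Hypothetical helper: the 'mapred.' prefix test is only a heuristic.
    """
    root = ET.fromstring(xml_text)
    result = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        result.append((name, value, name.startswith("mapred.")))
    return result


# Shortened sample built from two of the properties in the question.
sample = """<configuration>
  <property><name>mapred.job.tracker</name><value>localhost:54311</value></property>
  <property><name>mapreduce.job.maps</name><value>8</value></property>
</configuration>"""

for name, value, old_style in list_properties(sample):
    flag = "old-style prefix" if old_style else "current-style prefix"
    print(f"{name} = {value} ({flag})")
```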

I also tried changing the arguments passed to the streaming jar, specifically: -D mapred.map.tasks=8 -D mapred.reduce.tasks=8. That had no effect either. Any insight would be very helpful.
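
For reference, here is a minimal Python sketch of how such a streaming invocation might be assembled. The jar location, the output path, and the `build_streaming_command` helper are assumptions made for illustration, not the actual command that was run; the script paths and the input directory name are taken from the job output. One detail worth checking is ordering: Hadoop's generic options such as `-D` and `-files` have to come before the streaming-specific options such as `-input` and `-mapper`.

```python
import os
import shlex


def build_streaming_command(jar, inputs, output, mapper, reducer, props):
    """Assemble a hadoop-streaming command line as a list of arguments.

    Hypothetical helper for illustration only.
    """
    cmd = ["hadoop", "jar", jar]
    # Generic options (-D, -files) go first, before any streaming options.
    for key, value in sorted(props.items()):
        cmd += ["-D", f"{key}={value}"]
    cmd += ["-files", f"{mapper},{reducer}"]
    for path in inputs:
        cmd += ["-input", path]
    cmd += ["-output", output]
    # Files shipped with -files are referenced by base name inside the task.
    cmd += ["-mapper", os.path.basename(mapper)]
    cmd += ["-reducer", os.path.basename(reducer)]
    return cmd


if __name__ == "__main__":
    command = build_streaming_command(
        # Assumed jar location; adjust to the local installation.
        jar="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar",
        inputs=["data1GB"],
        output="data1GB-out",  # assumed output directory name
        mapper="/home/map.py",
        reducer="/home/reduce.py",
        props={"mapreduce.job.maps": 8, "mapreduce.job.reduces": 8},
    )
    print(" ".join(shlex.quote(arg) for arg in command))
```

The sketch only prints the command rather than running it, so the ordering can be inspected before anything is submitted.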

Thanks!

Hadoop output

15/04/18 10:09:12 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/home/map.py, /home/reduce.py] [] /tmp/streamjob6267815363585502110.jar tmpDir=null
15/04/18 10:09:13 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/04/18 10:09:13 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/04/18 10:09:13 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/04/18 10:09:14 INFO mapred.FileInputFormat: Total input paths to process : 4
15/04/18 10:09:14 INFO mapreduce.JobSubmitter: number of splits:60
15/04/18 10:09:14 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
15/04/18 10:09:14 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/04/18 10:09:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1597598798_0001
15/04/18 10:09:14 INFO mapred.LocalDistributedCacheManager: Localized file:/home/map.py as file:/app/hadoop/tmp/mapred/local/1429384154725/map.py
15/04/18 10:09:14 INFO mapred.LocalDistributedCacheManager: Localized file:/home/reduce.py as file:/app/hadoop/tmp/mapred/local/1429384154726/reduce.py
15/04/18 10:09:14 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/04/18 10:09:14 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/04/18 10:09:14 INFO mapreduce.Job: Running job: job_local1597598798_0001
15/04/18 10:09:14 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
15/04/18 10:09:15 INFO mapred.LocalJobRunner: Waiting for map tasks
15/04/18 10:09:15 INFO mapred.LocalJobRunner: Starting task: attempt_local1597598798_0001_m_000000_0
15/04/18 10:09:15 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/04/18 10:09:15 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/data1GB/data-0.txt:0+16777216
15/04/18 10:09:15 INFO mapred.MapTask: numReduceTasks: 8
15/04/18 10:09:15 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/04/18 10:09:15 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/04/18 10:09:15 INFO mapred.MapTask: soft limit at 83886080
15/04/18 10:09:15 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/04/18 10:09:15 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/04/18 10:09:15 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/04/18 10:09:15 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/local/hadoop/bin/./map.py]
15/04/18 10:09:15 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/04/18 10:09:15 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
15/04/18 10:09:15 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
15/04/18 10:09:15 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/04/18 10:09:15 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/04/18 10:09:15 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
15/04/18 10:09:15 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/04/18 10:09:15 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
15/04/18 10:09:15 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
15/04/18 10:09:15 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
15/04/18 10:09:15 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
15/04/18 10:09:15 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/04/18 10:09:15 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
15/04/18 10:09:15 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
15/04/18 10:09:15 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
15/04/18 10:09:15 INFO streaming.PipeMapRed: Records R/W=266/1
15/04/18 10:09:15 INFO mapreduce.Job: Job job_local1597598798_0001 running in uber mode : false
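
One detail in the output above may be relevant: the job ID starts with `job_local` and the tasks are started by `mapred.LocalJobRunner`, which is what Hadoop prints when a job runs under its local runner instead of on a cluster. The log also reports 60 splits for the 4 input paths. The Python sketch below scans log text for those markers; the `summarize_job_log` helper and its return format are made up for illustration. If the local runner is indeed in use, properties that might be worth checking include `mapreduce.framework.name` (which selects `local` or `yarn` in Hadoop 2.x) and `mapreduce.local.map.tasks.maximum` (the local runner's limit on concurrent map tasks). Whether either resolves this particular setup is untested.

```python
import re


def summarize_job_log(text):
    """Pull a few facts out of Hadoop client log text.

    Hypothetical helper: it relies only on message formats seen in the log above.
    """
    job_ids = re.findall(r"\bjob_[A-Za-z0-9]+_\d+\b", text)
    splits = re.search(r"number of splits:(\d+)", text)
    paths = re.search(r"Total input paths to process : (\d+)", text)
    uses_local_runner = (
        any(job_id.startswith("job_local") for job_id in job_ids)
        or "LocalJobRunner" in text
    )
    return {
        "job_id": job_ids[0] if job_ids else None,
        "local_runner": uses_local_runner,
        "splits": int(splits.group(1)) if splits else None,
        "input_paths": int(paths.group(1)) if paths else None,
    }


# A few lines copied from the output above.
sample = """\
15/04/18 10:09:14 INFO mapred.FileInputFormat: Total input paths to process : 4
15/04/18 10:09:14 INFO mapreduce.JobSubmitter: number of splits:60
15/04/18 10:09:14 INFO mapreduce.Job: Running job: job_local1597598798_0001
15/04/18 10:09:15 INFO mapred.LocalJobRunner: Waiting for map tasks
"""

print(summarize_job_log(sample))
```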

0 answers:

No answers