Hadoop YARN performance tuning

Time: 2014-03-26 11:48:41

Tags: hadoop yarn

My MapReduce jobs run very slowly on Hadoop. Some of them hang in the reduce task, and increasing the task completion timeout does not help.

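The input data is small; the number of splits is 2.
Memory is available ('free -m' shows less than 4 GB in use on each node); each node has 32 GB.
YarnChild is consuming 100% of a CPU (checked with "top").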

Here are the node characteristics:

lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 8
Thread(s) per core: 1
Core(s) per socket: 4
CPU socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Stepping: 5
CPU MHz: 2260.925
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K

Below is a snapshot of a slave node while the job is running:

free -m

total used free shared buffers cached
Mem: 32235 1646 30588 0 19 721
-/+ buffers/cache: 905 31329
Swap: 3813 0 3813

top

top - 12:23:57 up 1:25, 1 user, load average: 0.15, 0.11, 0.03
Tasks: 151 total, 1 running, 150 sleeping, 0 stopped, 0 zombie
Cpu(s): 13.3%us, 0.1%sy, 0.0%ni, 86.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 33008756k total, 1686948k used, 31321808k free, 20020k buffers
Swap: 3905528k total, 0k used, 3905528k free, 739040k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2032 root 20 0 521m 220m 14m S 107 0.7 54:55.85 java
1564 root 20 0 1457m 136m 14m S 1 0.4 0:43.53 java
1 root 20 0 8356 816 684 S 0 0.0 0:02.63 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0
4 root 20 0 0 0 0 S 0 0.0 0:00.02 ksoftirqd/0
5 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1
7 root 20 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1
8 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1
9 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/2
10 root 20 0 0 0 0 S 0 0.0 0:00.01 ksoftirqd/2
11 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/2
12 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/3
13 root 20 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3
14 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/3
15 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/4
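
As a quick consistency check between the lscpu and top output, here is a minimal sketch of the arithmetic. All numbers are copied from the output above; the variable names are only illustrative. top reports per-process %CPU relative to one core, so a single process at about 107% on an 8-CPU machine should appear as roughly one eighth of the machine in the summary line.

```python
# Values copied from the lscpu and top output above.
sockets = 2
cores_per_socket = 4
threads_per_core = 1

# Logical CPUs; should agree with the "CPU(s): 8" line from lscpu.
logical_cpus = sockets * cores_per_socket * threads_per_core

# %CPU of PID 2032 in top, where 100 means one fully used core.
java_cpu_percent = 107.0

# Share of the whole machine used by that one process.
machine_share = java_cpu_percent / logical_cpus

print(f"logical CPUs: {logical_cpus}")
print(f"machine-wide share of PID 2032: {machine_share:.1f}%")
```

The result is close to the 13.3%us that top shows in its summary line, which suggests that one java process keeps about one core busy while the other seven stay idle.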

jps

2032 YarnChild
1807 MRAppMaster
1564 NodeManager
2312 Jps
1469 DataNode
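
To tie the busy PID from top to a Hadoop process, the jps output can be matched by PID. The following is only a sketch: `parse_jps` is a hypothetical helper written for this post, and the sample text is the jps output pasted above.

```python
def parse_jps(text):
    """Map PID -> main class name from jps lines such as '2032 YarnChild'."""
    processes = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Keep only lines that look like "<pid> <name>".
        if len(parts) == 2 and parts[0].isdigit():
            processes[int(parts[0])] = parts[1].strip()
    return processes


# The jps output from the slave node, copied verbatim.
JPS_OUTPUT = """\
2032 YarnChild
1807 MRAppMaster
1564 NodeManager
2312 Jps
1469 DataNode
"""

processes = parse_jps(JPS_OUTPUT)
busy_pid = 2032  # the java process at 107 %CPU in the top output
print(busy_pid, processes[busy_pid])
```

PID 2032 maps to YarnChild, the JVM that runs the task itself. A thread dump of that PID (for example with the JDK's jstack tool) would likely show where the CPU time is going.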

Job status

Job: job_1395829029033_0003
Job File: master:9000/tmp/hadoop-yarn/staging/root/.staging/job_1395829029033_0003/job.xml
Job Tracking URL : /suno-40.sophia.grid5000.fr:80...29029033_0003/
Uber job : false
Number of maps: 2
Number of reduces: 1
map() completion: 1.0
reduce() completion: 0.6666668
Job state: RUNNING
retired: false
reason for failure:
Counters: 42
    File System Counters
        FILE: Number of bytes read=249437801
        FILE: Number of bytes written=742321915
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=7591656
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=1
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=640596
    Map-Reduce Framework
        Map input records=87199
        Map output records=6291068
        Map output bytes=234057781
        Map output materialized bytes=246718039
        Input split bytes=216
        Combine input records=0
        Combine output records=0
        Reduce input groups=1
        Reduce shuffle bytes=246718039
        Reduce input records=54768
        Reduce output records=0
        Spilled Records=12582136
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=71456
        CPU time spent (ms)=904320
        Physical memory (bytes) snapshot=719138816
        Virtual memory (bytes) snapshot=1675440128
        Total committed heap usage (bytes)=628359168
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=7591440
    File Output Format Counters
        Bytes Written=0

Job status some time later:

Job: job_1395829029033_0003
Job File: master:9000/tmp/hadoop-yarn/staging/root/.staging/job_1395829029033_0003/job.xml
Job Tracking URL : ...8088/proxy/application_1395829029033_0003/
Uber job : false
Number of maps: 2
Number of reduces: 1
map() completion: 1.0
reduce() completion: 0.6666668
Job state: RUNNING
retired: false
reason for failure:  
Counters: 42
    File System Counters
        FILE: Number of bytes read=249437801
        FILE: Number of bytes written=742321915
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=7591656
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=1
    Job Counters 
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=640596
    Map-Reduce Framework
        Map input records=87199
        Map output records=6291068
        Map output bytes=234057781
        Map output materialized bytes=246718039
        Input split bytes=216
        Combine input records=0
        Combine output records=0
        Reduce input groups=1
        Reduce shuffle bytes=246718039
        Reduce input records=54768
        Reduce output records=0
        Spilled Records=12582136
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=71456
        CPU time spent (ms)=904320
        Physical memory (bytes) snapshot=719138816
        Virtual memory (bytes) snapshot=1675440128
        Total committed heap usage (bytes)=628359168
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=7591440
    File Output Format Counters
        Bytes Written=0
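
A few ratios derived from these counters may help narrow things down. This is a rough sketch: `parse_counters` is a hypothetical helper, and the sample text is a subset of the counter lines above. It only does arithmetic on the reported numbers and makes no claim about the cause.

```python
def parse_counters(text):
    """Parse 'name=value' counter lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition("=")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


# Subset of the Map-Reduce Framework counters, copied from the job status.
COUNTERS = """\
Map input records=87199
Map output records=6291068
Reduce input groups=1
Reduce input records=54768
Spilled Records=12582136
GC time elapsed (ms)=71456
CPU time spent (ms)=904320
"""

c = parse_counters(COUNTERS)

# Output records emitted per input record by the mappers.
expansion = c["Map output records"] / c["Map input records"]
# Spilled records relative to map output records.
spill_ratio = c["Spilled Records"] / c["Map output records"]
# Fraction of the map output the reducer has consumed so far.
reduce_consumed = c["Reduce input records"] / c["Map output records"]
# Share of the reported CPU time spent in garbage collection.
gc_share = c["GC time elapsed (ms)"] / c["CPU time spent (ms)"]

print(f"map output per input record: {expansion:.1f}")
print(f"spilled / map output records: {spill_ratio:.1f}")
print(f"reduce input consumed: {reduce_consumed:.2%}")
print(f"GC share of CPU time: {gc_share:.1%}")
```

On these numbers, each input record expands to about 72 output records, and the reducer has read under 1% of the map output. Reduce input groups=1 means every record read so far belongs to a single key group.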

The job makes no further progress once reduce reaches 67%. Any help is appreciated; I cannot find a way to improve the performance.
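
One way to read the 67% figure, as a sketch under an assumption that the post does not state: reduce task progress is commonly reported as three equally weighted phases (copy, sort, reduce). The `locate_phase` function and the `PHASES` tuple below are illustrative names, not Hadoop API.

```python
# Assumption: three reduce-side phases, each worth one third of the progress.
PHASES = ("copy", "sort", "reduce")


def locate_phase(progress):
    """Return (phase name, fraction done within that phase) for progress in [0, 1]."""
    scaled = progress * len(PHASES)
    # Clamp so that progress == 1.0 still maps to the last phase.
    index = min(int(scaled), len(PHASES) - 1)
    return PHASES[index], scaled - index


# The value reported by "reduce() completion" in both job status dumps.
phase, within = locate_phase(0.6666668)
print(phase, within)
```

Under that assumption, 0.6666668 means the copy and sort phases are complete and the job is stuck at the very beginning of the reduce phase, that is, inside the user's reduce() code rather than in the shuffle.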

0 answers:

No answers