Hive cross join stalls in reduce

Asked: 2017-01-11 00:37:18

Tags: hadoop mapreduce hive cross-join

I am running a cross join on 2 tables (sizes: table_a ~100k rows, table_b ~200 million rows) with the following Hive query:

select a.id, a.first_name, a.last_name, a.street
     , b.pid, b.first_nam, b.last_nam, b.strt
from table_a a
  cross join table_b b
where
  (a.phone_number in (b.phone_1, b.phone_2)
   and a.birth_date in (b.birth_date_1, b.birth_date_2))
  or
  (
  cast(a.zip_code as int) = cast(b.zip_code as int)
  and
    (
      (
      upper(a.last_name) = upper(b.last_nam)
      and a.birth_date in (b.birth_date_1, b.birth_date_2)
      )
      or
      (
      a.phone_number in (b.phone_1, b.phone_2)
      )
      or
      (
      a.birth_date in (b.birth_date_1, b.birth_date_2)
      and upper(a.street) = upper(b.strt)
      )
    )
  );
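
For a sense of scale: logically, a cross join pairs every row of table_a with every row of table_b before the where clause filters anything. With the approximate row counts given above (assumed to be 100,000 and 200,000,000), that is about 2 × 10^13 candidate pairs. A minimal sketch of the arithmetic in Hive, using bigint literals so the product does not overflow int:

```sql
-- Rough size of the unfiltered cross product, based on the approximate
-- row counts stated in the question (assumed: 100k and 200 million rows).
-- The L suffix makes both literals bigint; plain int multiplication would overflow.
select 100000L * 200000000L as candidate_pairs;
```

This only illustrates how many row pairs the predicate may have to be evaluated on; it says nothing about how Hive actually schedules the work.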

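The job starts on the cluster and shows that 160 mappers and 1 reducer were selected to run it. The map phase finishes quickly (about 3 minutes). The reduce phase starts and climbs slowly but steadily from 1%, 2%, 4%, ... until it reaches 67%. Then it stalls for more than 30 minutes with no progress. It does show the cumulative CPU time increasing somewhat (from about 3500 sec to 4200 sec), but the reducer never gets past 67%. Eventually I had to cancel the job and rerun it, and I get the same behavior every time. I tried to increase the number of reducers by setting the following option: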

set mapred.job.reduces=10;

But whenever I run the query, it still shows only 1 reducer being used. Any idea what is happening here and how to fix it?

Edit: here is the output

INFO  : Stage-1 is selected by condition resolver.
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:160
INFO  : Submitting tokens for job: job_1484094252811_0005
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (HDFS_DELEGATION_TOKEN token 53 for hive)
INFO  : The url to track the job: redacted
INFO  : Starting Job = job_1484094252811_0005, redacted
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hadoop/bin/hadoop job  -kill job_1484094252811_0005
INFO  : Hadoop job information for Stage-1: number of mappers: 160; number of reducers: 1
INFO  : 2017-01-11 14:11:36,957 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-01-11 14:11:49,499 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 10.45 sec
INFO  : 2017-01-11 14:11:50,549 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 17.03 sec
INFO  : 2017-01-11 14:11:51,610 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 1100.63 sec
INFO  : 2017-01-11 14:11:52,674 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 1181.04 sec
INFO  : 2017-01-11 14:11:54,773 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 1744.29 sec
INFO  : 2017-01-11 14:11:55,819 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 1769.61 sec
INFO  : 2017-01-11 14:11:57,903 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 2444.53 sec
INFO  : 2017-01-11 14:11:59,988 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 2505.06 sec
INFO  : 2017-01-11 14:12:01,032 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2999.97 sec
INFO  : 2017-01-11 14:12:12,486 Stage-1 map = 100%,  reduce = 2%, Cumulative CPU 3051.65 sec
INFO  : 2017-01-11 14:12:21,878 Stage-1 map = 100%,  reduce = 3%, Cumulative CPU 3084.96 sec
INFO  : 2017-01-11 14:12:27,088 Stage-1 map = 100%,  reduce = 4%, Cumulative CPU 3108.88 sec
INFO  : 2017-01-11 14:12:36,477 Stage-1 map = 100%,  reduce = 5%, Cumulative CPU 3132.38 sec
INFO  : 2017-01-11 14:12:45,837 Stage-1 map = 100%,  reduce = 6%, Cumulative CPU 3153.54 sec
INFO  : 2017-01-11 14:12:55,204 Stage-1 map = 100%,  reduce = 7%, Cumulative CPU 3181.5 sec
INFO  : 2017-01-11 14:13:01,437 Stage-1 map = 100%,  reduce = 8%, Cumulative CPU 3204.99 sec
INFO  : 2017-01-11 14:13:10,781 Stage-1 map = 100%,  reduce = 9%, Cumulative CPU 3231.7 sec
INFO  : 2017-01-11 14:13:17,012 Stage-1 map = 100%,  reduce = 10%, Cumulative CPU 3255.17 sec
INFO  : 2017-01-11 14:13:25,320 Stage-1 map = 100%,  reduce = 11%, Cumulative CPU 3280.29 sec
INFO  : 2017-01-11 14:13:31,560 Stage-1 map = 100%,  reduce = 13%, Cumulative CPU 3300.07 sec
INFO  : 2017-01-11 14:13:40,903 Stage-1 map = 100%,  reduce = 14%, Cumulative CPU 3326.19 sec
INFO  : 2017-01-11 14:13:47,133 Stage-1 map = 100%,  reduce = 15%, Cumulative CPU 3345.77 sec
INFO  : 2017-01-11 14:13:56,473 Stage-1 map = 100%,  reduce = 16%, Cumulative CPU 3370.6 sec
INFO  : 2017-01-11 14:14:02,696 Stage-1 map = 100%,  reduce = 17%, Cumulative CPU 3392.17 sec
INFO  : 2017-01-11 14:14:11,142 Stage-1 map = 100%,  reduce = 18%, Cumulative CPU 3416.55 sec
INFO  : 2017-01-11 14:14:17,365 Stage-1 map = 100%,  reduce = 19%, Cumulative CPU 3436.33 sec
INFO  : 2017-01-11 14:14:26,710 Stage-1 map = 100%,  reduce = 20%, Cumulative CPU 3458.24 sec
INFO  : 2017-01-11 14:14:32,926 Stage-1 map = 100%,  reduce = 21%, Cumulative CPU 3477.13 sec
INFO  : 2017-01-11 14:14:42,262 Stage-1 map = 100%,  reduce = 22%, Cumulative CPU 3504.41 sec
INFO  : 2017-01-11 14:14:48,485 Stage-1 map = 100%,  reduce = 23%, Cumulative CPU 3532.08 sec
INFO  : 2017-01-11 14:14:56,778 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 3561.15 sec
INFO  : 2017-01-11 14:15:02,997 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 3581.73 sec
INFO  : 2017-01-11 14:15:12,319 Stage-1 map = 100%,  reduce = 26%, Cumulative CPU 3607.59 sec
INFO  : 2017-01-11 14:15:18,543 Stage-1 map = 100%,  reduce = 27%, Cumulative CPU 3631.92 sec
INFO  : 2017-01-11 14:15:27,878 Stage-1 map = 100%,  reduce = 28%, Cumulative CPU 3655.75 sec
INFO  : 2017-01-11 14:15:34,094 Stage-1 map = 100%,  reduce = 29%, Cumulative CPU 3677.59 sec
INFO  : 2017-01-11 14:15:40,369 Stage-1 map = 100%,  reduce = 30%, Cumulative CPU 3703.33 sec
INFO  : 2017-01-11 14:15:49,687 Stage-1 map = 100%,  reduce = 31%, Cumulative CPU 3726.21 sec
INFO  : 2017-01-11 14:15:52,789 Stage-1 map = 100%,  reduce = 32%, Cumulative CPU 3730.8 sec
INFO  : 2017-01-11 14:15:57,964 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 3748.92 sec
INFO  : 2017-01-11 14:16:07,269 Stage-1 map = 100%,  reduce = 59%, Cumulative CPU 3765.56 sec
INFO  : 2017-01-11 14:16:10,377 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3769.25 sec
INFO  : 2017-01-11 14:17:10,449 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3804.54 sec
INFO  : 2017-01-11 14:18:10,459 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3804.54 sec
INFO  : 2017-01-11 14:19:10,521 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3804.54 sec
INFO  : 2017-01-11 14:20:10,569 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3804.54 sec
INFO  : 2017-01-11 14:21:11,569 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3804.54 sec
INFO  : 2017-01-11 14:22:12,566 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3804.54 sec
INFO  : 2017-01-11 14:23:12,587 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3804.54 sec
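
The log above itself names three settings that influence the reducer count. Note that the property it prints for a constant number of reducers is mapreduce.job.reduces, which is not the same name as the mapred.job.reduces I set earlier. A sketch of setting the names exactly as the log prints them, in the same session and before the query; the values are illustrative assumptions, and whether Hive honors them for a cross join with no equality key is not confirmed here:

```sql
-- Property names copied from the Hive log output above.
-- All values below are illustrative assumptions, not tuned recommendations.

-- Average load per reducer, in bytes (64 MB here).
set hive.exec.reducers.bytes.per.reducer=67108864;

-- Upper bound on the number of reducers Hive may choose.
set hive.exec.reducers.max=10;

-- Request a constant number of reducers.
set mapreduce.job.reduces=10;
```

If the log still reports "Number of reduce tasks determined at compile time: 1" after this, the reducer count is being fixed by the query plan rather than by these settings.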

0 answers:

No answers yet.