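Count by column and by separate groupings with conditions

Time: 2017-04-20 22:17:55

Tags: sql count hive

I am trying to combine three separate queries into one that still produces the same results, but as a single table. ColumnA and ColumnB are both dates in 'yyyy-mm-dd' format. Ideally, the final result would be a single column of dates plus a separate count from each query.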

select columnA, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnA

select columnB, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnB

select columnB, count(distinct columnC)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
and columnX in ('itemA','ItemB')
group by columnB

4 answers:

Answer 0 (score: 1)

Use UNION ALL:

select columnA, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnA
UNION ALL
select columnB, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnB
UNION ALL
select columnB, count(distinct columnC)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
and columnX in ('itemA','ItemB')
group by columnB
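
As a quick way to see what the UNION ALL result looks like, here is a minimal sketch that runs the same three queries against a tiny in-memory SQLite table from Python. SQLite only stands in for Hive here; the table name `events`, the column name `ts` (in place of `timestamp`), the sample rows, and the extra `metric` label column are all assumptions made for illustration and are not part of the original query.

```python
import sqlite3

# Hypothetical sample rows: (ts, columnA, columnB, columnC, columnX).
rows = [
    ("2017-01-02", "2017-01-01", "2017-01-02", "u1", "itemA"),
    ("2017-01-03", "2017-01-01", "2017-01-02", "u1", "ItemB"),
    ("2017-01-03", "2017-01-02", "2017-01-03", "u2", "itemA"),
    ("2017-01-04", "2017-01-03", "2017-01-03", "u3", "other"),
    ("2017-02-01", "2017-01-01", "2017-01-02", "u9", "itemA"),  # outside the date window
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (ts text, columnA text, columnB text, columnC text, columnX text)"
)
conn.executemany("insert into events values (?, ?, ?, ?, ?)", rows)

# Same three queries stacked with UNION ALL. The 'metric' literal is an added
# label (an assumption) so rows from the three branches can be told apart.
query = """
select 'a_count' as metric, columnA as dte, count(*) as cnt
from events
where ts between '2017-01-01' and '2017-01-07'
group by columnA
union all
select 'b_count', columnB, count(*)
from events
where ts between '2017-01-01' and '2017-01-07'
group by columnB
union all
select 'c_distinct', columnB, count(distinct columnC)
from events
where ts between '2017-01-01' and '2017-01-07'
  and columnX in ('itemA', 'ItemB')
group by columnB
"""

result = sorted(conn.execute(query).fetchall())
for row in result:
    print(row)
```

With this sample data the output is one row per metric and date (a long list), not one row per date with three count columns. Without a label column such as `metric`, rows from the three branches cannot be told apart.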

Answer 1 (score: 1)

The following query expresses what you want to do:

select d.dte, coalesce(a.cnt, 0) as acnt, coalesce(b.cnt, 0) as bcnt,
       b.c_cnt
from (select columnA as dte from data.table where timestamp between '2017-01-01' and '2017-01-07'
      union
      select columnB from data.table where timestamp between '2017-01-01' and '2017-01-07'
     ) d left join
     (select columnA, count(*) as cnt
      from data.table
      where timestamp between '2017-01-01' and '2017-01-07'
      group by columnA
     ) a
     on d.dte = a.columnA left join
     (select columnB, count(*) as cnt,
             count(distinct case when columnX in ('itemA','ItemB') then columnC end) as c_cnt
      from data.table
      where timestamp between '2017-01-01' and '2017-01-07'
      group by columnB
     ) b
     on d.dte = b.columnB;

I think this is compatible with Hive, but Hive occasionally deviates from other SQL dialects in surprising ways.
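
To make the shape of the result concrete, here is a small sketch that runs this join query on toy data in an in-memory SQLite database from Python. It has not been checked against Hive; the table name `events`, the column name `ts` (replacing `timestamp`), the sample rows, and the final `order by` are assumptions for illustration only.

```python
import sqlite3

# Hypothetical sample rows: (ts, columnA, columnB, columnC, columnX).
rows = [
    ("2017-01-02", "2017-01-01", "2017-01-02", "u1", "itemA"),
    ("2017-01-03", "2017-01-01", "2017-01-02", "u1", "ItemB"),
    ("2017-01-03", "2017-01-02", "2017-01-03", "u2", "itemA"),
    ("2017-01-04", "2017-01-03", "2017-01-03", "u3", "other"),
    ("2017-02-01", "2017-01-01", "2017-01-02", "u9", "itemA"),  # outside the date window
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (ts text, columnA text, columnB text, columnC text, columnX text)"
)
conn.executemany("insert into events values (?, ?, ?, ?, ?)", rows)

# d: every date seen in columnA or columnB; a and b: the per-date counts.
query = """
select d.dte, coalesce(a.cnt, 0) as acnt, coalesce(b.cnt, 0) as bcnt,
       b.c_cnt
from (select columnA as dte from events where ts between '2017-01-01' and '2017-01-07'
      union
      select columnB from events where ts between '2017-01-01' and '2017-01-07'
     ) d left join
     (select columnA, count(*) as cnt
      from events
      where ts between '2017-01-01' and '2017-01-07'
      group by columnA
     ) a
     on d.dte = a.columnA left join
     (select columnB, count(*) as cnt,
             count(distinct case when columnX in ('itemA','ItemB') then columnC end) as c_cnt
      from events
      where ts between '2017-01-01' and '2017-01-07'
      group by columnB
     ) b
     on d.dte = b.columnB
order by d.dte
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On this data the query returns one row per date with the three counts side by side. The date '2017-01-01' occurs only in columnA, so its `bcnt` is 0 and its `c_cnt` is NULL (`None` in Python), because `b.c_cnt` is not wrapped in `coalesce`.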

Answer 2 (score: 1)

The following seems to be what you want:

select columnA, count(*) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' group by columnA
Union All
select columnB, count(*) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' group by columnB
Union All
select columnB, count(distinct columnC) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' and columnX in ('itemA','ItemB') group by columnB

Answer 3 (score: 0)

I was able to get it working with the following approach:

With pullA as
(
  select columnA, count(*) as A_count
  from data.table
  group by columnA
),
pullB as
(
  select columnB, count(*) as B_count
  from data.table
  group by columnB
),

pullC as
(
  select columnB , count(*) as C_count
  from data.table
  where columnX in ('itemA', 'itemB')
  group by columnB
)

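-- Qualify columnB (it exists in both pullB and pullC) and join pullC on its
-- columnB; pullC has no ColumnC column.
select pullB.columnB, A_count, B_count, C_count
from pullB
left join pullA
on pullB.columnB = pullA.columnA
left join pullC
on pullB.columnB = pullC.columnB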
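
For comparison, here is a rough sketch of the CTE approach on toy data, again using an in-memory SQLite database from Python as a stand-in for Hive. The table name `events`, the column name `ts`, the sample rows, and the final `order by` are assumptions. The sketch also puts back the date filter and the `count(distinct columnC)` from the question, which the CTEs above leave out, and it joins `pullC` on `columnB` with qualified column names.

```python
import sqlite3

# Hypothetical sample rows: (ts, columnA, columnB, columnC, columnX).
rows = [
    ("2017-01-02", "2017-01-01", "2017-01-02", "u1", "itemA"),
    ("2017-01-03", "2017-01-01", "2017-01-02", "u1", "ItemB"),
    ("2017-01-03", "2017-01-02", "2017-01-03", "u2", "itemA"),
    ("2017-01-04", "2017-01-03", "2017-01-03", "u3", "other"),
    ("2017-02-01", "2017-01-01", "2017-01-02", "u9", "itemA"),  # outside the date window
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (ts text, columnA text, columnB text, columnC text, columnX text)"
)
conn.executemany("insert into events values (?, ?, ?, ?, ?)", rows)

# One CTE per count, then left joins starting from pullB.
query = """
with pullA as (
  select columnA, count(*) as A_count
  from events
  where ts between '2017-01-01' and '2017-01-07'
  group by columnA
),
pullB as (
  select columnB, count(*) as B_count
  from events
  where ts between '2017-01-01' and '2017-01-07'
  group by columnB
),
pullC as (
  select columnB, count(distinct columnC) as C_count
  from events
  where ts between '2017-01-01' and '2017-01-07'
    and columnX in ('itemA', 'ItemB')
  group by columnB
)
select pullB.columnB, A_count, B_count, C_count
from pullB
left join pullA on pullB.columnB = pullA.columnA
left join pullC on pullB.columnB = pullC.columnB
order by pullB.columnB
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the joins start from `pullB`, any date that appears only in columnA ('2017-01-01' in this sample) is dropped from the result. Driving the joins from a union of both date columns avoids that.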

Is this approach more or less efficient than the union or subquery approaches?