Counting distinct IDs with CASE, grouped by week

Date: 2016-10-17 17:17:27

Tags: sql count group-by hive case

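I am trying to count the distinct IDs that appear in my data each week, broken down by version type, and I am not sure how to structure the query correctly.

I would like to end up with a table like this: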

      1.1     1.2     1.3    1.4
wk1     1       5       4      8
wk2     4       3       9      8
wk3     1       8       0      6

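I tried the query below, but it will not run: it demands that the CASE expression appear in the GROUP BY, and once it is there it will not accept the COUNT().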

  SELECT
  Case  when version like "1.1%" then Count(distinct ID)
     when version like "1.2%" then Count(distinct ID)
     when version like "1.3%" then Count(distinct ID)
     when version like "1.4%" then Count(distinct ID) end,
  CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) as INT) as week_of_the_year
  FROM db.table
  where timestamp_pst >=  "2016-01-28"
  group by CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) as INT)  
        order by week_of_the_year

3 Answers:

Answer 0 (score: 1)

  -- Aliases are plain identifiers (v1_1, ...) because a quoted string
  -- such as '1.1' is not a portable column alias.
  SELECT
    COUNT(DISTINCT (CASE WHEN version LIKE '1.1%' THEN ID END)) AS v1_1
    ,COUNT(DISTINCT (CASE WHEN version LIKE '1.2%' THEN ID END)) AS v1_2
    ,COUNT(DISTINCT (CASE WHEN version LIKE '1.3%' THEN ID END)) AS v1_3
    ,COUNT(DISTINCT (CASE WHEN version LIKE '1.4%' THEN ID END)) AS v1_4
    ,CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) AS INT) AS week_of_the_year
  FROM aws_d3.iaanalytics_detail
  WHERE timestamp_pst >= '2016-01-28'
  GROUP BY CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) AS INT)
  ORDER BY week_of_the_year

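You want to use "conditional aggregation", which means the CASE expression goes inside the aggregate function. Because you want COUNT(DISTINCT), you have two options: use the DISTINCT keyword inside the aggregate, or build a derived table that contains only distinct values, as another answer suggests. The derived table only saves you from repeating DISTINCT, so I do not see the need to complicate the query with it.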

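Note that SUM(CASE WHEN blah THEN 1 ELSE 0 END) will NOT work for you, because it sums every occurrence rather than counting distinct values. Also, aggregate functions ignore NULLs, and a CASE expression without an ELSE clause evaluates to NULL when no branch matches.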
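The difference between the two aggregates can be checked on a tiny dataset. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Hive, and the `events` table and its rows are made up for the example (the week number is stored directly instead of being derived from `timestamp_pst`).

```python
import sqlite3

# Hypothetical sample data: (id, version, week_of_the_year).
# id 1 appears twice in week 1 under a 1.1.x version.
rows = [
    (1, "1.1.0", 1),
    (1, "1.1.0", 1),
    (2, "1.1.2", 1),
    (3, "1.2.0", 1),
    (1, "1.1.0", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, version TEXT, week_of_the_year INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

query = """
SELECT
    week_of_the_year,
    -- CASE without ELSE yields NULL for non-matching rows; COUNT ignores NULLs
    COUNT(DISTINCT CASE WHEN version LIKE '1.1%' THEN id END) AS distinct_v1_1,
    -- SUM adds 1 for every matching row, duplicates included
    SUM(CASE WHEN version LIKE '1.1%' THEN 1 ELSE 0 END) AS rows_v1_1
FROM events
GROUP BY week_of_the_year
ORDER BY week_of_the_year
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

In week 1 the distinct count is 2 (ids 1 and 2) while the SUM is 3, because id 1 is counted once per row.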

Answer 1 (score: 0)

You can use the COUNT() aggregate function together with a conditional CASE expression.

SELECT
    week_of_the_year
  , COUNT(CASE WHEN version LIKE '1.1%' THEN id END) AS v1_1
  , COUNT(CASE WHEN version LIKE '1.2%' THEN id END) AS v1_2
  , COUNT(CASE WHEN version LIKE '1.3%' THEN id END) AS v1_3
  , COUNT(CASE WHEN version LIKE '1.4%' THEN id END) AS v1_4
FROM (
  SELECT
    DISTINCT
      id
    , version
    , CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) as INT) as week_of_the_year
  FROM aws_d3.iaanalytics_detail
  where timestamp_pst >= '2016-01-28'
  ) t
GROUP BY week_of_the_year
ORDER BY week_of_the_year

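Note that the DISTINCT part of the query happens in the derived table t. The derived table is not strictly necessary, but I find it the cleaner solution: the GROUP BY clause does not have to repeat the week expression, which makes the query more readable. It also moves the DISTINCT step out of the aggregates.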
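One thing worth checking with the derived-table form is what the DISTINCT is applied to. The sketch below again uses sqlite3 as a stand-in for Hive, with a made-up `events` table and a hypothetical helper `count_v1_1`. It deduplicates first on the full version string and then on a three-character prefix; the prefix variant assumes single-digit major and minor version numbers.

```python
import sqlite3

# Hypothetical sample data: (id, version, week_of_the_year).
rows = [
    (1, "1.1.0", 1),
    (1, "1.1.0", 1),  # exact duplicate, removed by SELECT DISTINCT
    (1, "1.1.1", 1),  # same id, different 1.1.x string in the same week
    (2, "1.1.0", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, version TEXT, week_of_the_year INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)


def count_v1_1(version_expr):
    """Count 1.1.x ids per week after deduplicating on version_expr."""
    query = f"""
    SELECT week_of_the_year,
           COUNT(CASE WHEN version LIKE '1.1%' THEN id END) AS v1_1
    FROM (
        SELECT DISTINCT id, {version_expr} AS version, week_of_the_year
        FROM events
    ) t
    GROUP BY week_of_the_year
    ORDER BY week_of_the_year
    """
    return conn.execute(query).fetchall()


full_version = count_v1_1("version")               # dedupe on '1.1.0' vs '1.1.1'
prefix_only = count_v1_1("substr(version, 1, 3)")  # dedupe on '1.1'
print(full_version, prefix_only)
```

With the full version string, id 1 survives the DISTINCT twice in week 1 and the count is 3; deduplicating on the prefix gives 2, which matches COUNT(DISTINCT ...) inside the aggregate. If each id only ever carries one version string per week, the two forms agree.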

Answer 2 (score: 0)

Try this:

-- Counts matching rows per week (not distinct IDs).
SELECT
  SUM(CASE WHEN version LIKE '1.1%' THEN 1 ELSE 0 END) AS v1_1,
  SUM(CASE WHEN version LIKE '1.2%' THEN 1 ELSE 0 END) AS v1_2,
  SUM(CASE WHEN version LIKE '1.3%' THEN 1 ELSE 0 END) AS v1_3,
  SUM(CASE WHEN version LIKE '1.4%' THEN 1 ELSE 0 END) AS v1_4,
  CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) AS INT) AS week_of_the_year
FROM aws_d3.iaanalytics_detail
WHERE timestamp_pst >= '2016-01-28'
GROUP BY CAST(((datediff(timestamp_pst,'2016-01-03') / 7)+1) AS INT)
ORDER BY week_of_the_year