How do I calculate the average daily frequency?

Time: 2017-07-17 19:52:30

Tags: sql hive

I have this table, my_table:

recorder_id    person_id     day
A1             1             2017-06-03 12:30
A1             1             2017-06-03 12:45
B1             1             2017-06-03 12:50
A1             2             2017-06-03 16:40
B1             2             2017-06-03 16:45
B1             2             2017-06-03 18:20
A1             1             2017-06-04 11:22

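I want to know how many times per day, on average, each person passes each recorder. For example, person 1 passes recorder A1 an average of 1.5 times per day, while person 2 passes it an average of 0.5 times per day (because this person has no record for 2017-06-04). The same logic should be applied to B1.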

recorder_id   person_id   daily_average_per_person
A1            1           1.5 
A1            2           0.5
B1            1           0.5
B1            2           1.0 
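To make the arithmetic behind this expected result explicit, here is a minimal sketch in plain Python rather than Hive. It assumes the denominator is the number of distinct calendar days in the whole table (2 here), which is what the 0.5 for person 2 at A1 implies. The variable names are only illustrative.

```python
from collections import Counter

# Sample rows from my_table: (recorder_id, person_id, day)
rows = [
    ("A1", 1, "2017-06-03 12:30"),
    ("A1", 1, "2017-06-03 12:45"),
    ("B1", 1, "2017-06-03 12:50"),
    ("A1", 2, "2017-06-03 16:40"),
    ("B1", 2, "2017-06-03 16:45"),
    ("B1", 2, "2017-06-03 18:20"),
    ("A1", 1, "2017-06-04 11:22"),
]

# Denominator: distinct calendar days anywhere in the table (assumption).
num_days = len({day[:10] for _, _, day in rows})

# Numerator: total passes per (recorder_id, person_id) pair.
passes = Counter((rec, person) for rec, person, _ in rows)

daily_average = {key: n / num_days for key, n in sorted(passes.items())}
for (rec, person), avg in daily_average.items():
    print(rec, person, avg)
```

With the seven sample rows this reproduces the four values in the table above.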

How can I get this result?

I tried this query, but I don't know how to compute the daily average for each unique person:

SELECT recorder_id, person_id,
       to_date(day) as record_day,
       count(*) as day_count

FROM        my_table

GROUP BY    recorder_id, person_id, to_date(day)

ORDER BY    day_count;

2 Answers:

Answer 0: (score: 3)

You're really close. I would use a subquery:

SELECT recorder_id, person_id, avg(day_count) day_avg
  FROM
       ( SELECT recorder_id, person_id,
                to_date(day) as record_day,
                count(*) as day_count
           FROM my_table
          GROUP BY recorder_id, person_id, to_date(day) ) tmp_tbl
 GROUP BY recorder_id, person_id
 ORDER BY avg(day_count);

I apologize, I'm not somewhere I can test this, but it should put you on the right track.

Good luck!
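For anyone who wants to check the subquery approach against the sample rows, here is a rough sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for Hive here, and its date() function replaces Hive's to_date(); both substitutions are assumptions made so the query can be run locally. Note that averaging per-day counts only covers the days on which a person actually has records at a recorder, so on this data person 2 at A1 comes out as 1.0 rather than the 0.5 asked for in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (recorder_id TEXT, person_id INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        ("A1", 1, "2017-06-03 12:30"),
        ("A1", 1, "2017-06-03 12:45"),
        ("B1", 1, "2017-06-03 12:50"),
        ("A1", 2, "2017-06-03 16:40"),
        ("B1", 2, "2017-06-03 16:45"),
        ("B1", 2, "2017-06-03 18:20"),
        ("A1", 1, "2017-06-04 11:22"),
    ],
)

# Same shape as the subquery answer; date() stands in for Hive's to_date().
per_active_day = conn.execute(
    """
    SELECT recorder_id, person_id, avg(day_count) AS day_avg
      FROM (SELECT recorder_id, person_id,
                   date(day) AS record_day,
                   count(*) AS day_count
              FROM my_table
             GROUP BY recorder_id, person_id, date(day)) tmp_tbl
     GROUP BY recorder_id, person_id
     ORDER BY recorder_id, person_id
    """
).fetchall()

for row in per_active_day:
    print(row)
```

If days without any record should count as zero, the denominator has to come from the full set of days in the table instead.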

Answer 1: (score: 1)

If I understand correctly, you just need the number of distinct days in the data. That becomes the denominator:

SELECT recorder_id, person_id,
       count(*) / numdays as daily_average_per_person
FROM my_table CROSS JOIN
     (SELECT COUNT(DISTINCT to_date(day)) as numdays
      FROM my_table
     ) tt
GROUP BY recorder_id, person_id, numdays
ORDER BY recorder_id, person_id;

In other databases, you could use COUNT(DISTINCT) as a window function. I don't think Hive supports that.
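As a quick sanity check of the cross-join approach on the question's sample rows, here is a sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for Hive, date() replaces to_date(), and the table is named my_table as in the question; these are assumptions for local testing. The `* 1.0` is there because SQLite divides two integers as integers; Hive's `/` should already return a double, so it should not be needed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (recorder_id TEXT, person_id INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        ("A1", 1, "2017-06-03 12:30"),
        ("A1", 1, "2017-06-03 12:45"),
        ("B1", 1, "2017-06-03 12:50"),
        ("A1", 2, "2017-06-03 16:40"),
        ("B1", 2, "2017-06-03 16:45"),
        ("B1", 2, "2017-06-03 18:20"),
        ("A1", 1, "2017-06-04 11:22"),
    ],
)

# Total passes per pair divided by the number of distinct days in the table.
# "* 1.0" forces floating-point division in SQLite (not needed in Hive).
per_calendar_day = conn.execute(
    """
    SELECT recorder_id, person_id,
           count(*) * 1.0 / numdays AS daily_average_per_person
      FROM my_table
     CROSS JOIN (SELECT count(DISTINCT date(day)) AS numdays
                   FROM my_table) tt
     GROUP BY recorder_id, person_id, numdays
     ORDER BY recorder_id, person_id
    """
).fetchall()

for row in per_calendar_day:
    print(row)
```

On the sample data this gives 1.5, 0.5, 0.5 and 1.0, matching the result requested in the question.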