SQL: calculating a cumulative value over a period of time

Posted: 2020-04-11 03:17:49

Tags: mysql sql group-by

I'm trying to calculate cumulative revenue since 2020-01-01. I have user-level revenue data with the following schema:

create table revenue
(
  game_id        varchar(255),
  user_id        varchar(255),
  amount         int,
  activity_date  varchar(255)
);

insert into revenue
  (game_id, user_id, amount, activity_date)
values
  ('Racing', 'ABC123', 5, '2020-01-01'),
  ('Racing', 'ABC123', 1, '2020-01-04'),
  ('Racing', 'CDE123', 1, '2020-01-04'),
  ('DH', 'CDE123', 100, '2020-01-03'),
  ('DH', 'CDE456', 10, '2020-01-02'),
  ('DH', 'CDE789', 5, '2020-01-02'),
  ('DH', 'CDE456', 1, '2020-01-03'),
  ('DH', 'CDE456', 1, '2020-01-03');

Expected output:

Game    Age    Cum_rev    Total_unique_payers_per_game
Racing  0      5          2
Racing  1      5          2
Racing  2      5          2
Racing  3      7          2
DH      0      0          3
DH      1      15         3
DH      2      117        3
DH      3      117        3

Age is the number of days between the transaction date and 2020-01-01. I'm using the following logic:

SELECT game_id, DATEDIFF(activity_date, '2020-01-01') as Age, count(distinct user_id) as Total_unique_payers
from revenue
group by game_id, Age

(SQL fiddle) How do I calculate the cumulative revenue?
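The following minimal Python sketch (not part of the question) computes the expected output directly from the sample rows, to pin down what it means. Each part of the output is derived as follows:

- Age is the day difference from 2020-01-01.
- Revenue is accumulated per game across every age up to the latest activity date, so days without revenue still appear.
- The payer count is the number of distinct user_id values per game.

The function names are hypothetical.

```python
from datetime import date

# Sample rows from the question: (game_id, user_id, amount, activity_date)
ROWS = [
    ("Racing", "ABC123", 5, "2020-01-01"),
    ("Racing", "ABC123", 1, "2020-01-04"),
    ("Racing", "CDE123", 1, "2020-01-04"),
    ("DH", "CDE123", 100, "2020-01-03"),
    ("DH", "CDE456", 10, "2020-01-02"),
    ("DH", "CDE789", 5, "2020-01-02"),
    ("DH", "CDE456", 1, "2020-01-03"),
    ("DH", "CDE456", 1, "2020-01-03"),
]
START = date(2020, 1, 1)


def age_of(day):
    """Days since 2020-01-01, like DATEDIFF(activity_date, '2020-01-01')."""
    return (date.fromisoformat(day) - START).days


def expected_output(rows):
    """Return (game, age, cum_rev, unique_payers) for every age up to the last date."""
    max_age = max(age_of(r[3]) for r in rows)
    games = list(dict.fromkeys(r[0] for r in rows))  # first-appearance order
    result = []
    for game in games:
        game_rows = [r for r in rows if r[0] == game]
        payers = len({r[1] for r in game_rows})  # distinct user_id per game
        running = 0
        for age in range(max_age + 1):
            # Days with no revenue add 0, so gaps still produce a row
            running += sum(r[2] for r in game_rows if age_of(r[3]) == age)
            result.append((game, age, running, payers))
    return result
```

Calling `expected_output(ROWS)` yields the same eight rows as the table above.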

2 answers:

Answer 0 (score: 1)

For the following you need a database version that supports the over() clause (MySQL 8+). I used MariaDB 10.4 below, because MySQL 8 wasn't working on that site when I tried. The fiddle uses the same revenue table and sample rows as the question.

SELECT
  game_id
, user_id
, activity_date
, amount
, sum(amount) over(order by activity_date, user_id) as running_sum
, (select count(distinct user_id) from revenue) as Total_unique_payers
from revenue
order by
  activity_date
, user_id

game_id | user_id | activity_date | amount | running_sum | Total_unique_payers
:------ | :------ | :------------ | -----: | ----------: | ------------------:
Racing  | ABC123  | 2020-01-01    |      5 |           5 |                   4
DH      | CDE456  | 2020-01-02    |     10 |          15 |                   4
DH      | CDE789  | 2020-01-02    |      5 |          20 |                   4
DH      | CDE123  | 2020-01-03    |    100 |         120 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         122 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         122 |                   4
Racing  | ABC123  | 2020-01-04    |      1 |         123 |                   4
Racing  | CDE123  | 2020-01-04    |      1 |         124 |                   4

db<>fiddle here

The two identical CDE456 rows on 2020-01-03 get the same running_sum. With an ORDER BY, the default window frame is range-based, so rows that tie on every ORDER BY column are peers and are summed together.

The order used inside the over clause determines how the running sum accumulates. For example, ordering by game_id desc, activity_date, user_id (inside over() and in the outer order by) gives:

game_id | user_id | activity_date | amount | running_sum | Total_unique_payers
:------ | :------ | :------------ | -----: | ----------: | ------------------:
Racing  | ABC123  | 2020-01-01    |      5 |           5 |                   4
Racing  | ABC123  | 2020-01-04    |      1 |           6 |                   4
Racing  | CDE123  | 2020-01-04    |      1 |           7 |                   4
DH      | CDE456  | 2020-01-02    |     10 |          17 |                   4
DH      | CDE789  | 2020-01-02    |      5 |          22 |                   4
DH      | CDE123  | 2020-01-03    |    100 |         122 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         124 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         124 |                   4

db<>fiddle here
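Both orderings can be reproduced outside a fiddle with a small Python script. This sketch uses SQLite in place of MySQL 8 / MariaDB and assumes SQLite 3.25 or newer, the first version with window functions. Most recent Python builds bundle such a version. The `sum(...) over(order by ...)` syntax is the same; `running_sums` is a hypothetical helper.

```python
import sqlite3

# Same table and rows as the question (SQLite accepts these MySQL-style types).
SAMPLE_SQL = """
create table revenue (
  game_id varchar(255), user_id varchar(255),
  amount int, activity_date varchar(255)
);
insert into revenue (game_id, user_id, amount, activity_date) values
  ('Racing', 'ABC123', 5, '2020-01-01'),
  ('Racing', 'ABC123', 1, '2020-01-04'),
  ('Racing', 'CDE123', 1, '2020-01-04'),
  ('DH', 'CDE123', 100, '2020-01-03'),
  ('DH', 'CDE456', 10, '2020-01-02'),
  ('DH', 'CDE789', 5, '2020-01-02'),
  ('DH', 'CDE456', 1, '2020-01-03'),
  ('DH', 'CDE456', 1, '2020-01-03');
"""


def running_sums(order_by):
    """Return the answer's columns with the running sum taken in `order_by` order.

    order_by is pasted into the SQL text, so only pass fixed, trusted strings.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(SAMPLE_SQL)
    sql = f"""
        select game_id, user_id, activity_date, amount,
               sum(amount) over (order by {order_by}) as running_sum,
               (select count(distinct user_id) from revenue) as Total_unique_payers
        from revenue
        order by {order_by}
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows
```

The two calls map to the two tables above:

- `running_sums("activity_date, user_id")` corresponds to the first table.
- `running_sums("game_id desc, activity_date, user_id")` corresponds to the second.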

Answer 1 (score: 1)

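In MySQL 5.7 the only way to do this is with user-defined variables. It is a workaround, but it works. It emulates the window function that @Used_By_Already used in their answer.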

Since you mentioned that you care about gaps (days with no revenue), you first need to create a dates table, which is easy to do:

create table dates_view (
  date_day date
);

insert into dates_view
select date_add( '2019-12-31', INTERVAL @rownum:=@rownum+1 day ) as date_day
from (
   select 0 union select 1 union select 2 union select 3 
   union select 4 union select 5 union select 6 
   union select 7 union select 8 union select 9
) a, (
   select 0 union select 1 union select 2 union select 3 
   union select 4 union select 5 union select 6 
   union select 7 union select 8 union select 9
) b, (select @rownum:=0) r;

-- Note: each derived table of unioned selects above multiplies the number
-- of days by 10 (the two sets give 100 days), so if you need more days
-- just add another set like "a" or "b"
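On a version with common table expressions, the dates table can be generated with a recursive CTE instead of the variable trick. MySQL 8.0 supports `WITH RECURSIVE`; 5.7 does not.

Below is a sketch of that swapped-in technique using SQLite through Python's `sqlite3` module. `calendar()` is a hypothetical helper. `date(x, '+1 day')` is SQLite's date arithmetic; MySQL would use `date_add`.

```python
import sqlite3

# Recursive CTE: start at :start and keep adding one day until :end is reached.
CALENDAR_SQL = """
with recursive dates_view(date_day) as (
  select date(:start)
  union all
  select date(date_day, '+1 day')
  from dates_view
  where date_day < date(:end)
)
select date_day from dates_view order by date_day
"""


def calendar(start, end):
    """Return every ISO date from start to end inclusive (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    days = [row[0] for row in con.execute(CALENDAR_SQL, {"start": start, "end": end})]
    con.close()
    return days
```

Unlike the cross-joined digit sets, the range is given by two dates rather than by how many sets you stack.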

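Once you have the dates table, cross join it with the revenue data. You want the number of players to be independent of the accumulated amount, so compute it separately in a subquery.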

You also need max(activity_date) from the revenue table, so that the dates are limited to the range the table actually covers.
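The intermediate result (the inner subquery `a` in the query below) may be easier to understand on its own. This sketch uses SQLite via Python, with a recursive CTE standing in for `dates_view`. `build_grid` is a hypothetical name. It builds one row per game and day, up to max(activity_date):

- The per-game player count comes from its own subquery.
- The day's revenue comes from a left join, so it is NULL (Python `None`) on days without revenue.

```python
import sqlite3

SAMPLE_SQL = """
create table revenue (
  game_id varchar(255), user_id varchar(255),
  amount int, activity_date varchar(255)
);
insert into revenue (game_id, user_id, amount, activity_date) values
  ('Racing', 'ABC123', 5, '2020-01-01'), ('Racing', 'ABC123', 1, '2020-01-04'),
  ('Racing', 'CDE123', 1, '2020-01-04'), ('DH', 'CDE123', 100, '2020-01-03'),
  ('DH', 'CDE456', 10, '2020-01-02'), ('DH', 'CDE789', 5, '2020-01-02'),
  ('DH', 'CDE456', 1, '2020-01-03'), ('DH', 'CDE456', 1, '2020-01-03');
"""

# A year of days stands in for dates_view; the join on md trims it to max(activity_date).
GRID_SQL = """
with recursive dates_view(date_day) as (
  select '2020-01-01'
  union all
  select date(date_day, '+1 day') from dates_view where date_day < '2020-12-31'
)
select players.game_id,
       cast(julianday(dv.date_day) - julianday('2020-01-01') as integer) as age,
       rev.samount,
       players.noPlayers
from dates_view dv
join (select max(activity_date) as maxDate from revenue) md
  on dv.date_day <= md.maxDate
cross join (select game_id, count(distinct user_id) as noPlayers
            from revenue group by game_id) players
left join (select game_id, activity_date, sum(amount) as samount
           from revenue group by game_id, activity_date) rev
  on players.game_id = rev.game_id and dv.date_day = rev.activity_date
order by players.game_id desc, age
"""


def build_grid():
    """Return (game_id, age, day's revenue or None, unique payers) rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SAMPLE_SQL)
    rows = con.execute(GRID_SQL).fetchall()
    con.close()
    return rows
```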

So the query below does this, based only on your current sample data:

set @_sum:=0;       -- Note: whether these two lines are needed depends on the
set @_currGame:=''; -- client you are using. Some keep variables per session and
                    -- some don't; the site below, for instance, does

select a.game_id,
       a.age,
       case when @_currGame = game_id 
            then @_sum:=coalesce(samount,0) + @_sum
            else @_sum:=coalesce(samount,0) end as Cum_rev,
       a.Total_unique_payers_per_game,
       @_currGame := game_id varComputeCurrGame
from 
    (
    select players.game_id, 
           rev.samount,
           datediff(dv.date_day, '2020-01-01') age,
           players.noPlayers Total_unique_payers_per_game
       from (select @_sum:=0) am,
            dates_view dv
             cross join (select max(activity_date) maxDate from revenue) md 
               on dv.date_day <= md.maxDate
             cross join (select game_id, count(distinct user_id) noPlayers 
                           from revenue group by game_id) players
             left join (select game_id, activity_date, sum(amount) samount 
                          from revenue group by game_id, activity_date) rev
                on players.game_id = rev.game_id
                   and dv.date_day = rev.activity_date
    ) a,
    (select @_sum:=0) s,
    (select @_currGame:='') x
order by a.game_id desc, a.age;

This results in:

  game_id   age  Cum_rev  Total_unique_payers_per_game   varComputeCurrGame
   Racing    0      5             2                            Racing
   Racing    1      5             2                            Racing
   Racing    2      5             2                            Racing
   Racing    3      7             2                            Racing
   DH        0      0             3                            DH    
   DH        1      15            3                            DH    
   DH        2      117           3                            DH    
   DH        3      117           3                            DH  

See it working here (you need to run it): https://www.db-fiddle.com/f/qifZ6hmpvcSZYwhLDv613d/2
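The row-by-row effect of the `@_sum` / `@_currGame` expressions can be written as an ordinary loop. This Python sketch mirrors that logic on the per-day rows (game_id, age, day's revenue or None, player count) in the query's final order. `accumulate_like_variables` and `GRID` are hypothetical names, and the `GRID` values are derived from the question's sample data.

It is only an illustration of the intent. MySQL documents the evaluation order of expressions involving user variables as undefined, which is one reason to prefer the window-function version when MySQL 8 is available.

```python
# Per-day rows as produced by the inner query "a", already in final order:
# (game_id, age, day's revenue or None, unique payers)
GRID = [
    ("Racing", 0, 5, 2), ("Racing", 1, None, 2), ("Racing", 2, None, 2), ("Racing", 3, 2, 2),
    ("DH", 0, None, 3), ("DH", 1, 15, 3), ("DH", 2, 102, 3), ("DH", 3, None, 3),
]


def accumulate_like_variables(rows):
    """Keep adding while game_id repeats and reset when it changes, like @_sum/@_currGame."""
    cur_sum = 0    # plays the role of @_sum
    cur_game = ""  # plays the role of @_currGame
    out = []
    for game_id, age, samount, players in rows:
        daily = samount if samount is not None else 0  # coalesce(samount, 0)
        if cur_game == game_id:
            cur_sum = daily + cur_sum
        else:
            cur_sum = daily
        cur_game = game_id  # @_currGame := game_id
        out.append((game_id, age, cur_sum, players, cur_game))
    return out
```

The returned tuples match the five columns of the result shown above, including varComputeCurrGame.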

And here is a version for MySQL 8.x, which supports window functions:

select distinct agetable.game_id,
       agetable.age,
       sum(coalesce(r1.amount,0)) 
             over (partition by agetable.game_id 
                     order by agetable.game_id, agetable.age) as sm,
       agetable.ttplayers
from
    (
    select r.game_id, dv.date_day, datediff(dv.date_day, '2020-01-01') age, p.ttplayers
    from dates_view dv
          cross join (select distinct game_id, activity_date from revenue) r 
            on dv.date_day <= (select max(activity_date) from revenue)
          left join (select game_id, count(distinct user_id) ttplayers from revenue group by game_id) p
            on r.game_id = p.game_id
    group by r.game_id, dv.date_day, age, p.ttplayers  -- no DESC: MySQL 8.0.13+ rejects ASC/DESC in GROUP BY
    ) agetable
    left join revenue r1
      on agetable.date_day = r1.activity_date
         and r1.game_id = agetable.game_id
order by agetable.game_id desc, agetable.age
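For comparison, here is a sketch of the same idea in SQLite (3.25+) via Python. It makes two deliberate changes from the MySQL 8 query:

- Revenue is summed per game and day before the window function runs, so no `select distinct` is needed.
- A recursive CTE replaces the `dates_view` table.

`cumulative_revenue` is a hypothetical name.

```python
import sqlite3

SAMPLE_SQL = """
create table revenue (
  game_id varchar(255), user_id varchar(255),
  amount int, activity_date varchar(255)
);
insert into revenue (game_id, user_id, amount, activity_date) values
  ('Racing', 'ABC123', 5, '2020-01-01'), ('Racing', 'ABC123', 1, '2020-01-04'),
  ('Racing', 'CDE123', 1, '2020-01-04'), ('DH', 'CDE123', 100, '2020-01-03'),
  ('DH', 'CDE456', 10, '2020-01-02'), ('DH', 'CDE789', 5, '2020-01-02'),
  ('DH', 'CDE456', 1, '2020-01-03'), ('DH', 'CDE456', 1, '2020-01-03');
"""

CUM_SQL = """
with recursive dates_view(date_day) as (
  select '2020-01-01'
  union all
  select date(date_day, '+1 day') from dates_view where date_day < '2020-12-31'
),
players as (   -- unique payers per game, independent of the amounts
  select game_id, count(distinct user_id) as ttplayers
  from revenue group by game_id
),
daily as (     -- one revenue total per game and day
  select game_id, activity_date, sum(amount) as samount
  from revenue group by game_id, activity_date
)
select p.game_id,
       cast(julianday(dv.date_day) - julianday('2020-01-01') as integer) as age,
       sum(coalesce(d.samount, 0)) over (
         partition by p.game_id order by dv.date_day
       ) as cum_rev,
       p.ttplayers
from dates_view dv
join (select max(activity_date) as maxDate from revenue) md
  on dv.date_day <= md.maxDate
cross join players p
left join daily d
  on d.game_id = p.game_id and d.activity_date = dv.date_day
order by p.game_id desc, age
"""


def cumulative_revenue():
    """Return (game_id, age, cum_rev, ttplayers) rows for every day up to the last date."""
    con = sqlite3.connect(":memory:")
    con.executescript(SAMPLE_SQL)
    rows = con.execute(CUM_SQL).fetchall()
    con.close()
    return rows
```

Aggregating per day first means every (game, day) pair appears exactly once before the window is applied. The running sum therefore needs no deduplication afterwards.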