SQL - Aggregating across multiple rows

Date: 2016-05-05 13:34:59

Tags: mysql sql sqlite

MY_TABLE

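I have the following table, which records driver and rider details. For each day (datetime) there is one driver and zero or more riders. The first rider's details (rider name and rider age) go on the driver's row. Each additional rider is captured in a new row with the same datetime and blank driver columns. This may not be the right way to structure the data, but the main reason for it is that the number of riders per driver varies from one datetime to the next.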

id | datetime   | driver | age | riders | rider_name | rider_age
---|------------|--------|-----|--------|------------|----------
1  | 03/03/2009 | joe    | 24  | 0      |            |
2  | 04/03/2009 | john   | 39  | 1      | juliet     | 30
3  | 05/03/2009 | borat  | 32  | 2      | jane       | 45
4  | 05/03/2009 |        |     |        | mike       | 18
5  | 06/03/2009 | john   | 39  | 3      | duke       | 42
6  | 06/03/2009 |        |     |        | jose       | 33
7  | 06/03/2009 |        |     |        | kyle       | 24
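To try the queries on this page, here is a minimal sketch that loads the sample rows into an in-memory SQLite database with Python's built-in sqlite3 module. Two assumptions that the question does not state: blank cells are stored as NULL (None in Python), and dates are kept as dd/mm/yyyy text. The helper name make_db is hypothetical.

```python
import sqlite3

# Sample rows from the question.
# Assumption: blank cells are stored as NULL (None), dates as dd/mm/yyyy text.
ROWS = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the question's my_table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_table (id integer primary key, datetime text, driver text,"
        " age integer, riders integer, rider_name text, rider_age integer)"
    )
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


if __name__ == "__main__":
    conn = make_db()
    for row in conn.execute("select * from my_table order by id"):
        print(row)
```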

Desired output

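For each datetime value, I need the driver, the driver's age, the number of riders, the name of the youngest rider, and the number of riders whose age is within +/- 10 years of the driver's age.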

datetime   | driver | age | riders | youngest_rider | riders_within_ten_years_of_driver
-----------|--------|-----|--------|----------------|----------------------------------
03/03/2009 | joe    | 24  | 0      |                | 0    # no riders
04/03/2009 | john   | 39  | 1      | juliet         | 1    # juliet
05/03/2009 | borat  | 32  | 2      | mike           | 0    # none (jane is 45, mike is 18)
06/03/2009 | john   | 39  | 3      | kyle           | 2    # duke, jose
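The desired output can be pinned down with a plain-Python reference computation, which is handy as an oracle when checking the SQL answers. This sketch assumes, as Answer 0 also does, that rows are processed in id order and that a row with a blank (None) driver belongs to the nearest preceding driver row. The name summarize is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# (id, datetime, driver, age, riders, rider_name, rider_age); blanks assumed to be None
Row = Tuple[int, str, Optional[str], Optional[int], Optional[int], Optional[str], Optional[int]]

ROWS: List[Row] = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]


def summarize(rows: List[Row]) -> List[tuple]:
    """Return (datetime, driver, age, riders, youngest_rider, riders_within_10) per ride."""
    groups: List[List[Row]] = []
    for row in sorted(rows, key=lambda r: r[0]):  # walk the rows in id order
        if row[2] is not None:  # a non-blank driver starts a new ride
            groups.append([row])
        else:  # a blank driver continues the current ride
            groups[-1].append(row)

    result = []
    for group in groups:
        _, dt, driver, age, riders, _, _ = group[0]
        # (rider_age, rider_name) pairs, so min() picks the youngest, ties broken by name
        pairs = [(r[6], r[5]) for r in group if r[5] is not None]
        youngest = min(pairs)[1] if pairs else None
        within = sum(1 for rider_age, _ in pairs if abs(rider_age - age) <= 10)
        result.append((dt, driver, age, riders, youngest, within))
    return result


if __name__ == "__main__":
    for line in summarize(ROWS):
        print(line)
```

Grouping by "most recent driver row" rather than by datetime keeps the logic correct even if two rides ever shared a datetime.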

4 answers:

Answer 0 (score: 2)

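This is a really bad data structure: the driver name is blank on the extra rider rows, so you have no key to aggregate on. A more normalized structure would be better, but sometimes we are stuck with a particular format.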

You need to get the id of the driver record for each row. To do that, use a correlated subquery:

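-- For each row, find the id of the closest driver row at or before it.
-- Assumes blank driver cells are stored as NULL.
select r.*,
       (select max(r2.id)
        from my_table r2
        where r2.id <= r.id and r2.driver is not null
       ) as driver_id
from my_table r;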
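A quick way to check what the correlated subquery produces is to run it against the sample data. This is a sketch using Python's sqlite3 module and an in-memory copy of the question's table under the name my_table; it assumes blank drivers are stored as NULL. The helper names make_db and driver_ids are hypothetical.

```python
import sqlite3

# Sample rows from the question; blank cells assumed to be NULL (None).
ROWS = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_table (id integer primary key, datetime text, driver text,"
        " age integer, riders integer, rider_name text, rider_age integer)"
    )
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def driver_ids(conn: sqlite3.Connection) -> list:
    """Return (id, driver_id) pairs computed by the correlated subquery."""
    sql = """
        select r.id,
               (select max(r2.id)
                from my_table r2
                where r2.id <= r.id and r2.driver is not null
               ) as driver_id
        from my_table r
        order by r.id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(driver_ids(make_db()))
```

Rows 4, 6 and 7 pick up the id of the driver row above them (3, 5 and 5), which is the grouping key the rest of the answer relies on.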

Then build on this, using a join to get the driver information, plus conditional aggregation. For everything except the youngest rider:

-- Join each row to its driver row (d) so that rider rows, whose own age is NULL,
-- are compared with the driver's age.
select r.datetime,
       d.driver,
       d.age,
       d.riders,
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years
from (select r.*,
             (select max(r2.id)
              from my_table r2
              where r2.id <= r.id and r2.driver is not null
             ) as driver_id
      from my_table r
     ) r join
     my_table d
     on d.id = r.driver_id
group by r.datetime, r.driver_id, d.driver, d.age, d.riders;
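The aggregation can be checked the same way. This sketch runs a version that joins each row to its driver row d, so that rider rows (whose own age column is NULL) are compared with the driver's age rather than silently dropped. The table name my_table, the NULL-for-blank convention and the helper names are assumptions.

```python
import sqlite3

# Sample rows from the question; blank cells assumed to be NULL (None).
ROWS = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]

AGG_SQL = """
select r.datetime, d.driver, d.age, d.riders,
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years
from (select r.*,
             (select max(r2.id)
              from my_table r2
              where r2.id <= r.id and r2.driver is not null
             ) as driver_id
      from my_table r
     ) r
join my_table d on d.id = r.driver_id   -- the driver row for this ride
group by r.datetime, r.driver_id, d.driver, d.age, d.riders
order by r.driver_id
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_table (id integer primary key, datetime text, driver text,"
        " age integer, riders integer, rider_name text, rider_age integer)"
    )
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def aggregate(conn: sqlite3.Connection) -> list:
    return conn.execute(AGG_SQL).fetchall()


if __name__ == "__main__":
    for row in aggregate(make_db()):
        print(row)
```

The counts should agree with the desired output: 0, 1, 0 and 2 riders within ten years.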

Getting the name of the youngest rider is tricky with this data structure. One solution is to use a CTE:

with r as (
      select t.*,
             (select max(t2.id)
              from my_table t2
              where t2.id <= t.id and t2.driver is not null
             ) as driver_id
      from my_table t
     )
select r.datetime,
       d.driver,
       d.age,
       d.riders,
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years,
       (select r2.rider_name
        from r r2
        where r2.driver_id = r.driver_id and r2.rider_age is not null
        order by r2.rider_age   -- ascending, so the youngest rider comes first
        limit 1
       ) as youngest_rider
from r join
     my_table d
     on d.id = r.driver_id
group by r.datetime, r.driver_id, d.driver, d.age, d.riders;
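A sketch that runs the CTE form against the sample data, again via Python's sqlite3 module (the WITH clause needs SQLite 3.8.3 or later). It orders rider ages ascending so the first row is the youngest rider, and skips NULL ages so that a driver with no riders gets NULL. The table name, the NULL convention and the helper names are assumptions.

```python
import sqlite3

# Sample rows from the question; blank cells assumed to be NULL (None).
ROWS = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]

CTE_SQL = """
with r as (
    select t.*,
           (select max(t2.id)
            from my_table t2
            where t2.id <= t.id and t2.driver is not null
           ) as driver_id
    from my_table t
)
select r.datetime, d.driver, d.age, d.riders,
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years,
       (select r2.rider_name
        from r r2
        where r2.driver_id = r.driver_id and r2.rider_age is not null
        order by r2.rider_age          -- youngest first
        limit 1) as youngest_rider
from r
join my_table d on d.id = r.driver_id
group by r.datetime, r.driver_id, d.driver, d.age, d.riders
order by r.driver_id
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_table (id integer primary key, datetime text, driver text,"
        " age integer, riders integer, rider_name text, rider_age integer)"
    )
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def report(conn: sqlite3.Connection) -> list:
    return conn.execute(CTE_SQL).fetchall()


if __name__ == "__main__":
    for row in report(make_db()):
        print(row)
```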

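This is much harder than it needs to be because (1) the data structure is not very good and (2) SQLite is not particularly powerful. In particular, it did not support window functions when this was written; they were added in SQLite 3.25.0.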
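Since SQLite 3.25.0 added window functions, the driver_id lookup no longer needs a correlated subquery. A running MAX of the driver-row ids, ordered by id, carries each driver's id down onto the rider rows beneath it, and first_value picks the youngest rider per ride. This is a sketch of that swapped-in technique, not part of the original answer. It assumes the SQLite library linked into Python is 3.25 or newer (check sqlite3.sqlite_version) and that blank drivers are NULL; helper names are hypothetical.

```python
import sqlite3

# Sample rows from the question; blank cells assumed to be NULL (None).
ROWS = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]

WINDOW_SQL = """
with r as (
    select t.*,
           -- running max of driver-row ids: each row inherits the latest driver above it
           max(case when t.driver is not null then t.id end)
               over (order by t.id rows unbounded preceding) as driver_id
    from my_table t
),
ranked as (
    select r.*,
           first_value(r.rider_name) over (
               partition by r.driver_id
               order by r.rider_age is null, r.rider_age   -- NULL ages sort last
           ) as youngest_rider
    from r
)
select d.datetime, d.driver, d.age, d.riders,
       max(x.youngest_rider) as youngest_rider,
       sum(case when abs(x.rider_age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years
from ranked x
join my_table d on d.id = x.driver_id
group by x.driver_id, d.datetime, d.driver, d.age, d.riders
order by x.driver_id
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_table (id integer primary key, datetime text, driver text,"
        " age integer, riders integer, rider_name text, rider_age integer)"
    )
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def report(conn: sqlite3.Connection) -> list:
    return conn.execute(WINDOW_SQL).fetchall()


if __name__ == "__main__":
    print(sqlite3.sqlite_version)
    for row in report(make_db()):
        print(row)
```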

Answer 1 (score: 0)

If you provide INSERT statements for the data, I can test whether this query works.

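-- Window functions need SQLite 3.25+ or MySQL 8.0+.
-- Assumes exactly one driver row per datetime, as the question states.
select datetime,
       max(driver) as driver,
       max(age) as age,
       max(riders) as riders,
       max(youngest_rider) as youngest_rider,
       sum(case when rider_age between driver_age - 10 and driver_age + 10
                then 1 else 0 end) as count_riders_in_age_grp
from (select t.*,
             -- copy the driver's age onto every row of the same datetime
             max(age) over (partition by datetime) as driver_age,
             first_value(rider_name) over (partition by datetime
                                           order by rider_age, rider_name) as youngest_rider
      from my_table t
     ) x
group by datetime;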
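To check this approach, the sketch below runs a window-function variant against the sample data with Python's sqlite3 module (it needs SQLite 3.25+). It copies the driver's age onto every row of the same datetime with max(age) over (partition by datetime), takes the youngest rider with first_value, and uses SUM rather than COUNT for the range test, because COUNT would also count the 0s. It assumes one driver per datetime, as the question states, and NULL for blank cells; helper names are hypothetical.

```python
import sqlite3

# Sample rows from the question; blank cells assumed to be NULL (None).
ROWS = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]

PARTITION_SQL = """
select datetime,
       max(driver) as driver,
       max(age) as age,
       max(riders) as riders,
       max(youngest_rider) as youngest_rider,
       sum(case when rider_age between driver_age - 10 and driver_age + 10
                then 1 else 0 end) as count_riders_in_age_grp
from (select t.*,
             max(age) over (partition by datetime) as driver_age,
             first_value(rider_name) over (partition by datetime
                                           order by rider_age, rider_name) as youngest_rider
      from my_table t
     ) x
group by datetime
order by datetime
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_table (id integer primary key, datetime text, driver text,"
        " age integer, riders integer, rider_name text, rider_age integer)"
    )
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def report(conn: sqlite3.Connection) -> list:
    return conn.execute(PARTITION_SQL).fetchall()


if __name__ == "__main__":
    for row in report(make_db()):
        print(row)
```

Ordering by the dd/mm/yyyy text happens to be chronological for this sample only; real dates would need an ISO format or a proper date type.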

Answer 2 (score: 0)

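This is a bad database structure, but I assume it is a homework problem. Anyway, this should work (it is written in SQL Server syntax; CROSS APPLY is not available in MySQL or SQLite):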

SELECT  M.[DateTime],
        MAX(M.driver) AS [Driver],
        MAX(M.age) AS [Age],
        MAX(M.riders) AS [Riders],
        t.rider_name AS [Youngest Rider],
        SUM(CASE WHEN M.rider_age BETWEEN d.driver_age - 10 AND d.driver_age + 10
                 THEN 1 ELSE 0 END) AS [Riders within Ten Years of Driver]
FROM my_table M
CROSS APPLY
    (   -- the driver's age for this datetime (aggregates cannot be nested inside SUM)
        SELECT MAX(age) AS driver_age
        FROM my_table
        WHERE [DateTime] = M.[DateTime]
    ) d
OUTER APPLY
    (   -- OUTER rather than CROSS, so dates with no riders are kept
        SELECT TOP (1) rider_name
        FROM my_table
        WHERE [DateTime] = M.[DateTime]
          AND rider_age IS NOT NULL
        ORDER BY rider_age
    ) t
GROUP BY M.[DateTime], t.rider_name
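CROSS APPLY is not available in SQLite, so here is a sketch that ports the same idea for a test run with Python's sqlite3 module. A derived table supplies the driver's age per datetime, and a correlated subquery with ORDER BY ... LIMIT 1 picks the youngest rider. The port, the table name my_table, the NULL-for-blank convention and the helper names are assumptions, not the answerer's code.

```python
import sqlite3

# Sample rows from the question; blank cells assumed to be NULL (None).
ROWS = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]

PORT_SQL = """
select m.datetime,
       max(m.driver) as driver,
       max(m.age) as age,
       max(m.riders) as riders,
       (select y.rider_name            -- stands in for the APPLY that finds the youngest
        from my_table y
        where y.datetime = m.datetime and y.rider_age is not null
        order by y.rider_age
        limit 1) as youngest_rider,
       sum(case when m.rider_age between d.driver_age - 10 and d.driver_age + 10
                then 1 else 0 end) as riders_within_ten_years
from my_table m
join (select datetime, max(age) as driver_age   -- the driver's age per datetime
      from my_table
      group by datetime) d
  on d.datetime = m.datetime
group by m.datetime, d.driver_age
order by m.datetime
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_table (id integer primary key, datetime text, driver text,"
        " age integer, riders integer, rider_name text, rider_age integer)"
    )
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def report(conn: sqlite3.Connection) -> list:
    return conn.execute(PORT_SQL).fetchall()


if __name__ == "__main__":
    for row in report(make_db()):
        print(row)
```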

Answer 3 (score: 0)

SELECT
    a.datetime
    ,max(a.driver) as driver
    ,max(a.age) as age
    ,max(a.riders) as riders
    ,a.youngest_rider
    ,count(b.id) as riders_within_ten_years_of_driver
FROM
    -- window functions run after GROUP BY, so compute youngest_rider in a derived table first
    (SELECT t.*
        ,first_value(t.rider_name) OVER
            (PARTITION BY t.datetime
            ORDER BY t.rider_age
            rows unbounded preceding)
            as youngest_rider
     FROM my_table t) a
LEFT JOIN
    my_table b
    ON
        a.datetime = b.datetime
        AND a.age - b.rider_age between -10 AND 10
GROUP BY
    a.datetime
    ,a.youngest_rider
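A sketch that runs this query against the sample data through Python's sqlite3 module (window functions need SQLite 3.25+). It computes youngest_rider in a derived table before the self-join, since a window function's result cannot be used in the GROUP BY of the same query level. Only the driver row has a non-NULL age, so only it matches rows of b, and count(b.id) ends up counting the riders within ten years. The table name, NULL blanks and helper names are assumptions.

```python
import sqlite3

# Sample rows from the question; blank cells assumed to be NULL (None).
ROWS = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]

SELF_JOIN_SQL = """
select a.datetime,
       max(a.driver) as driver,
       max(a.age) as age,
       max(a.riders) as riders,
       a.youngest_rider,
       count(b.id) as riders_within_ten_years_of_driver
from (select t.*,
             first_value(t.rider_name) over (partition by t.datetime
                                             order by t.rider_age
                                             rows unbounded preceding) as youngest_rider
      from my_table t) a
left join my_table b
  on a.datetime = b.datetime
 and a.age - b.rider_age between -10 and 10   -- NULL age on rider rows: no match
group by a.datetime, a.youngest_rider
order by a.datetime
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_table (id integer primary key, datetime text, driver text,"
        " age integer, riders integer, rider_name text, rider_age integer)"
    )
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def report(conn: sqlite3.Connection) -> list:
    return conn.execute(SELF_JOIN_SQL).fetchall()


if __name__ == "__main__":
    for row in report(make_db()):
        print(row)
```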

This is a mess. It would be much simpler if you had separate tables for drivers, riders and rides.
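As a sketch of what that might look like, here is one possible normalized schema with hypothetical tables drivers, rides and riders (one riders row per rider per ride), loaded with the question's data in an in-memory SQLite database via Python's sqlite3 module. The table and column names are illustrative assumptions. Because each ride points at exactly one driver, the query needs no row-order tricks.

```python
import sqlite3

# Hypothetical normalized schema: each driver stored once, one rides row per ride,
# and one riders row per rider on a ride.
SCHEMA = """
create table drivers (id integer primary key, name text not null, age integer not null);
create table rides (id integer primary key, datetime text not null,
                    driver_id integer not null references drivers(id));
create table riders (ride_id integer not null references rides(id),
                     name text not null, age integer not null);
"""

REPORT_SQL = """
select r.datetime, d.name as driver, d.age,
       count(x.name) as riders,
       (select y.name from riders y
        where y.ride_id = r.id
        order by y.age, y.name
        limit 1) as youngest_rider,
       sum(case when abs(x.age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years
from rides r
join drivers d on d.id = r.driver_id
left join riders x on x.ride_id = r.id   -- LEFT JOIN keeps rides with no riders
group by r.id, r.datetime, d.name, d.age
order by r.id
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into drivers values (?, ?, ?)",
                     [(1, "joe", 24), (2, "john", 39), (3, "borat", 32)])
    conn.executemany("insert into rides values (?, ?, ?)",
                     [(1, "03/03/2009", 1), (2, "04/03/2009", 2),
                      (3, "05/03/2009", 3), (4, "06/03/2009", 2)])
    conn.executemany("insert into riders values (?, ?, ?)",
                     [(2, "juliet", 30), (3, "jane", 45), (3, "mike", 18),
                      (4, "duke", 42), (4, "jose", 33), (4, "kyle", 24)])
    return conn


def report(conn: sqlite3.Connection) -> list:
    return conn.execute(REPORT_SQL).fetchall()


if __name__ == "__main__":
    for row in report(make_db()):
        print(row)
```

In this layout the rider count is derived with COUNT instead of being stored, so it cannot drift out of sync with the riders actually recorded.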