Total count and subset count within a group

Time: 2017-03-21 09:15:45

Tags: sql-server tsql

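I have a trip table with a start date and an end date. Most trips last no more than 1 day, but occasionally one runs longer. Here is a sample of the first 10 records: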

+------+-------------------------+-------------------------+
| id   | start_date              | end_date                |
+------+-------------------------+-------------------------+
| 7454 | 2013-09-01 01:01:00.000 | 2013-09-01 01:05:00.000 |
| 7457 | 2013-09-01 01:09:00.000 | 2013-09-01 01:12:00.000 |
| 7458 | 2013-09-01 02:01:00.000 | 2013-09-01 02:08:00.000 |
| 7459 | 2013-09-01 02:04:00.000 | 2013-09-01 02:23:00.000 |
| 7460 | 2013-09-01 02:04:00.000 | 2013-09-01 02:25:00.000 |
| 7461 | 2013-09-01 02:09:00.000 | 2013-09-01 02:12:00.000 |
| 7463 | 2013-09-01 02:19:00.000 | 2013-09-01 02:29:00.000 |
| 7465 | 2013-09-01 02:27:00.000 | 2013-09-01 02:29:00.000 |
| 7466 | 2013-09-01 04:06:00.000 | 2013-09-01 15:08:00.000 |
| 7467 | 2013-09-01 05:24:00.000 | 2013-09-01 05:37:00.000 |
+------+-------------------------+-------------------------+

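I want a query grouped by start month that gives, for each month, the total number of trips and the number of trips longer than 1 day, and that also calculates the percentage of trips longer than 1 day.

Here is a query that returns this for all trips in a single row: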

SELECT 
count ([id]) as NumTrips
,(
    SELECT count ([id]) as NumTripsGreaterThanOneDay
    FROM [trip]
    where dateadd(day,datediff(day,0,[start_date]),0) < dateadd(day,datediff(day,0,end_date),0)
) as 'NumTrips>Day'
,(
(
    SELECT count ([id]) as NumTripsGreaterThanOneDay
    FROM [trip]
    where dateadd(day,datediff(day,0,[start_date]),0) < dateadd(day,datediff(day,0,end_date),0)
)*100.0 / count ([id])
) as 'percent>day'
FROM [trip]

Result:

+----------+--------------+----------------+
| NumTrips | NumTrips>Day | percent>day    |
+----------+--------------+----------------+
| 669959   | 2099         | 0.313302754347 |
+----------+--------------+----------------+

I tried to break this down by month with this code:

SELECT 
count ([id]) as NumTrips
,(
    SELECT count ([id]) as NumTripsGreaterThanOneDay
    FROM [trip]
    where dateadd(day,datediff(day,0,[start_date]),0) < dateadd(day,datediff(day,0,end_date),0)
) as 'NumTrips>Day'
,dateadd(month,datediff(month,0,[start_date]),0) as TripMonth
FROM [trip]
group by dateadd(month,datediff(month,0,[start_date]),0)

and got this result:

+----------+--------------+-------------------------+
| NumTrips | NumTrips>Day | TripMonth               |
+----------+--------------+-------------------------+
| 2102     | 2099         | 2013-08-01 00:00:00.000 |
| 25243    | 2099         | 2013-09-01 00:00:00.000 |
| 29105    | 2099         | 2013-10-01 00:00:00.000 |
| 24219    | 2099         | 2013-11-01 00:00:00.000 |
| 19894    | 2099         | 2013-12-01 00:00:00.000 |
| 24428    | 2099         | 2014-01-01 00:00:00.000 |
| 19024    | 2099         | 2014-02-01 00:00:00.000 |
+----------+--------------+-------------------------+

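I can see the problem is that my subquery is not correlated with the outer aggregate: it counts every multi-day trip in the table, so each month shows the same 2099. I can't figure out how to correlate it. I think I may need to do this with partitioning, but I can't work that out either.

1 Answer:

Answer 0: (score: 2)

One way to do this is with conditional aggregation: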

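Create and populate a sample table (please save us this step in your future questions). Note that the end dates of the last two rows (ids 7466 and 7467) are 2013-09-02 here rather than 2013-09-01 as in the question's sample, which gives the sample two trips that end on a later day.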

CREATE TABLE trip 
(
    id int, 
    start_date datetime,
    end_date datetime
)

INSERT INTO trip VALUES
(7454, '2013-09-01 01:01:00.000', '2013-09-01 01:05:00.000'),
(7457, '2013-09-01 01:09:00.000', '2013-09-01 01:12:00.000'),
(7458, '2013-09-01 02:01:00.000', '2013-09-01 02:08:00.000'),
(7459, '2013-09-01 02:04:00.000', '2013-09-01 02:23:00.000'),
(7460, '2013-09-01 02:04:00.000', '2013-09-01 02:25:00.000'),
(7461, '2013-09-01 02:09:00.000', '2013-09-01 02:12:00.000'),
(7463, '2013-09-01 02:19:00.000', '2013-09-01 02:29:00.000'),
(7465, '2013-09-01 02:27:00.000', '2013-09-01 02:29:00.000'),
(7466, '2013-09-01 04:06:00.000', '2013-09-02 15:08:00.000'),
(7467, '2013-09-01 05:24:00.000', '2013-09-02 05:37:00.000')

Query:

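-- Truncate start_date to the first day of its month and group on that
SELECT  DATEADD(MONTH, DATEDIFF(MONTH, 0, start_date), 0) AS TripMonth,
        COUNT(id) AS NumTrips,
        -- Count only trips whose end date falls on a later calendar day than the start
        SUM(CASE WHEN DATEDIFF(DAY, start_date, end_date) > 0 THEN 1 ELSE 0 END) AS [NumTrips>Day]
FROM trip
GROUP BY DATEADD(MONTH, DATEDIFF(MONTH, 0, start_date), 0)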

Results:

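+---------------------+----------+--------------+
| TripMonth           | NumTrips | NumTrips>Day |
+---------------------+----------+--------------+
| 01.09.2013 00:00:00 | 10       | 2            |
+---------------------+----------+--------------+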
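
The question also asks for the percentage of multi-day trips per month, which the query above does not return. A minimal sketch of one way to add it, reusing the same conditional sum, might look like this. The `[Percent>Day]` alias, the `NULLIF` guard, and the `ORDER BY` are illustrative additions, not part of the original answer:

```sql
-- Sketch only: same grouping as the query above, plus a percentage column.
-- [Percent>Day], NULLIF and ORDER BY are assumptions added for illustration.
SELECT  DATEADD(MONTH, DATEDIFF(MONTH, 0, start_date), 0) AS TripMonth,
        COUNT(id) AS NumTrips,
        SUM(CASE WHEN DATEDIFF(DAY, start_date, end_date) > 0 THEN 1 ELSE 0 END) AS [NumTrips>Day],
        -- Multiplying by 100.0 forces decimal arithmetic, so the integer counts are not truncated
        SUM(CASE WHEN DATEDIFF(DAY, start_date, end_date) > 0 THEN 1 ELSE 0 END) * 100.0
            / NULLIF(COUNT(id), 0) AS [Percent>Day]  -- NULLIF avoids a divide-by-zero if every id in a month is NULL
FROM trip
GROUP BY DATEADD(MONTH, DATEDIFF(MONTH, 0, start_date), 0)
ORDER BY TripMonth;
```

With the sample table above, the single September 2013 row has 10 trips, 2 of them multi-day, so the percentage works out to 20. Keep in mind that `DATEDIFF(DAY, ...)` counts midnight boundaries crossed, so a trip from 23:50 to 00:10 the next day is counted as ">Day". That matches the date-truncated comparison in the question; if a trip should only count when it lasts more than 24 hours, a predicate such as `end_date > DATEADD(DAY, 1, start_date)` would be needed instead.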