(SQL) How do I select the right row for each group?

Time: 2017-02-13 06:16:44

Tags: sql sql-server group-by

I have the following data:

+------------+-----------+-----------+------------+--------------+
| first_name | last_name | family_id | is_primary | is_secondary |
+------------+-----------+-----------+------------+--------------+
| a          | b         |         1 |          1 |            0 |
| aa         | bb        |         1 |          0 |            0 |
| c          | d         |         1 |          0 |            0 |
| cc         | dd        |         1 |          0 |            0 |
| e          | f         |        10 |          0 |            0 |
| e          | f         |        10 |          0 |            1 |
| gg         | hh        |        10 |          0 |            1 |
| gg         | hh        |        10 |          0 |            0 |
| gg         | hh        |        10 |          0 |            0 |
| gg         | hh        |        10 |          0 |            0 |
+------------+-----------+-----------+------------+--------------+

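What I want to do is:

  • Group by family_id (so we will have two groups).
  • For each group, if some rows have is_primary equal to 1, pick one of them at random and output its first_name and last_name as the two columns for that group.
  • For each group, if no row has is_primary equal to 1, find a row with is_secondary equal to 1 (any such row will do) and output its first_name and last_name as the two columns for that group.

So, based on the logic and data above, the correct result would be: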

+-----------+------------+-----------+
| family_id | first_name | last_name |
+-----------+------------+-----------+
|         1 | a          | b         |
|        10 | e          | f         |
+-----------+------------+-----------+

or

+-----------+------------+-----------+
| family_id | first_name | last_name |
+-----------+------------+-----------+
|         1 | a          | b         |
|        10 | gg         | hh        |
+-----------+------------+-----------+

How can I write a query that returns the correct result?

Here is the script that creates the test table.

USE tempdb
GO
IF OBJECT_ID('dbo.mytable') IS NOT NULL DROP TABLE dbo.mytable;
CREATE TABLE mytable (
    first_name   VARCHAR(2) NOT NULL,
    last_name    VARCHAR(2) NOT NULL,
    family_id    INTEGER    NOT NULL,
    is_primary   INTEGER    NOT NULL,
    is_secondary INTEGER    NOT NULL);

INSERT INTO mytable VALUES ('a','b',1,1,0);
INSERT INTO mytable VALUES ('aa','bb',1,0,0);
INSERT INTO mytable VALUES ('c','d',1,0,0);
INSERT INTO mytable VALUES ('cc','dd',1,0,0);
INSERT INTO mytable VALUES ('e','f',10,0,0);
INSERT INTO mytable VALUES ('e','f',10,0,1);
INSERT INTO mytable VALUES ('gg','hh',10,0,1);
INSERT INTO mytable VALUES ('gg','hh',10,0,0);
INSERT INTO mytable VALUES ('gg','hh',10,0,0);
INSERT INTO mytable VALUES ('gg','hh',10,0,0);
GO

SELECT * FROM dbo.mytable;

3 answers:

Answer 0: (score: 2)

Try this approach:

;with x as (
    select *, row_number() over(partition by family_id order by is_primary desc, is_secondary desc) rn
    from mytable
    where is_primary+is_secondary = 1
)
select * from x where rn = 1

(Thanks for the create and insert scripts.)

Edit: Per the OP's comment (both flags can be 1), change the WHERE clause to:

where is_primary = 1 or (is_primary = 0 and is_secondary = 1)
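
For reference, here is a sketch of the full query with the revised WHERE clause applied. It assumes the mytable test table from the question. The explicit column list and the NEWID() tie-breaker are additions made here, not part of the answer as posted: they are one possible way to make the pick vary between runs when several rows in a family qualify equally.

```sql
-- Sketch: one row per family_id, preferring is_primary = 1, otherwise is_secondary = 1.
WITH x AS (
    SELECT family_id, first_name, last_name,
           ROW_NUMBER() OVER (
               PARTITION BY family_id
               ORDER BY is_primary DESC,  -- primary rows sort ahead of secondary-only rows
                        NEWID()           -- assumption: random order among equally ranked rows
           ) AS rn
    FROM mytable
    WHERE is_primary = 1 OR is_secondary = 1  -- same rows as the revised WHERE clause
)
SELECT family_id, first_name, last_name
FROM x
WHERE rn = 1;
```

With the sample data this should return a/b for family 1 and either e/f or gg/hh for family 10. A family with neither flag set on any row would not appear in the output at all.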

Answer 1: (score: 1)

If the selected row must be random, use the following:

WITH primary_families AS (
    SELECT   family_id
            ,first_name
            ,last_name
            -- number rows randomly within each family, not across the whole table
            ,ROW_NUMBER() OVER(PARTITION BY family_id ORDER BY NEWID()) AS r
    FROM mytable
    WHERE is_primary = 1
),
secondary_families AS (
    SELECT   family_id
            ,first_name
            ,last_name
            ,ROW_NUMBER() OVER(PARTITION BY family_id ORDER BY NEWID()) AS r
    FROM mytable f
    WHERE is_secondary = 1
    -- only families that have no primary row at all
    AND NOT EXISTS (
        SELECT 1
        FROM mytable
        WHERE family_id = f.family_id
        AND is_primary = 1
    )
)

SELECT   f.family_id
        ,f.first_name
        ,f.last_name
FROM primary_families f
WHERE f.r = 1

UNION

SELECT   f.family_id
        ,f.first_name
        ,f.last_name
FROM secondary_families f
WHERE f.r = 1

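Answer 2: (score: 0)

This is not an answer to your specific question, just an observation. If I had to develop software or a web application with this kind of logic, I would move it out of SQL and into whatever programming language is available: retrieve the data set of interest, scan it, group it, and sort it.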
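
As a rough illustration of the retrieval step in that approach, the query could be as small as the sketch below, with the per-family pick left to application code. It assumes the mytable test table from the question; the filter and the ordering are choices made for this sketch, not something the answer specifies.

```sql
-- Sketch of the retrieval step only; grouping and picking happen in the application.
SELECT family_id, first_name, last_name, is_primary, is_secondary
FROM mytable
WHERE is_primary = 1 OR is_secondary = 1   -- rows that can never be chosen are left out
ORDER BY family_id, is_primary DESC;       -- primary candidates come first within each family
```

The application would then walk the rows family by family and choose one of the leading candidates.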