Question

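I have a patient database with a VITALS table. This table contains a unique patient ID (PATID) for each patient and a height variable (HT). A single patient may have more than one height recorded.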

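I'm trying to return a count of unique PATIDs within each height range (e.g., 68-72", 72-76", etc.). Each PATID should be counted *only once*. However, I'm finding that if a patient has multiple heights recorded, they are counted once within a range, but if their heights span several ranges, they are counted twice - once in each range.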

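For example, if a patient has heights recorded as 68, 72, and 73, they will be counted once in the 68-72 range and once in the 72-76 range. I can tell this is happening because we have 3054 unique PATIDs, but the sum of the counts returned by the query is > 5000.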

My code is:

SELECT 
    CASE
        when "HT" >0 and "HT" <=4 then '0-4'
        when "HT" >4 and "HT" <=8 then '4-8'
        when "HT" >8 and "HT" <=12 then '8-12'
        when "HT" >12 and "HT" <=16 then '12-16'
        when "HT" >16 and "HT" <=20 then '16-20'
        when "HT" >20 and "HT" <=24 then '20-24'
        when "HT" >24 and "HT" <=28 then '24-28'
        when "HT" >28 and "HT" <=32 then '28-32'
        when "HT" >32 and "HT" <=36 then '32-36'
        when "HT" >36 and "HT" <=40 then '36-40'
        when "HT" >40 and "HT" <=44 then '40-44'
        when "HT" >44 and "HT" <=48 then '44-48'
        when "HT" >48 and "HT" <=52 then '48-52'
        when "HT" >52 and "HT" <=56 then '52-56'
        when "HT" >56 and "HT" <=60 then '56-60'
        when "HT" >60 and "HT" <=64 then '60-64'
        when "HT" >64 and "HT" <=68 then '64-68'
        when "HT" >68 and "HT" <=72 then '68-72'
        when "HT" >72 and "HT" <=76 then '72-76'
        when "HT" >76 and "HT" <=80 then '76-80'
        when "HT" >80 and "HT" <=84 then '80-84'
        when "HT" >84 and "HT" <=88 then '84-88'
        when "HT" IS NULL then 'Null'
        else '>88'    
    END AS "Height Range",            
    COUNT(DISTINCT vital."PATID") AS "Count"
FROM dbo."VITAL" vital
GROUP BY 1;

Answer 1

If a patient has multiple records, you have to choose which record you want.
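The reason is that COUNT(DISTINCT ...) is evaluated separately for each group, so a patient whose rows fall into two buckets is distinct in both. A minimal sketch for PostgreSQL, using made-up inline rows and a shortened CASE (both are assumptions for illustration, not the real VITAL data):

```sql
-- Hypothetical sample data: one patient with heights in two different buckets.
WITH vital ("PATID", "HT") AS (
    VALUES (1, 70), (1, 73)
)
SELECT
    CASE
        WHEN "HT" > 68 AND "HT" <= 72 THEN '68-72'
        WHEN "HT" > 72 AND "HT" <= 76 THEN '72-76'
        ELSE 'other'
    END AS "Height Range",
    COUNT(DISTINCT "PATID") AS "Count"  -- distinct per group, not overall
FROM vital
GROUP BY 1;
```

This returns one row for each of the two ranges, each with a count of 1, so the counts sum to 2 although there is only one patient.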

One solution is to change the source so that it returns only the greatest height per patient:

FROM (
    -- derived table: one row per patient, keeping the greatest height
    SELECT "PATID", MAX("HT") AS "HT"
    FROM dbo."VITAL"
    GROUP BY "PATID"
) vital

Or maybe you would rather take the lowest or the average record - the appropriate solution depends on your requirements.
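To try the idea on a small data set, here is a minimal sketch for PostgreSQL. The inline rows, the CTE name per_patient, and the shortened CASE are assumptions made for illustration. Swapping MAX for MIN or AVG changes which bucket a patient lands in, but not the fact that each patient is counted once.

```sql
-- Hypothetical sample data: patient 1 has three heights, patient 2 has one.
WITH vital ("PATID", "HT") AS (
    VALUES (1, 68), (1, 72), (1, 73),
           (2, 70)
),
per_patient AS (
    -- Collapse to one row per patient; use MIN or AVG here if that fits better.
    SELECT "PATID", MAX("HT") AS "HT"
    FROM vital
    GROUP BY "PATID"
)
SELECT
    CASE
        WHEN "HT" > 64 AND "HT" <= 68 THEN '64-68'
        WHEN "HT" > 68 AND "HT" <= 72 THEN '68-72'
        WHEN "HT" > 72 AND "HT" <= 76 THEN '72-76'
        ELSE 'other'
    END AS "Height Range",
    COUNT(*) AS "Count"  -- plain COUNT(*) is enough: one row per patient
FROM per_patient
GROUP BY 1
ORDER BY 1;
```

With these rows, patient 2 lands in 68-72 and patient 1 (greatest height 73) in 72-76, so the counts add up to the number of patients.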

Answer 2

You can fold duplicates in a subquery before counting:

SELECT
    CASE
        WHEN "HT" IS NULL THEN 'Null'
        WHEN "HT" <= 4  THEN '0-4'
        WHEN "HT" <= 8  THEN '4-8'
        WHEN "HT" <= 12 THEN '8-12'
        WHEN "HT" <= 16 THEN '12-16'
        WHEN "HT" <= 20 THEN '16-20'
        WHEN "HT" <= 24 THEN '20-24'
        WHEN "HT" <= 28 THEN '24-28'
        WHEN "HT" <= 32 THEN '28-32'
        WHEN "HT" <= 36 THEN '32-36'
        WHEN "HT" <= 40 THEN '36-40'
        WHEN "HT" <= 44 THEN '40-44'
        WHEN "HT" <= 48 THEN '44-48'
        WHEN "HT" <= 52 THEN '48-52'
        WHEN "HT" <= 56 THEN '52-56'
        WHEN "HT" <= 60 THEN '56-60'
        WHEN "HT" <= 64 THEN '60-64'
        WHEN "HT" <= 68 THEN '64-68'
        WHEN "HT" <= 72 THEN '68-72'
        WHEN "HT" <= 76 THEN '72-76'
        WHEN "HT" <= 80 THEN '76-80'
        WHEN "HT" <= 84 THEN '80-84'
        WHEN "HT" <= 88 THEN '84-88'
        ELSE '>88'
    END AS "Height Range",
    count(*) AS "Count"              -- DISTINCT not needed any more
FROM (
    SELECT DISTINCT ON ("PATID")     -- get greatest "HT" per patient
           "PATID", "HT"
    FROM dbo."VITAL"
    ORDER BY "PATID", "HT" DESC NULLS LAST
) sub
GROUP BY 1;

I also removed redundant checks from your CASE statement - assuming negative heights are impossible (you should have a CHECK constraint for that).
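Such a constraint could look like the sketch below. The constraint name vital_ht_positive is hypothetical, and the statement fails if existing rows already violate the condition. NULL heights still pass, because a CHECK only rejects rows for which the condition evaluates to false.

```sql
-- Hypothetical constraint name; rejects zero and negative heights, allows NULL.
ALTER TABLE dbo."VITAL"
    ADD CONSTRAINT vital_ht_positive CHECK ("HT" > 0);
```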

Detailed explanation for DISTINCT ON:

Select first row in each GROUP BY group?
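As a quick illustration of what DISTINCT ON does, here is a minimal sketch with made-up inline rows (an assumption for illustration, not real data). For each "PATID", PostgreSQL keeps the first row in ORDER BY order, and NULLS LAST keeps a NULL height from sorting ahead of a recorded one in descending order.

```sql
-- Hypothetical sample data: patient 1 has two heights and a NULL, patient 2 only a NULL.
WITH vital ("PATID", "HT") AS (
    VALUES (1, 68), (1, 73), (1, NULL),
           (2, NULL)
)
SELECT DISTINCT ON ("PATID")
       "PATID", "HT"
FROM vital
ORDER BY "PATID", "HT" DESC NULLS LAST;
```

Patient 1 comes back with height 73. Patient 2 comes back with NULL, since no other value exists for that patient.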

Or use an aggregate in the subquery, as @jpw suggested.
