Left join with SUM and GROUP BY - strange behavior

Time: 2014-09-10 01:36:22

Tags: sql sql-server

I have these two tables:

student attendance table - student_id, campus_section
campus table -  campus_section, number_of_students, campus_name 

Sample data:

Student Table: 
Student_Id, campus_section
1, ddr1
2, ddr1
3, ddr2 
4, ddr3
5, ddv1
6, ddv2
7, ddv6

Campus Table
Campus_Section, Number_Of_Students, Campus_Name
ddr1, 10, ddr
ddr2, 5, ddr
ddr3, 5, ddr
ddv1, 5, ddv
ddv2, 10, ddv
ddv3, 10, ddv
ddv6, 10, ddv

So the expected rows would be:

Campus, current_students, campus_students    
ddr, 4, 20
ddv, 3, 35

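Each campus_name can have multiple campus_section rows. The following query lists each campus name, the number of students currently in that campus, and the total number of students for that campus.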

select d.[campus_name] as campus_name, 
       cast(count(s.student_id) as int) as current_students, 
       sum(cast (d.[number_of_students] as int)) as campus_students 
from campus d 
left join student s 
on s.campus_section = d.campus_section  
group by d.[campus_name]

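For some campus_name values, the summed number_of_students (campus_students in the query above) is larger than the section_students value returned by this query: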

select d.[campus_name] as campus_name, 
           sum(cast (d.[number_of_students] as int)) as section_students 
    from campus d 
    group by d.[campus_name]  

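This means the left join is doing something it shouldn't to some rows. Or the second query may be incorrect.

Edit: for example, the first query gives 18 for a certain campus name, while the second query gives 10.

Can anyone explain what is happening? This is SQL Server 2008.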
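The discrepancy can be reproduced from the sample data. The sketch below is an assumption-laden stand-in: it loads the rows into an in-memory SQLite database through Python's sqlite3 module rather than SQL Server 2008, drops the bracket quoting and casts, and uses a hypothetical helper named make_db. It runs the joined query and the campus-only query side by side.

```python
import sqlite3

def make_db():
    """Hypothetical helper: in-memory SQLite copy of the question's sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE student (student_id INTEGER, campus_section TEXT);
        CREATE TABLE campus (campus_section TEXT, number_of_students INTEGER,
                             campus_name TEXT);
        INSERT INTO student VALUES (1,'ddr1'),(2,'ddr1'),(3,'ddr2'),(4,'ddr3'),
                                   (5,'ddv1'),(6,'ddv2'),(7,'ddv6');
        INSERT INTO campus VALUES ('ddr1',10,'ddr'),('ddr2',5,'ddr'),
                                  ('ddr3',5,'ddr'),('ddv1',5,'ddv'),
                                  ('ddv2',10,'ddv'),('ddv3',10,'ddv'),
                                  ('ddv6',10,'ddv');
    """)
    return con

# The question's first query: aggregate after the join.
JOINED = """
    SELECT d.campus_name,
           COUNT(s.student_id)       AS current_students,
           SUM(d.number_of_students) AS campus_students
    FROM campus d
    LEFT JOIN student s ON s.campus_section = d.campus_section
    GROUP BY d.campus_name
    ORDER BY d.campus_name
"""

# The question's second query: aggregate the campus table alone.
CAMPUS_ONLY = """
    SELECT campus_name, SUM(number_of_students) AS section_students
    FROM campus
    GROUP BY campus_name
    ORDER BY campus_name
"""

con = make_db()
joined = con.execute(JOINED).fetchall()
campus_only = con.execute(CAMPUS_ONLY).fetchall()
print(joined)       # [('ddr', 4, 30), ('ddv', 3, 35)]
print(campus_only)  # [('ddr', 20), ('ddv', 35)]
```

With this data only ddr is inflated (30 instead of 20), because ddr1 is the only section with more than one student.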

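2 answers:

Answer 0: (score: 0)

With the left join, each campus row is repeated once for every matching student, so sum(cast (d.[number_of_students] as int)) adds a section's number_of_students once per student in that section. Remove the group by clause and you will see why.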

select d.[campus_name] as campus_name, 
       s.student_id, 
       d.[number_of_students]
from campus d 
left join student s 
on s.campus_section = d.campus_section 
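To see the repeated rows concretely, here is a rough sketch that runs the ungrouped join against the question's sample data. It assumes an in-memory SQLite database (via Python's sqlite3) instead of SQL Server, and it adds campus_section to the select list purely for readability.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the data is the question's sample.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (student_id INTEGER, campus_section TEXT);
    CREATE TABLE campus (campus_section TEXT, number_of_students INTEGER,
                         campus_name TEXT);
    INSERT INTO student VALUES (1,'ddr1'),(2,'ddr1'),(3,'ddr2'),(4,'ddr3'),
                               (5,'ddv1'),(6,'ddv2'),(7,'ddv6');
    INSERT INTO campus VALUES ('ddr1',10,'ddr'),('ddr2',5,'ddr'),
                              ('ddr3',5,'ddr'),('ddv1',5,'ddv'),
                              ('ddv2',10,'ddv'),('ddv3',10,'ddv'),
                              ('ddv6',10,'ddv');
""")

# Same join as above, without GROUP BY, so every joined row is visible.
rows = con.execute("""
    SELECT d.campus_name, d.campus_section, s.student_id, d.number_of_students
    FROM campus d
    LEFT JOIN student s ON s.campus_section = d.campus_section
    ORDER BY d.campus_section, s.student_id
""").fetchall()

for row in rows:
    print(row)

# ddr1 appears once per student, so its 10 is summed twice after grouping.
ddr1_rows = [r for r in rows if r[1] == "ddr1"]
```

The ddv3 section has no students, so the left join keeps it as a single row with a NULL student_id, which COUNT(s.student_id) ignores.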

So one way to avoid the inflated sum is to group by number_of_students instead of summing it:

select d.[campus_name] as campus_name, 
       cast(count(s.student_id) as int) as current_students, 
       cast (d.[number_of_students] as int) as section_students 
from campus d 
left join student s 
on s.campus_section = d.campus_section  
group by d.[campus_name],cast (d.[number_of_students] as int)

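Update: Based on the data the OP posted later, you first need to group by campus to get the sum of the section_students numbers, then join the student table and group by campus_name to get the current_students number.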

with campus_t as (select d.[campus_name] as campus_name, 
       sum(cast (d.[number_of_students] as int)) as campus_students 
     from campus d
     group by d.[campus_name])
select d.campus_name,
       d.campus_students,
       cast(count(s.student_id) as int) as current_students
from campus_t d 
left join campus section
on section.campus_name = d.campus_name
left join student s 
on s.campus_section = section.campus_section  
group by d.campus_name,d.campus_students

Note: not tested yet. Please check it.
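As a quick check of the CTE approach, the sketch below runs the same query shape against the question's sample data. The assumptions: SQLite (through Python's sqlite3, which needs a SQLite build with CTE support) replaces SQL Server, the casts and bracket quoting are dropped, and the section alias is renamed to sec.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the question's sample rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (student_id INTEGER, campus_section TEXT);
    CREATE TABLE campus (campus_section TEXT, number_of_students INTEGER,
                         campus_name TEXT);
    INSERT INTO student VALUES (1,'ddr1'),(2,'ddr1'),(3,'ddr2'),(4,'ddr3'),
                               (5,'ddv1'),(6,'ddv2'),(7,'ddv6');
    INSERT INTO campus VALUES ('ddr1',10,'ddr'),('ddr2',5,'ddr'),
                              ('ddr3',5,'ddr'),('ddv1',5,'ddv'),
                              ('ddv2',10,'ddv'),('ddv3',10,'ddv'),
                              ('ddv6',10,'ddv');
""")

# Sum number_of_students per campus first, then join back to count students.
CTE_QUERY = """
    WITH campus_t AS (
        SELECT campus_name, SUM(number_of_students) AS campus_students
        FROM campus
        GROUP BY campus_name
    )
    SELECT d.campus_name,
           d.campus_students,
           COUNT(s.student_id) AS current_students
    FROM campus_t d
    LEFT JOIN campus sec ON sec.campus_name = d.campus_name
    LEFT JOIN student s ON s.campus_section = sec.campus_section
    GROUP BY d.campus_name, d.campus_students
    ORDER BY d.campus_name
"""

cte_rows = con.execute(CTE_QUERY).fetchall()
print(cte_rows)  # [('ddr', 20, 4), ('ddv', 35, 3)]
```

The totals agree with the expected rows in the question because campus_students is computed before any join can repeat a campus row.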

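Your problem comes from a denormalized design. The campus table should be split into two tables, campus and campus_section. That is why I had to add a CTE named campus_t to get the campus-level information. Your original campus table data represents campus_section entities. If the model is normalized into three tables, it should be easier to query.

Answer 1: (score: 0)

A section can have more than one student. So when the tables are joined, rows in Campus can be duplicated, which skews the result if you try to aggregate everything in one pass.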

So try doing it in two steps. First, count the students per campus section:

SELECT
  c.campus_section,
  c.number_of_students,
  c.campus_name,
  current_section = COUNT(s.student_id)
FROM
  dbo.Campus AS c
LEFT JOIN
  dbo.Student AS s
ON
  c.campus_section = s.campus_section
GROUP BY
  c.campus_section,
  c.number_of_students,
  c.campus_name
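The first step can be tried on the question's sample data as follows. This is only a sketch under stated assumptions: SQLite via Python's sqlite3 replaces SQL Server, the dbo. schema prefix is dropped, every column is qualified with its table alias, and the count is named with AS current_section.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the question's sample rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (student_id INTEGER, campus_section TEXT);
    CREATE TABLE campus (campus_section TEXT, number_of_students INTEGER,
                         campus_name TEXT);
    INSERT INTO student VALUES (1,'ddr1'),(2,'ddr1'),(3,'ddr2'),(4,'ddr3'),
                               (5,'ddv1'),(6,'ddv2'),(7,'ddv6');
    INSERT INTO campus VALUES ('ddr1',10,'ddr'),('ddr2',5,'ddr'),
                              ('ddr3',5,'ddr'),('ddv1',5,'ddv'),
                              ('ddv2',10,'ddv'),('ddv3',10,'ddv'),
                              ('ddv6',10,'ddv');
""")

# One output row per section: the join fan-out is collapsed by the GROUP BY.
per_section = con.execute("""
    SELECT c.campus_section,
           c.number_of_students,
           c.campus_name,
           COUNT(s.student_id) AS current_section
    FROM campus AS c
    LEFT JOIN student AS s ON c.campus_section = s.campus_section
    GROUP BY c.campus_section, c.number_of_students, c.campus_name
    ORDER BY c.campus_section
""").fetchall()

for row in per_section:
    print(row)
```

Each section now appears exactly once, with number_of_students untouched and an empty section such as ddv3 counted as 0.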

Then aggregate the per-section results for each campus:

SELECT
  campus_name,
  current_campus  = SUM(current_section),
  campus_students = SUM(number_of_students)
FROM
  (
    SELECT
      c.campus_section,
      c.number_of_students,
      c.campus_name,
      current_section = COUNT(s.student_id)
    FROM
      dbo.Campus AS c
    LEFT JOIN
      dbo.Student AS s
    ON
      c.campus_section = s.campus_section
    GROUP BY
      c.campus_section,
      c.number_of_students,
      c.campus_name
  ) AS sub
GROUP BY
  campus_name
;
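A minimal check of the two-step query against the expected rows from the question might look like this. It assumes SQLite via Python's sqlite3 in place of SQL Server, so the T-SQL alias = expression form is rewritten with AS (SQLite would read it as a comparison), the dbo. prefix is dropped, and the GROUP BY columns are qualified.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the question's sample rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (student_id INTEGER, campus_section TEXT);
    CREATE TABLE campus (campus_section TEXT, number_of_students INTEGER,
                         campus_name TEXT);
    INSERT INTO student VALUES (1,'ddr1'),(2,'ddr1'),(3,'ddr2'),(4,'ddr3'),
                               (5,'ddv1'),(6,'ddv2'),(7,'ddv6');
    INSERT INTO campus VALUES ('ddr1',10,'ddr'),('ddr2',5,'ddr'),
                              ('ddr3',5,'ddr'),('ddv1',5,'ddv'),
                              ('ddv2',10,'ddv'),('ddv3',10,'ddv'),
                              ('ddv6',10,'ddv');
""")

# Inner query: one row per section. Outer query: roll sections up per campus.
TWO_STEP = """
    SELECT campus_name,
           SUM(current_section)    AS current_campus,
           SUM(number_of_students) AS campus_students
    FROM (
        SELECT c.campus_section,
               c.number_of_students,
               c.campus_name,
               COUNT(s.student_id) AS current_section
        FROM campus AS c
        LEFT JOIN student AS s ON c.campus_section = s.campus_section
        GROUP BY c.campus_section, c.number_of_students, c.campus_name
    ) AS sub
    GROUP BY campus_name
    ORDER BY campus_name
"""

result = con.execute(TWO_STEP).fetchall()
expected = [("ddr", 4, 20), ("ddv", 3, 35)]  # the question's expected rows
print(result == expected, result)
```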