How do I eliminate duplicate rows when joining tables across databases?

Time: 2013-05-08 15:09:06

Tags: sql sql-server subquery outer-join

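I have been working on this script and have hit a dead end. The script runs, but unfortunately it produces duplicates. It joins two tables across databases on the state_issue_teacher_id key and then generates the output. I checked both tables and the row counts are the same, so the join should match the records exactly. Clearly something is wrong with my key or with the way I join the tables, because part of the output comes back incorrect. I also tried concatenating attributes into a single unique key and joining on that, but the results were still wrong.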

Here is my script:

SELECT     
       LTRIM(RTRIM(rt.year_time)) AS 'year_time' ,
       LTRIM(RTRIM(rt.state_issue_teacher_id)) AS state_issue_teacher_id ,
       LTRIM(RTRIM(rt.district_code)) AS district_code ,
       rt.district_name ,
       rt.school_name ,
       LTRIM(RTRIM(rt.assignment_code)) AS assignment_code ,
       rt.assignment_desc ,
       LTRIM(RTRIM(rt.position_code)) AS position_code ,
       rt.position_desc ,
       LTRIM(RTRIM(rt.last_name)) AS last_name ,
       LTRIM(RTRIM(rt.first_name)) AS first_name ,
       LTRIM(RTRIM(rt.total_salary)) AS total_salary ,
       rt.assign_fte ,
       LTRIM(RTRIM(rt.school_code)) AS school_code ,
       rt.fte

    FROM    staging.dbo.rt AS rt

    LEFT JOIN ( SELECT   LTRIM(RTRIM(dti.year)) AS year ,
                    LTRIM(RTRIM(dt.teacher_id)) AS teacher_id ,
                    LTRIM(RTRIM(db.district_code)) AS district_code ,
                    db.district_name ,
                    LTRIM(RTRIM(dt.last_name)) AS last_name ,
                    LTRIM(RTRIM(dt.first_name)) AS first_name ,
                    LTRIM(RTRIM(da.assignment_code)) AS assignment_code ,
                    LTRIM(RTRIM(dp.position_code)) AS position_code ,
                    dre.race_ethnicity_code ,
                    LTRIM(RTRIM(SUBSTRING(db.school_code,10,4))) AS school_code ,
                    da.assignment_desc ,
                    dp.position_desc ,
                    fs.total_fte

           FROM     mart.dbo.fact_s AS fs
                    -- dimension tables referenced as database.schema.table
                    LEFT OUTER JOIN mart.dbo.dim_building
                    AS db ON fs.building_key = db.building_key
                    LEFT OUTER JOIN mart.dbo.dim_teacher
                    AS dt ON fs.teacher_key = dt.teacher_key
                    LEFT OUTER JOIN mart.dbo.dim_assignment
                    AS da ON fs.assignment_key = da.assignment_key
                    LEFT OUTER JOIN mart.dbo.dim_race_ethnicity
                    AS dre ON dt.race_ethnicity_key = dre.race_ethnicity_key
                    LEFT OUTER JOIN mart.dbo.dim_gender
                    AS dg ON dt.gender_key = dg.gender_key
                    LEFT OUTER JOIN mart.dbo.dim_time
                    AS dti ON fs.time_key = dti.time_key
                    LEFT OUTER JOIN mart.dbo.dim_position
                    AS dp ON fs.position_key = dp.position_key
           WHERE    dti.year = '2012'



         ) raw ON    rt.state_issue_teacher_id = raw.teacher_id                 
                        AND rt.year_time = raw.year 
                        AND rt.last_name = raw.last_name 
                        AND rt.first_name = raw.first_name 
                        AND rt.district_code = raw.district_code
                        AND rt.position_code = raw.position_code
                        AND rt.school_code = RAW.school_code
                        AND rt.assignment_code = raw.assignment_code



    WHERE   rt.year_time = '2012'



    ORDER BY rt.last_name, rt.first_name

The output I get was shown in a screenshot (image not available).

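The FTE of a teacher's combined assignments should add up to 1. However, teachers who have several partial assignments with the same assignment_code / desc produce duplicates. Example: Jane Doe appears 4 times with a total FTE of 2.0, instead of 2 times with the correct total of 1.0. The desired output was shown in a second screenshot (image not available).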

1 Answer:

Answer 0: (score: 1)

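You appear to be getting duplicates for part-time teachers who have multiple assignments that all share the same description. This is clear from comparing the first four rows of the actual output with the first two rows of the desired output.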

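I wonder why you have these duplicates in the first place. They are in the fact table, though, so they must mean something (my guess is that two part-time guidance counselors are funded rather than one full-time counselor). In that case, does the fact table really contain exactly duplicate records? If not, the fields that differ may suggest an additional join key that would solve the problem.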
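One way to check whether the fact rows are exact duplicates is to group them by their dimension keys and count. The following is only a sketch: it uses the key columns and `total_fte` as they appear in the question's query, and it assumes those five keys are the only columns that identify a fact row, which may not hold for the real table.

```sql
-- Sketch: list dimension-key combinations that occur more than once in the fact table.
-- Assumes the five key columns below are the full grain of mart.dbo.fact_s.
SELECT  fs.teacher_key,
        fs.building_key,
        fs.assignment_key,
        fs.position_key,
        fs.time_key,
        COUNT(*)                     AS row_count,
        COUNT(DISTINCT fs.total_fte) AS distinct_fte_values
FROM    mart.dbo.fact_s AS fs
GROUP BY fs.teacher_key,
         fs.building_key,
         fs.assignment_key,
         fs.position_key,
         fs.time_key
HAVING  COUNT(*) > 1
ORDER BY row_count DESC;
```

If `distinct_fte_values` is greater than 1 for a group, the rows are not exact duplicates, and some other column on the fact table may be worth inspecting as a candidate join key.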

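You need to get rid of the Cartesian product that this join produces. When a teacher has two rows with the same assignment code on each side, the condition rt.assignment_code = raw.assignment_code (together with the other keys) matches every rt row with every raw row, so the join returns 2 × 2 = 4 rows. That is why Jane Doe appears four times.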

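Apart from finding a better join key, I can think of two ways to solve this. The first is to create a truly unique ID for each position. Perhaps you already know of one in your data structure. If not, you can use row_number() to add a sequence number for people who have multiple positions.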
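The row_number() idea could look roughly like the sketch below: number the rows within each join-key group on both sides, then add that number to the join condition so each rt row pairs with exactly one fact row. Several things here are assumptions and not taken from the question: the dimension tables are referenced as `mart.dbo.dim_*`, the `ORDER BY` columns are arbitrary tie-breakers, the alias `seq` is made up, and the trimming from the original query is left out for brevity.

```sql
-- Sketch only: pair rows one-to-one by adding a per-group sequence number to the join.
WITH rt_numbered AS (
    SELECT  rt.*,
            ROW_NUMBER() OVER (
                PARTITION BY rt.state_issue_teacher_id, rt.district_code,
                             rt.school_code, rt.position_code, rt.assignment_code
                ORDER BY rt.assign_fte            -- assumed tie-breaker
            ) AS seq
    FROM    staging.dbo.rt AS rt
    WHERE   rt.year_time = '2012'
),
fact_numbered AS (
    SELECT  dt.teacher_id,
            db.district_code,
            SUBSTRING(db.school_code, 10, 4) AS school_code,
            dp.position_code,
            da.assignment_code,
            fs.total_fte,
            ROW_NUMBER() OVER (
                PARTITION BY dt.teacher_id, db.district_code,
                             SUBSTRING(db.school_code, 10, 4),
                             dp.position_code, da.assignment_code
                ORDER BY fs.total_fte             -- assumed tie-breaker
            ) AS seq
    FROM    mart.dbo.fact_s AS fs                 -- dim table names below are assumed
            JOIN mart.dbo.dim_teacher    AS dt  ON fs.teacher_key    = dt.teacher_key
            JOIN mart.dbo.dim_building   AS db  ON fs.building_key   = db.building_key
            JOIN mart.dbo.dim_assignment AS da  ON fs.assignment_key = da.assignment_key
            JOIN mart.dbo.dim_position   AS dp  ON fs.position_key   = dp.position_key
            JOIN mart.dbo.dim_time       AS dti ON fs.time_key       = dti.time_key
    WHERE   dti.year = '2012'
)
SELECT  r.last_name,
        r.first_name,
        r.assignment_code,
        r.assign_fte,
        f.total_fte
FROM    rt_numbered AS r
        LEFT JOIN fact_numbered AS f
               ON  r.state_issue_teacher_id = f.teacher_id
               AND r.district_code          = f.district_code
               AND r.school_code            = f.school_code
               AND r.position_code          = f.position_code
               AND r.assignment_code        = f.assignment_code
               AND r.seq                    = f.seq   -- breaks the many-to-many match
ORDER BY r.last_name, r.first_name;
```

The pairing is only meaningful if both sides order their rows in a comparable way. If the two FTE columns do not correspond, a different ordering column would be needed.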

The other way is to eliminate the duplicates on one side or the other. For example, you could aggregate rt to remove such duplicates.
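Aggregating rt might be sketched as follows. It assumes `assign_fte` is a numeric column (the question does not say), and the alias `assignment_rows` is made up for illustration. Note that this collapses a teacher's partial assignments into a single row per key combination, so it returns one row where the desired output shows two.

```sql
-- Sketch: collapse rt to one row per join-key combination before joining.
SELECT  rt.year_time,
        rt.state_issue_teacher_id,
        rt.district_code,
        rt.school_code,
        rt.position_code,
        rt.assignment_code,
        rt.last_name,
        rt.first_name,
        SUM(rt.assign_fte) AS assign_fte,       -- assumes a numeric column
        COUNT(*)           AS assignment_rows   -- how many partial assignments were merged
FROM    staging.dbo.rt AS rt
WHERE   rt.year_time = '2012'
GROUP BY rt.year_time,
         rt.state_issue_teacher_id,
         rt.district_code,
         rt.school_code,
         rt.position_code,
         rt.assignment_code,
         rt.last_name,
         rt.first_name
ORDER BY rt.last_name, rt.first_name;
```

The result can then stand in for rt in the original join. The fact side would need the same treatment, or it will still contribute more than one matching row per key.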
