Select only the duplicates from a joined-table query

Time: 2016-04-01 17:00:53

Tags: sql sql-server group-by duplicates large-data

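I have the following query, in which I am trying to join two tables on their matching IDs so that I can find the duplicated values in "c.code". I have tried many queries, but none of them worked. My database has 500k rows, and this query returns only 5k, which cannot be right; I am sure there are at least 200k. I also tried Excel, but there was too much data for it to handle. Any ideas? Thanks in advance, everyone.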

SELECT c.code, c.name as SCT_Name, t.name as SYNONYM_Name, count(c.code)
FROM database.Terms as t
  join database.dbo.Concepts as c on c.ConceptId = t.ConceptId
  where t.TermTypeCode = 'SYNONYM' and t.ConceptTypeCode = 'NAME_Code' and c.retired = '0'
   Group by c.code, c.name, t.name
   HAVING COUNT(c.code) > = 1

Order by c.code

3 answers:

Answer 0: (score: 1)

with data as (
    select c.code, c.name as SCT_Name, t.name as SYNONYM_Name
    from database.Terms as t inner join database.dbo.Concepts as c
        on c.ConceptId = t.ConceptId
    where
            t.TermTypeCode = 'SYNONYM'
        and t.ConceptTypeCode = 'NAME_Code'
        and c.retired = '0'
)
select *
    --, (select count(*) from data as d2 where d2.code = data.code) as code_count
    --, count(*) over (partition by code) as code_count
from data
where code in (select code from data group by code having count(*) > 1)
order by code
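
The commented-out window-function line above hints at an alternative to the IN subquery: compute the per-code count once inside the CTE and filter on it. The following is only a sketch of that idea, not the answerer's exact query. It assumes SQL Server 2008 or later, and the table variables @Concepts and @Terms with their sample rows are made up to stand in for the real tables.

```sql
-- Hypothetical sample data standing in for the Concepts and Terms tables.
declare @Concepts table (ConceptId int, code varchar(20), name varchar(50), retired char(1));
declare @Terms table (ConceptId int, name varchar(50), TermTypeCode varchar(20), ConceptTypeCode varchar(20));

insert into @Concepts values
    (1, 'A100', 'Fever', '0'),
    (2, 'B200', 'Cough', '0');
insert into @Terms values
    (1, 'Pyrexia',          'SYNONYM', 'NAME_Code'),
    (1, 'High temperature', 'SYNONYM', 'NAME_Code'),
    (2, 'Tussis',           'SYNONYM', 'NAME_Code');

with data as (
    select c.code, c.name as SCT_Name, t.name as SYNONYM_Name,
           -- number of joined rows that share this code
           count(*) over (partition by c.code) as code_count
    from @Terms as t inner join @Concepts as c
        on c.ConceptId = t.ConceptId
    where
            t.TermTypeCode = 'SYNONYM'
        and t.ConceptTypeCode = 'NAME_Code'
        and c.retired = '0'
)
select code, SCT_Name, SYNONYM_Name, code_count
from data
where code_count > 1   -- keep only codes that occur more than once
order by code;
```

With this sample data the query keeps both A100 rows (each with code_count = 2) and drops the single B200 row.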

Answer 1: (score: 0)

If you only want the duplicated c.code values, then your GROUP BY is wrong (and so is your HAVING clause). Try this:

SELECT c.code
FROM database.Terms as t
  join database.dbo.Concepts as c on c.ConceptId = t.ConceptId
  where t.TermTypeCode = 'SYNONYM' and t.ConceptTypeCode = 'NAME_Code' and c.retired = '0'
   Group by c.code
   HAVING COUNT(c.code) > 1

This returns every c.code value that appears in more than one joined row.
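
To see why the grouping matters, here is a small sketch that runs both versions of GROUP BY / HAVING side by side. The table variable @Rows and its three sample rows are hypothetical and stand in for the joined result of the original query.

```sql
-- Hypothetical rows standing in for the result of the Terms/Concepts join.
declare @Rows table (code varchar(20), SCT_Name varchar(50), SYNONYM_Name varchar(50));

insert into @Rows values
    ('A100', 'Fever', 'Pyrexia'),
    ('A100', 'Fever', 'High temperature'),
    ('B200', 'Cough', 'Tussis');

-- Grouping by all three columns: each distinct (code, name, synonym)
-- combination is its own group, so every count is 1 and ">= 1" filters nothing.
select code, SCT_Name, SYNONYM_Name, count(code) as n
from @Rows
group by code, SCT_Name, SYNONYM_Name
having count(code) >= 1;

-- Grouping by code only: rows that share a code fall into one group,
-- and "> 1" keeps just the codes that repeat.
select code, count(code) as n
from @Rows
group by code
having count(code) > 1;
```

On this sample the first query returns all three rows with n = 1, while the second returns only A100 with n = 2.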

Answer 2: (score: 0)

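You need to use INTERSECT instead of JOIN. Basically, you run a SELECT on the first table and then intersect it with the second table. The result is the duplicated rows.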

However, select only the id column, otherwise the intersection will not work as expected.
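
A minimal sketch of what this suggestion looks like in practice, using hypothetical table variables with only an id column. INTERSECT compares whole rows, which is why both SELECT lists need to contain the same single column.

```sql
-- Hypothetical id-only stand-ins for the two tables.
declare @Concepts table (ConceptId int);
declare @Terms table (ConceptId int);

insert into @Concepts values (1), (2), (3);
insert into @Terms values (1), (1), (2);

-- Ids present in both tables; INTERSECT returns each matching value once.
select ConceptId from @Terms
intersect
select ConceptId from @Concepts;
```

On this sample the result is the ids 1 and 2, each listed once. Note that INTERSECT yields distinct values, so it shows which ids the two tables share rather than how many times a value repeats; a GROUP BY with HAVING COUNT(*) > 1 is still needed to count repetitions.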
