SQL query for finding multiple duplicate pairs in columns

Time: 2014-03-22 07:40:59

Tags: sql postgresql

I have a table like this:

paper_id  author_id     author_name         author_affiliation
    1     521630         Ayman Kaheel        Cairo Microsoft Innovation Lab
    1     972575       Mahmoud Refaat       Cairo Microsoft Innovation Lab
    3    1528710     Ahmed Abdul-hamid      Harvard

I have found that some combinations of author_id, author_name, and author_affiliation occur more than once. For example:

author_id     author_name     author_affiliation  count
   1          Masuo Fukui               <NA>       4
   4          Yasusada Yamada           <NA>       8

I use the following query:

statement<-"select author_id,author_name,author_affiliation,count(*)
        from paper_author 
        GROUP BY author_id,author_name,author_affiliation
        HAVING (COUNT(*)>1)" 

Now I want to know how many author_ids there are in this result. I tried this:

statement<-"select distinct author_id 
    from paper_author 
     where author_id in (
        select author_id,author_name,author_affiliation,count(*)
        from paper_author 
        GROUP BY author_id,author_name,author_affiliation
        HAVING (COUNT(*)>1)
    )" 

I cannot get the desired result.

Also, how can I get the number of paper IDs in the above result?

Thanks.

3 answers:

Answer 0 (score: 1)

I would do it like this, I think:

statement<-"select distinct author_id 
    from paper_author 
     where author_id in (
        select author_id
        from paper_author 
        GROUP BY author_id,author_name,author_affiliation
        HAVING (COUNT(*)>1)
    )" 
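The change from the question's query is that the subquery passed to IN now returns a single column instead of four. Below is a minimal sketch for checking this on toy data. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the rows are made up for illustration (they reuse names from the question, but the paper_id values are invented).

```python
import sqlite3

# Toy data, invented for illustration; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paper_author ("
    "paper_id INTEGER, author_id INTEGER, "
    "author_name TEXT, author_affiliation TEXT)"
)
rows = [
    (1, 1, "Masuo Fukui", None),
    (2, 1, "Masuo Fukui", None),
    (3, 4, "Yasusada Yamada", None),
    (5, 4, "Yasusada Yamada", None),
    (1, 521630, "Ayman Kaheel", "Cairo Microsoft Innovation Lab"),
]
conn.executemany("INSERT INTO paper_author VALUES (?, ?, ?, ?)", rows)

# The IN subquery must return exactly one column, so only author_id is selected.
query = """
    SELECT DISTINCT author_id
    FROM paper_author
    WHERE author_id IN (
        SELECT author_id
        FROM paper_author
        GROUP BY author_id, author_name, author_affiliation
        HAVING COUNT(*) > 1
    )
    ORDER BY author_id
"""
duplicated_ids = [row[0] for row in conn.execute(query)]
print(duplicated_ids)  # [1, 4] for the toy rows above
```

Wrapping the same statement in SELECT COUNT(*) FROM (...) AS t would return the number of such author_ids rather than the ids themselves.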

Answer 1 (score: 0)

If you only want to know how many authors have multiple papers, use this query:

-- The derived table needs an alias in PostgreSQL versions before 16.
SELECT COUNT(*)
FROM (SELECT author_id, author_affiliation, COUNT(*)
      FROM paper_author
      GROUP BY author_id, author_affiliation
      HAVING COUNT(*) > 1) AS dup;

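This assumes that author_id is the unique identifier for author_name. If the id instead identifies the combination of author_name and author_affiliation (that is, an author who has written papers for different institutions has several ids, one per affiliation), then you can also drop author_affiliation from the subquery.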
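To see how the choice of GROUP BY columns changes the count, here is a small sketch. It again assumes sqlite3 in place of PostgreSQL; the helper count_groups, the alias dup, and the author "A. Author" with two affiliations are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paper_author ("
    "paper_id INTEGER, author_id INTEGER, "
    "author_name TEXT, author_affiliation TEXT)"
)
rows = [
    (1, 1, "Masuo Fukui", None),
    (2, 1, "Masuo Fukui", None),
    (3, 7, "A. Author", "Lab X"),  # hypothetical author, two affiliations
    (4, 7, "A. Author", "Lab Y"),
]
conn.executemany("INSERT INTO paper_author VALUES (?, ?, ?, ?)", rows)


def count_groups(group_cols):
    """Count the groups (formed by group_cols) that contain more than one row."""
    sql = (
        "SELECT COUNT(*) FROM ("
        f"  SELECT author_id FROM paper_author"
        f"  GROUP BY {group_cols}"
        "  HAVING COUNT(*) > 1"
        ") AS dup"
    )
    return conn.execute(sql).fetchone()[0]


# Author 7 has one paper per affiliation, so the pair grouping skips them.
by_id_and_affiliation = count_groups("author_id, author_affiliation")
# Grouping by author_id alone counts author 7 as having two papers.
by_id_only = count_groups("author_id")
print(by_id_and_affiliation, by_id_only)
```

With these rows the first count is 1 and the second is 2, which is the difference the paragraph above describes.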

Answer 2 (score: 0)

Here is your query, slightly rewritten. You do not need the IN clause. You can select directly from the result set.

select distinct author_id
from
(
  select author_id
  from paper_author
  group by author_id,author_name,author_affiliation
  having count(*) > 1
) as dup;  -- the alias is required by PostgreSQL versions before 16
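The question also asks for counts rather than rows, including the number of paper IDs. One possible reading is "papers written by any author_id that has a duplicated combination". The sketch below follows that reading by joining the derived table back to paper_author. The join, the aliases pa and dup, the sqlite3 stand-in for PostgreSQL, and the toy rows are all assumptions, not something the answers above state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paper_author ("
    "paper_id INTEGER, author_id INTEGER, "
    "author_name TEXT, author_affiliation TEXT)"
)
rows = [
    (1, 1, "Masuo Fukui", None),
    (2, 1, "Masuo Fukui", None),
    (3, 4, "Yasusada Yamada", None),
    (5, 4, "Yasusada Yamada", None),
    (1, 521630, "Ayman Kaheel", "Cairo Microsoft Innovation Lab"),
]
conn.executemany("INSERT INTO paper_author VALUES (?, ?, ?, ?)", rows)

# Join on author_id only: comparing author_affiliation with "=" would
# silently drop rows where the affiliation is NULL.
query = """
    SELECT COUNT(DISTINCT pa.author_id), COUNT(DISTINCT pa.paper_id)
    FROM paper_author AS pa
    JOIN (
        SELECT author_id
        FROM paper_author
        GROUP BY author_id, author_name, author_affiliation
        HAVING COUNT(*) > 1
    ) AS dup ON pa.author_id = dup.author_id
"""
n_authors, n_papers = conn.execute(query).fetchone()
print(n_authors, n_papers)
```

COUNT(DISTINCT ...) is used because the join repeats each matching row once per duplicated group of that author.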