What is wrong with my Group By statement?

Time: 2019-01-07 13:51:56

Tags: sql hive impala

I have a SQL GROUP BY statement that should list each distinct substationcode and substationname combination together with its record count.

With the correct GROUP BY, I should see a record count for each distinct substationcode + substationname combination.
For example:

Source table:
substationcode substationname
ANDY           SUB:ANDY LAU
ANDY           SUB:CONS ANDY LAU
ACHM           SUB:ACHM
MIA            SUB:MIA LEONG
JON            SUB:JON LEE

Here is my code:

proc sql;
create table twolayers as
select substationcode,
       substationname,
       count(substationname) as cnt
from onlyscadadomsdistinct
group by substationcode, substationname
having cnt > 1;
quit;
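A minimal reproduction of what this query computes, assuming SQLite (through Python's built-in sqlite3 module) behaves like PROC SQL for a simple GROUP BY. It loads only the five sample rows above and runs the posted query without the HAVING clause so every group is visible. Each (substationcode, substationname) pair forms its own group, so the two ANDY names land in separate groups with cnt = 1 each, and having cnt > 1 would keep none of them.

```python
import sqlite3

# Sample rows copied from the source table shown in the question.
SAMPLE_ROWS = [
    ("ANDY", "SUB:ANDY LAU"),
    ("ANDY", "SUB:CONS ANDY LAU"),
    ("ACHM", "SUB:ACHM"),
    ("MIA", "SUB:MIA LEONG"),
    ("JON", "SUB:JON LEE"),
]


def grouped_counts(rows):
    """Run the posted GROUP BY (HAVING omitted) on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE onlyscadadomsdistinct (substationcode TEXT, substationname TEXT)"
    )
    conn.executemany("INSERT INTO onlyscadadomsdistinct VALUES (?, ?)", rows)
    # One group per (code, name) pair; COUNT counts the rows in that pair.
    result = conn.execute(
        """
        SELECT substationcode, substationname, COUNT(substationname) AS cnt
        FROM onlyscadadomsdistinct
        GROUP BY substationcode, substationname
        ORDER BY substationcode, substationname
        """
    ).fetchall()
    conn.close()
    return result


for row in grouped_counts(SAMPLE_ROWS):
    print(row)
```

With the sample data every printed row ends in 1, including the two separate ANDY rows.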

I expect ANDY to end up with cnt = 2. Instead, ACHM shows cnt = 4, and I don't understand why. Which part of my GROUP BY statement is wrong?

I then filtered on substationcode "ACHM" to check its distinct substationname values.
Only one was found: SUB:ACHM.

Where does cnt = 4 for ACHM come from?
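One explanation consistent with these numbers: cnt = 4 for (ACHM, SUB:ACHM) means four rows share that exact pair, so despite its name, onlyscadadomsdistinct seems to contain duplicate rows. A distinct listing collapses them to a single name, while COUNT counts every row. The sketch below assumes exactly that; the three extra identical ACHM rows are hypothetical (the posted sample shows only one), and it runs in SQLite via Python's sqlite3, not in SAS.

```python
import sqlite3

# HYPOTHETICAL data: the posted sample plus three identical copies of the
# ACHM row, so that (ACHM, SUB:ACHM) occurs four times in total.
ROWS = [
    ("ANDY", "SUB:ANDY LAU"),
    ("ANDY", "SUB:CONS ANDY LAU"),
    ("ACHM", "SUB:ACHM"),
    ("MIA", "SUB:MIA LEONG"),
    ("JON", "SUB:JON LEE"),
] + [("ACHM", "SUB:ACHM")] * 3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE onlyscadadomsdistinct (substationcode TEXT, substationname TEXT)"
)
conn.executemany("INSERT INTO onlyscadadomsdistinct VALUES (?, ?)", ROWS)

# The posted query: COUNT counts every row in each (code, name) group.
posted = conn.execute(
    """
    SELECT substationcode, substationname, COUNT(substationname) AS cnt
    FROM onlyscadadomsdistinct
    GROUP BY substationcode, substationname
    HAVING COUNT(substationname) > 1
    """
).fetchall()

# The check from the question: DISTINCT collapses the duplicate rows.
achm_names = [
    name
    for (name,) in conn.execute(
        "SELECT DISTINCT substationname FROM onlyscadadomsdistinct "
        "WHERE substationcode = 'ACHM'"
    )
]
conn.close()

print(posted)      # [('ACHM', 'SUB:ACHM', 4)]
print(achm_names)  # ['SUB:ACHM']
```

A quick SELECT COUNT(*) on the real table for substationcode = 'ACHM' would confirm or rule out this explanation.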

1 Answer:

Answer 0 (score: 0)

You should select and group by substationcode only, as follows:

select substationcode
,count(substationcode) as cnt
from onlyscadadomsdistinct
group by substationcode
having cnt > 1;
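A sketch of the suggested query, again in SQLite via Python's sqlite3 rather than SAS. On the five sample rows, grouping by substationcode alone gives ANDY cnt = 2. COUNT(substationcode) still counts rows, though, so with the hypothetical duplicate ACHM rows ACHM would still show 4. The extra name_cnt column is a swapped-in alternative that is not part of the answer: COUNT(DISTINCT substationname) counts distinct names per code instead of rows.

```python
import sqlite3

# Sample rows copied from the question's source table.
SAMPLE_ROWS = [
    ("ANDY", "SUB:ANDY LAU"),
    ("ANDY", "SUB:CONS ANDY LAU"),
    ("ACHM", "SUB:ACHM"),
    ("MIA", "SUB:MIA LEONG"),
    ("JON", "SUB:JON LEE"),
]
# HYPOTHETICAL: three extra identical ACHM rows, giving ACHM four rows in total.
ROWS_WITH_DUPLICATES = SAMPLE_ROWS + [("ACHM", "SUB:ACHM")] * 3


def per_code_counts(rows):
    """Group by substationcode only; report row count and distinct-name count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE onlyscadadomsdistinct (substationcode TEXT, substationname TEXT)"
    )
    conn.executemany("INSERT INTO onlyscadadomsdistinct VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT substationcode,
               COUNT(substationcode) AS cnt,              -- rows per code (the answer)
               COUNT(DISTINCT substationname) AS name_cnt -- distinct names per code
        FROM onlyscadadomsdistinct
        GROUP BY substationcode
        HAVING COUNT(substationcode) > 1
        ORDER BY substationcode
        """
    ).fetchall()
    conn.close()
    return result


print(per_code_counts(SAMPLE_ROWS))           # [('ANDY', 2, 2)]
print(per_code_counts(ROWS_WITH_DUPLICATES))  # [('ACHM', 4, 1), ('ANDY', 2, 2)]
```

If the goal is "codes that map to more than one name", filtering on the distinct-name count rather than the row count avoids being misled by duplicate rows.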