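Counting instances of a string with unique IDs

Time: 2017-10-26 20:35:54

Tags: sql google-bigquery bigdata

I need to count the number of times a particular string occurs, but when one ID has the same string multiple times, it should be counted only once. Basically, I need to count the occurrences of each string across unique IDs. I'm sure this should be simple, but I don't know what I'm doing. Here is my current code: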

SELECT
RXNAME as Name,
DUPERSID as ID,
COUNT(RXNAME) as Number
FROM
`OmniHealth.PrescriptionsMEPS` 
GROUP BY
ID,
Name
ORDER BY
Number

When I run it, everything is counted as 1. Thanks for your help!

Update: Dataset: https://storage.googleapis.com/omnihealth/MepsPrescriptionData.csv

Output when running the code above:

Row Name    ID  Number   
1   SUMATRIPTAN 68896102    1    
2   IBUPROFEN   65063102    1    
3   PENICILLN VK    66179101    1    
4   FUROSEMIDE  63217102    1    
5   HYSINGLA ER 70373101    1    
6   FUROSEMIDE  76090101    1    
7   SKELETAL MUSCLE RELAXANTS   78414101    1    
8   AMOXICILLIN 69467103    1    
9   TRAMADOL HCL    67667101    1    
10  PANTOPRAZOLE    60737102    1    
11  CARBAMIDE PEROXIDE 6.5% OTIC SOLN   63990104    1    
12  PROMETH/COD 68433101    1    
13  AZITHROMYCIN    79045102    1    
14  METRONIDAZOL    75414101    1    
15  DEXILANT    69625101    1    
16  TRAMADOL HCL    66890203    1    
17  AZITHROMYCIN    73838101    1    
18  COLCRYS 63856102    1    
19  PERMETHRIN  62103107    1    
20  ACETAMINOPHEN TAB 500 MG    62456102    1   

3 Answers:

Answer 0: (score: 1)

Not sure this is what you are asking for, but if you are looking for a DISTINCT COUNT, use the following:

   
#standardSQL
SELECT
  RXNAME AS Name,
  COUNT(DISTINCT DUPERSID) AS Number
FROM `OmniHealth.PrescriptionsMEPS` 
GROUP BY 1
ORDER BY Number DESC
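
To check what this query does without touching BigQuery, here is a minimal sketch that runs the same SELECT against Python's built-in sqlite3 module. The table name `prescriptions`, the helper `count_ids_per_name`, and the sample rows are all hypothetical and not taken from the real dataset; SQLite is only a local stand-in, and the extra `Name` sort key is added just to make the row order deterministic.

```python
import sqlite3

# Hypothetical sample rows (DUPERSID, RXNAME); not taken from the real dataset.
rows = [
    (1, "IBUPROFEN"),
    (1, "IBUPROFEN"),    # same ID and same drug again: counted once
    (2, "IBUPROFEN"),
    (2, "AMOXICILLIN"),
    (3, "AMOXICILLIN"),
    (3, "AMOXICILLIN"),  # repeated fill for ID 3: counted once
    (3, "TRAMADOL HCL"),
]


def count_ids_per_name(rows):
    """Return (name, number of distinct IDs) pairs, highest count first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prescriptions (DUPERSID INTEGER, RXNAME TEXT)")
    conn.executemany("INSERT INTO prescriptions VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT RXNAME AS Name, COUNT(DISTINCT DUPERSID) AS Number
        FROM prescriptions
        GROUP BY RXNAME
        ORDER BY Number DESC, Name
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for name, number in count_ids_per_name(rows):
        print(name, number)
```

With these sample rows, IBUPROFEN is counted for IDs 1 and 2 only, even though ID 1 has it twice.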

Answer 1: (score: 0)

Try this... you are grouping by different fields. I think you meant to group by RXNAME.

SELECT
RXNAME as Name,
DUPERSID as ID,
COUNT(RXNAME) as Number
FROM
`OmniHealth.PrescriptionsMEPS` 
GROUP BY
ID,
RXNAME
ORDER BY
Number

Answer 2: (score: 0)

I think you want:

SELECT DUPERSID as ID, COUNT(DISTINCT RXNAME) as Number
FROM `OmniHealth.PrescriptionsMEPS` 
GROUP BY ID
ORDER BY Number;

This assumes that "the same string" means the same value of "RXNAME".
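
As a rough local check of this per-ID reading, the sketch below runs the same query shape against Python's built-in sqlite3 module instead of BigQuery. The table name `prescriptions`, the helper `count_names_per_id`, and the sample rows are hypothetical, and the extra `ID` sort key is only there to make the row order deterministic.

```python
import sqlite3

# Hypothetical sample rows (DUPERSID, RXNAME); not taken from the real dataset.
rows = [
    (1, "IBUPROFEN"),
    (1, "IBUPROFEN"),    # duplicate name for ID 1: counted once
    (2, "IBUPROFEN"),
    (2, "AMOXICILLIN"),
    (3, "AMOXICILLIN"),
    (3, "AMOXICILLIN"),  # duplicate name for ID 3: counted once
    (3, "TRAMADOL HCL"),
]


def count_names_per_id(rows):
    """Return (id, number of distinct RXNAME values) pairs, lowest count first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prescriptions (DUPERSID INTEGER, RXNAME TEXT)")
    conn.executemany("INSERT INTO prescriptions VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT DUPERSID AS ID, COUNT(DISTINCT RXNAME) AS Number
        FROM prescriptions
        GROUP BY DUPERSID
        ORDER BY Number, ID
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for person_id, number in count_names_per_id(rows):
        print(person_id, number)
```

Here ID 1 has two rows but only one distinct name, so it gets a count of 1, while IDs 2 and 3 each get 2.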