Google BigQuery inner join not behaving as expected

Time: 2017-07-17 00:56:21

Tags: sql google-bigquery

I have a table named T with the following structure:

Row  date         a    b    c    d    e     f            g
1    2.0051012E7  4.0  5.0  1.0  0.9  21.0  2.0170716E7  0.8
2    2.0131101E7  1.0  5.0  0.0  1.0  21.0  2.0170716E7  0.6
3    2.0060908E7  3.0  5.0  0.0  1.0  21.0  2.0170716E7  0.7

I have the following query:

select * from 
(SELECT date,max(a) as w FROM [T] 
group by date
order by date asc) as tableA
 inner  join
(select date,b from 
 [T]   ) as tableB
on tableB.date=tableA.date and tableB.b=tableA.w
order by a.date asc

But my result is:

Row  tableA.date  a    tableB.date  b
1    2.0040329E7  1.0  2.0040329E7  1.0
2    2.0040329E7  1.0  2.0040329E7  1.0
3    2.0040329E7  1.0  2.0040329E7  1.0
4    2.0040329E7  1.0  2.0040329E7  1.0

Why am I getting duplicate rows? Isn't that what an inner join is supposed to eliminate?

1 Answer:

Answer 0: (score: 1)

I recommend using BigQuery Standard SQL to reduce confusion.

  

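"I have a result that maps each date to its maximum value. I then want to join on those unique date and value pairs to get the other columns for those dates."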

Try the following in BigQuery Standard SQL:

#standardSQL
SELECT entry.* 
FROM (
  SELECT ARRAY_AGG(row ORDER BY a DESC LIMIT 1)[OFFSET(0)] AS entry
  FROM `yourProject.yourDataset.yourTable` row 
  GROUP BY date
)
-- ORDER BY date
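
The query above collects the rows of each date group into an array sorted by a in descending order, keeps only the first element, and expands that element back into columns. For comparison, a commonly used alternative (not the approach given in this answer) is ROW_NUMBER(). A minimal sketch, assuming the same placeholder table name and that a is the column to maximize:

```sql
#standardSQL
SELECT * EXCEPT(rn)
FROM (
  -- Number the rows within each date, highest a first.
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY date ORDER BY a DESC) AS rn
  FROM `yourProject.yourDataset.yourTable`
)
-- Keep only the top-ranked row of every date.
WHERE rn = 1
```

Either form returns exactly one row per date, even when several rows tie for the maximum a, which is what keeps repeated rows out of the result.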

You can test it with dummy data as follows:

#standardSQL
WITH T AS (
  SELECT 2.0051012E7 AS date, 4.0 AS a, 5.0 AS b, 1.0 AS c, 0.9 AS d, 21.0 AS e, 2.0170716E7 AS f, 0.8 AS g UNION ALL
  SELECT 2.0131101E7, 1.0, 5.0, 0.0, 1.0, 21.0, 2.0170716E7, 0.6 UNION ALL  
  SELECT 2.0060908E7, 3.0, 5.0, 0.0, 1.0, 21.0, 2.0170716E7, 0.7 
)
SELECT entry.* 
FROM (
  SELECT ARRAY_AGG(row ORDER BY a DESC LIMIT 1)[OFFSET(0)] AS entry
  FROM `T` row 
  GROUP BY date
)
ORDER BY date
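
As for why the original join returned repeated rows: an inner join does not remove duplicates. It returns one output row for every pair of rows that satisfies the ON condition, so if T holds several rows with the same date and the same maximum value, each of them matches the single row of tableA. A minimal sketch, using made-up data (three identical rows) and joining on a rather than b for simplicity, both of which are assumptions and not taken from the question:

```sql
#standardSQL
-- Hypothetical data: the same (date, a) pair appears three times.
WITH T AS (
  SELECT 2.0040329E7 AS date, 1.0 AS a UNION ALL
  SELECT 2.0040329E7, 1.0 UNION ALL
  SELECT 2.0040329E7, 1.0
),
tableA AS (
  -- One row per date, holding the maximum of a.
  SELECT date, MAX(a) AS w
  FROM T
  GROUP BY date
)
SELECT tableA.date, tableA.w, tableB.a
FROM tableA
INNER JOIN T AS tableB
  -- Every row of T with this date and value matches the one tableA row.
  ON tableB.date = tableA.date AND tableB.a = tableA.w
```

The join produces three rows here, one per matching row of T. Changing SELECT to SELECT DISTINCT would collapse them into one, while the ARRAY_AGG query above avoids the join altogether.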