BigQuery and Standard SQL: aggregation for each distinct field in an array

Time: 2019-02-18 22:21:42

Tags: google-bigquery standard-sql

I am trying to calculate the total traffic (in gbps) for each distinct IP, summed across all five-minute periods.

I am new to BigQuery, so after looking at other examples I tried the following:

WITH `project.dataset.test` AS (
  SELECT '01/01/2019 12:30' time, '192.168.10.1' ip_address, 10 network, 1 gbps UNION ALL
  SELECT '01/01/2019 12:30', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:30', '192.168.10.3', 12, 3 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.1', 10, 1 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.3', 12, 3 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.1', 10, 1 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.3', 12, 3
),
ip AS (
  SELECT DISTINCT (ip_address) ip_address
  FROM `project.dataset.test`
),
qualified AS (
  SELECT ip_address, network, ARRAY_AGG(gbps ORDER BY ip_address DESC LIMIT 1)[SAFE_OFFSET(0)] gbps
  FROM `project.dataset.test`
  GROUP BY ip_address, network
)
SELECT ip_address, network, SUM(gbps) gbps
FROM (
  SELECT d.ip_address ip_address, network, ARRAY_AGG(gbps ORDER BY q.ip_address DESC LIMIT 1)[SAFE_OFFSET(0)] gbps
  FROM ip d
  JOIN qualified q
  ON q.ip_address = d.ip_address
  GROUP BY ip_address, network
)
GROUP BY ip_address, network
ORDER BY gbps DESC

I would expect the output to be:

Row     ip_address      network   gbps  
1       192.168.10.3    12        9
2       192.168.10.2    11        6
3       192.168.10.1    10        3

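What am I doing wrong? How can I select the sum for each distinct IP, regardless of the number of 5-minute intervals and/or networks? FYI, I have thousands of rows to sort through; this is just an example of what I am working with.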

1 Answer:

Answer 0: (score: 0)

  

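How can I select the sum for each distinct IP, regardless of the number of 5-minute intervals and/or networks?

The following example is for BigQuery Standard SQL: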

#standardSQL
WITH `project.dataset.test` AS (
  SELECT '01/01/2019 12:30' time, '192.168.10.1' ip_address, 10 network, 1 gbps UNION ALL
  SELECT '01/01/2019 12:30', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:30', '192.168.10.3', 12, 3 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.1', 10, 1 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.3', 12, 3 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.1', 10, 1 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.3', 12, 3
)
SELECT ip_address, network, SUM(gbps) gbps
FROM `project.dataset.test`
GROUP BY ip_address, network

with the result:

Row ip_address      network gbps     
1   192.168.10.3    12      9    
2   192.168.10.2    11      6    
3   192.168.10.1    10      3
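
The question also asks for the sum per distinct IP regardless of network. A minimal sketch of that variant, assuming the same sample data and assuming that an IP's traffic should simply be added up across every network and interval it appears in, groups by ip_address alone. The networks and intervals columns are not part of the original query; they are illustrative additions that show how many distinct networks and five-minute timestamps fed each sum.

```sql
#standardSQL
WITH `project.dataset.test` AS (
  SELECT '01/01/2019 12:30' time, '192.168.10.1' ip_address, 10 network, 1 gbps UNION ALL
  SELECT '01/01/2019 12:30', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:30', '192.168.10.3', 12, 3 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.1', 10, 1 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.3', 12, 3 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.1', 10, 1 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.3', 12, 3
)
SELECT
  ip_address,
  COUNT(DISTINCT network) AS networks,   -- illustrative: distinct networks seen for this IP
  COUNT(DISTINCT `time`) AS intervals,   -- illustrative: distinct five-minute timestamps seen
  SUM(gbps) AS gbps                      -- total across all rows for this IP
FROM `project.dataset.test`
GROUP BY ip_address      -- one output row per distinct IP, whatever the network
ORDER BY gbps DESC       -- makes the row order deterministic
```

On the sample data each IP has one network and three intervals, so the totals should again be 9, 6 and 3. As for the attempt in the question: ARRAY_AGG(gbps ... LIMIT 1)[SAFE_OFFSET(0)] keeps a single gbps value per group, so the outer SUM has only one value left to add for each IP. Summing directly over the base table avoids that.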