How to convert an array to a string value

Time: 2019-02-06 13:21:06

Tags: google-bigquery

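Hello, I am trying to get query costs from the audit logs. I got the total, but when I try to break it down by dataset I get this error:

Cannot access field datasetId on a value with type ARRAY<STRUCT<...>>

This is the query I am trying to run: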

WITH
  data AS (
  SELECT
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent,
    (
    SELECT
      ARRAY_TO_STRING((
        SELECT
          ARRAY_AGG(datasetId)
        FROM
          UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables.datasetId) ))) AS datasetIds
  FROM
    `kkk111.bq_audit_log_export.cloudaudit_googleapis_com_data_access_20190206` )
SELECT
  datasetIds,
  FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM
  data
WHERE
  jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY
  datasetIds
ORDER BY
  Estimated_USD_Cost DESC

I am using the standard SQL dialect.
How can I cast this field:

protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables.datasetId

from an array to a string? What am I missing? Thanks.

2 answers:

Answer 0: (score: 1)

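referencedTables is an array of structs, so you cannot access datasetId on it directly. You need to UNNEST the outer array and then select datasetId from each element inside it: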

SELECT
  ARRAY_TO_STRING((
    SELECT ARRAY_AGG(datasetId)
    FROM UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables)
    ), ',') AS datasetIds
FROM ...
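
To try the pattern without access to the audit log table, here is a minimal sketch on inline data. The `sample` CTE and its values are hypothetical, made up for illustration; only the shape of `referencedTables` (an array of structs with a `datasetId` field) is taken from the question. I also added DISTINCT and ORDER BY inside ARRAY_AGG, which the snippet above does not have, so that a dataset referenced through several tables appears once.

```sql
#standardSQL
-- Hypothetical inline row standing in for one audit log entry.
WITH sample AS (
  SELECT [
    STRUCT('sales' AS datasetId, 'orders' AS tableId),
    STRUCT('sales' AS datasetId, 'items' AS tableId),
    STRUCT('web' AS datasetId, 'hits' AS tableId)
  ] AS referencedTables
)
SELECT
  ARRAY_TO_STRING((
    -- UNNEST turns the array into rows, so ref.datasetId is a plain STRING here.
    SELECT ARRAY_AGG(DISTINCT ref.datasetId ORDER BY ref.datasetId)
    FROM UNNEST(referencedTables) AS ref
  ), ',') AS datasetIds
FROM sample
```

For this sample row the result should be the single string sales,web. Note that if referencedTables is empty, ARRAY_AGG returns NULL and so does ARRAY_TO_STRING.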

Answer 1: (score: 1)

Below is for BigQuery Standard SQL

#standardSQL   
WITH data AS (
  SELECT 
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent,
    ref.datasetId AS datasetId
  FROM `kkk111.bq_audit_log_export.cloudaudit_googleapis_com_data_access_20190206`,
  UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables) ref 
)
SELECT
  datasetId,
  FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM data
WHERE jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY datasetId
ORDER BY Estimated_USD_Cost DESC   

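As you can see, you obviously need to UNNEST the referencedTables ARRAY, but you also need to make sure the final cost calculation is as close to the correct value as possible. The same query can reference multiple tables in the same dataset, so it is better to use DISTINCT in the CTE; otherwise that query's billed bytes are counted once per table. Also, the same query can reference tables from multiple datasets, in which case the same billed bytes are attributed to each of those datasets, so you will overestimate! I don't know your exact intent, but you may need to introduce some logic to distribute the cost among the referenced datasets.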
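
One possible way to handle both caveats, as a rough sketch only: deduplicate the datasets per job, then split each job's billed bytes evenly across them. The even split is purely my assumption (you may prefer to weight by table size or something else), and the `jobs` CTE below uses made-up inline rows with hypothetical column names instead of the real audit log table.

```sql
#standardSQL
-- Hypothetical inline rows: job_a bills 2 TiB and touches two datasets,
-- job_b bills 1 TiB and touches one.
WITH jobs AS (
  SELECT 'job_a' AS jobId, 2199023255552 AS totalBilledBytes,
    [STRUCT('sales' AS datasetId, 'orders' AS tableId),
     STRUCT('sales' AS datasetId, 'items' AS tableId),
     STRUCT('web' AS datasetId, 'hits' AS tableId)] AS referencedTables
  UNION ALL
  SELECT 'job_b', 1099511627776,
    [STRUCT('web' AS datasetId, 'hits' AS tableId)]
),
dedup AS (
  SELECT
    jobId,
    totalBilledBytes,
    -- DISTINCT so a dataset referenced through several tables counts once per job.
    ARRAY(SELECT DISTINCT ref.datasetId FROM UNNEST(referencedTables) AS ref) AS datasetIds
  FROM jobs
)
SELECT
  datasetId,
  -- Each job's bytes are divided evenly among its distinct datasets (assumption).
  FORMAT('%9.2f', 5.0 * SUM(totalBilledBytes / ARRAY_LENGTH(datasetIds)) / POWER(2, 40)) AS Estimated_USD_Cost
FROM dedup, UNNEST(datasetIds) AS datasetId
GROUP BY datasetId
ORDER BY Estimated_USD_Cost DESC
```

With these sample rows, sales should come out at 5.00 and web at 10.00, which adds up to the 15.00 you would get for 3 TiB in total, so nothing is double counted. Jobs with no referenced tables drop out of the join, same as in the query above.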