How to join tables using REPEATED RECORDS

Date: 2017-04-25 15:19:40

Tags: arrays join struct google-bigquery records

I'm struggling to join tables when I use a REPEATED RECORD field in the ON clause. The error I get is:
 No matching signature for operator = for argument types: ARRAY<STRUCT<experiment INT64>>, INT64. Supported signature: ANY = ANY at [6:5]

My repeated record is called ab_test and has 4 fields (experiment, group, name, state).

My query:

SELECT be.type, be.group, be.user.id, be.uid, 
       ARRAY(SELECT STRUCT(ab_test.experiment as experiment , ab_test.group as group, ab_test.name as name, ab_test.state, uid_allocation_timestamp) FROM UNNEST(ab_test) AS ab_test) as ab_test
FROM fiverr-bigquery.dwh.bi_events be
JOIN staging_tables.ab_tests_uid_allocation_history uid_alloc
 ON be.uid = uid_alloc.uid 
AND ARRAY(SELECT STRUCT(ab_test.experiment) FROM UNNEST(ab_test) AS ab_test ) = uid_alloc.test_id
WHERE be._PARTITIONTIME = '2017-04-24 00:00:00'
  AND DATE(created_at)  = DATE('2017-04-24')
  AND ARRAY(SELECT STRUCT(ab_test.experiment) FROM UNNEST(ab_test) AS ab_test ) IS NOT NULL
  AND type = 'order.success'

I also tried replacing the second ON condition with:

CAST((SELECT experiment FROM UNNEST(ab_test) as experiment ) AS INT64) = uid_alloc.test_id

But no luck (I get the error: Invalid cast from STRUCT<experiment INT64, group INT64, name STRING, ...> to INT64 at [40:10]).

Any ideas?

1 answer:

Answer 0 (score: 0)

  

I also tried replacing ... but no luck ... Any ideas?

Below is an attempt to mimic your use case, at least the part that causes the error you are seeing.

If you run the query below (BigQuery Standard SQL), you will get exactly the same error as in your case:

  
#standardSQL
WITH data AS (
  SELECT 1 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (911, 2, 'a'), (2, 2, 'b'), (3, 2, 'c')] AS ab_test UNION ALL
  SELECT 2 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (11, 3, 'a'), (12, 3, 'b'), (13, 3, 'c')] AS ab_test UNION ALL
  SELECT 3 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (21, 4, 'a'), (911, 4, 'b'), (23, 4, 'c')] AS ab_test 
)
SELECT id 
FROM data
WHERE CAST((SELECT experiment FROM UNNEST(ab_test) AS experiment ) AS INT64) = 911

The error will be:

Error: Invalid cast from STRUCT<experiment INT64, grp INT64, name STRING> to INT64 at [12:12]  
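Why the cast fails: in `UNNEST(ab_test) AS experiment`, the alias `experiment` names each whole array element, i.e. the entire STRUCT, not its `experiment` field, so the subquery returns a STRUCT that cannot be CAST to INT64. The illustrative query below (a sketch on a one-row version of the mock data above; the aliases `e`, `whole_element` and `experiment_id` are made up for the example) selects the whole element and its field side by side. Note that correcting only the field reference, e.g. `(SELECT e.experiment FROM UNNEST(ab_test) AS e) = 911`, would still fail in general: a scalar subquery may return at most one row, and BigQuery raises an error when the array has more than one element. That is why a per-row membership test is needed instead of a plain equality.

```sql
#standardSQL
-- Illustration only: one mock row whose ab_test array has two elements.
-- The alias `e` is bound to each whole array element (a STRUCT),
-- while `e.experiment` is the INT64 field inside that STRUCT.
WITH data AS (
  SELECT 1 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (911, 2, 'a'), (2, 2, 'b')] AS ab_test
)
SELECT id,
       e AS whole_element,            -- STRUCT<experiment INT64, grp INT64, name STRING>
       e.experiment AS experiment_id  -- INT64
FROM data
CROSS JOIN UNNEST(ab_test) AS e       -- one output row per array element
```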

To fix this, use the following approach:

#standardSQL
WITH data AS (
  SELECT 1 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (911, 2, 'a'), (2, 2, 'b'), (3, 2, 'c')] AS ab_test UNION ALL
  SELECT 2 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (11, 3, 'a'), (12, 3, 'b'), (13, 3, 'c')] AS ab_test UNION ALL
  SELECT 3 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (21, 4, 'a'), (911, 4, 'b'), (23, 4, 'c')] AS ab_test 
)
SELECT id 
FROM data
WHERE (SELECT COUNT(1) 
       FROM UNNEST(ab_test) AS ab_test 
       WHERE ab_test.experiment = 911
     ) > 0  

Now there is no error, and the output will be:

id   
1    
3    

because these are the rows whose ab_test contains an element with experiment = 911.
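The same per-row check can also be written with `EXISTS`, which says "at least one element matches" directly instead of counting the matches and comparing the count with 0. This is a variant of the approach above, not something from the original answer; on the same mock data it should return the same ids, 1 and 3.

```sql
#standardSQL
WITH data AS (
  SELECT 1 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (911, 2, 'a'), (2, 2, 'b'), (3, 2, 'c')] AS ab_test UNION ALL
  SELECT 2 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (11, 3, 'a'), (12, 3, 'b'), (13, 3, 'c')] AS ab_test UNION ALL
  SELECT 3 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (21, 4, 'a'), (911, 4, 'b'), (23, 4, 'c')] AS ab_test
)
SELECT id
FROM data
-- Keep a row if any element of its ab_test array has experiment = 911
WHERE EXISTS (
  SELECT 1
  FROM UNNEST(ab_test) AS t
  WHERE t.experiment = 911
)
```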

Finally, below is an example that, as in your question, takes the test values from a joined table:

#standardSQL
WITH data AS (
  SELECT 1 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (911, 2, 'a'), (2, 2, 'b'), (3, 2, 'c')] AS ab_test UNION ALL
  SELECT 2 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (11, 3, 'a'), (12, 3, 'b'), (13, 3, 'c')] AS ab_test UNION ALL
  SELECT 3 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (21, 4, 'a'), (911, 4, 'b'), (23, 4, 'c')] AS ab_test 
), 
tests AS (
  SELECT 911 AS test_id UNION ALL
  SELECT 912 AS test_id 
)
SELECT data.id 
FROM data
CROSS JOIN tests
WHERE (SELECT COUNT(1) 
       FROM UNNEST(ab_test) AS ab_test 
       WHERE ab_test.experiment = tests.test_id
     ) > 0
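Because the question wanted the experiment condition inside the ON clause, another option (an alternative, not the answer's own approach) is to flatten the array first with `CROSS JOIN UNNEST(...)`. That produces one row per ab_test element, so `t.experiment` is a plain INT64 that can be compared in ON. A row whose array matched several test ids would appear more than once, so the sketch adds `DISTINCT`. On the same mock data it should return ids 1 and 3.

```sql
#standardSQL
WITH data AS (
  SELECT 1 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (911, 2, 'a'), (2, 2, 'b'), (3, 2, 'c')] AS ab_test UNION ALL
  SELECT 2 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (11, 3, 'a'), (12, 3, 'b'), (13, 3, 'c')] AS ab_test UNION ALL
  SELECT 3 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
    (21, 4, 'a'), (911, 4, 'b'), (23, 4, 'c')] AS ab_test
),
tests AS (
  SELECT 911 AS test_id UNION ALL
  SELECT 912 AS test_id
)
SELECT DISTINCT data.id AS id
FROM data
CROSS JOIN UNNEST(data.ab_test) AS t  -- one row per ab_test element
JOIN tests
  ON t.experiment = tests.test_id     -- INT64 = INT64, so the condition can live in ON
ORDER BY id
```

Mapped onto the query in the question, this would mean adding `CROSS JOIN UNNEST(be.ab_test) AS t` and joining with `ON be.uid = uid_alloc.uid AND t.experiment = uid_alloc.test_id`; that mapping has not been tried against the real tables.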

Hopefully you can apply the above to your specific case.
