Four-table JOIN in BigQuery

Posted: 2016-05-03 14:37:04

Tags: google-bigquery

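OK, so I'm trying to join four different tables together, and it's getting pretty difficult. I've provided a snippet of each table below; I hope you can all help.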

Table 1: data

+--------+--------+-----------+
| charge | amount |   date    |
+--------+--------+-----------+
|    123 |  10000 | 2/10/2016 |
|    456 |  10000 | 1/28/2016 |
|    789 |  10000 | 3/30/2016 |
+--------+--------+-----------+

Table 2: data_metadata

+--------+------------+------------+
| charge |    key     |   value    |
+--------+------------+------------+
|    123 | identifier | trrkfll212 |
|    456 | code       | test       |
|    789 | ID         | 123xyz     |
+--------+------------+------------+

Table 3: buyer

+-----+-----------+----------+----------+
| id  |   date    | discount |   plan   |
+-----+-----------+----------+----------+
| ABC | 2/13/2016 | yes      | option a |
| DEF | 2/1/2016  | yes      | option a |
| GHI | 1/22/2016 | no       | option a |
+-----+-----------+----------+----------+

Table 4: buyer_metadata

+-----+-----------+--------+
| id  |    key    | value  |
+-----+-----------+--------+
| ABC | migration | TRUE   |
| DEF | emid      | foo    |
| GHI | ID        | 123xyz |
+-----+-----------+--------+

OK, so the data and data_metadata tables are obviously joined via the charge column.

The buyer and buyer_metadata tables are joined via the id column.

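But I want to link all of them together. I'm pretty sure the way to do that is to link the two metadata tables through a common value in their value columns (in this example: 123xyz).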
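A quick way to check that idea before writing the full query is to join only the two metadata tables on value. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a local stand-in for BigQuery (an assumption for illustration; the SQL dialects differ slightly), loads the sample rows from the question, and lists every charge/buyer pair whose metadata shares a value. The function name shared_values is hypothetical.

```python
import sqlite3

# Sample rows copied from the question's two metadata tables.
DATA_METADATA = [(123, "identifier", "trrkfll212"), (456, "code", "test"), (789, "ID", "123xyz")]
BUYER_METADATA = [("ABC", "migration", "TRUE"), ("DEF", "emid", "foo"), ("GHI", "ID", "123xyz")]


def shared_values():
    """Hypothetical helper: (charge, buyer id, value) triples linked by a common value."""
    con = sqlite3.connect(":memory:")  # local stand-in for BigQuery, not the BigQuery client
    con.execute("CREATE TABLE data_metadata (charge INTEGER, key TEXT, value TEXT)")
    con.execute("CREATE TABLE buyer_metadata (id TEXT, key TEXT, value TEXT)")
    con.executemany("INSERT INTO data_metadata VALUES (?, ?, ?)", DATA_METADATA)
    con.executemany("INSERT INTO buyer_metadata VALUES (?, ?, ?)", BUYER_METADATA)
    rows = con.execute(
        """
        SELECT dm.charge, bm.id, dm.value
        FROM data_metadata AS dm
        JOIN buyer_metadata AS bm ON dm.value = bm.value
        ORDER BY dm.charge, bm.id
        """
    ).fetchall()
    con.close()
    return rows


print(shared_values())  # [(789, 'GHI', '123xyz')]
```

With the sample data only charge 789 and buyer GHI share a value. Both matching rows also have key = 'ID', so adding AND dm.key = bm.key to the ON clause is one possible way to keep unrelated values (such as TRUE or foo) from matching by accident; whether that fits depends on the real data.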

Can anyone help?

2 answers:

Answer 0 (score: 1)

If all "linking" columns are unique, it might look something like this:

SELECT *
FROM data d
JOIN data_metadata dm ON d.charge = dm.charge
JOIN buyer_metadata bm ON dm.value = bm.value
JOIN buyer b ON bm.id = b.id

If not, I think you would have to use something like a ... clause.
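To see what the query above returns, and why the uniqueness caveat matters, here is a sketch that runs the same four-way join against the question's sample rows. It uses Python's sqlite3 in-memory database as a stand-in for BigQuery (an assumption; this is not the BigQuery client), selects a few named columns instead of *, and the helper names make_db and run are hypothetical. The second call adds one made-up buyer_metadata row that reuses the value 123xyz, to show how a non-unique linking value multiplies the output rows.

```python
import sqlite3

# Sample rows copied from the question.
DATA = [(123, 10000, "2/10/2016"), (456, 10000, "1/28/2016"), (789, 10000, "3/30/2016")]
DATA_METADATA = [(123, "identifier", "trrkfll212"), (456, "code", "test"), (789, "ID", "123xyz")]
BUYER = [("ABC", "2/13/2016", "yes", "option a"), ("DEF", "2/1/2016", "yes", "option a"),
         ("GHI", "1/22/2016", "no", "option a")]
BUYER_METADATA = [("ABC", "migration", "TRUE"), ("DEF", "emid", "foo"), ("GHI", "ID", "123xyz")]

# The four-way join from the answer, with explicit columns instead of *.
QUERY = """
    SELECT d.charge, d.amount, b.id, b.plan, dm.value
    FROM data d
    JOIN data_metadata dm ON d.charge = dm.charge
    JOIN buyer_metadata bm ON dm.value = bm.value
    JOIN buyer b ON bm.id = b.id
    ORDER BY d.charge, b.id
"""


def make_db(extra_buyer_metadata=()):
    """Hypothetical helper: in-memory SQLite copy of the four tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE data (charge INTEGER, amount INTEGER, date TEXT);
        CREATE TABLE data_metadata (charge INTEGER, key TEXT, value TEXT);
        CREATE TABLE buyer (id TEXT, date TEXT, discount TEXT, plan TEXT);
        CREATE TABLE buyer_metadata (id TEXT, key TEXT, value TEXT);
    """)
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", DATA)
    con.executemany("INSERT INTO data_metadata VALUES (?, ?, ?)", DATA_METADATA)
    con.executemany("INSERT INTO buyer VALUES (?, ?, ?, ?)", BUYER)
    con.executemany("INSERT INTO buyer_metadata VALUES (?, ?, ?)",
                    BUYER_METADATA + list(extra_buyer_metadata))
    return con


def run(extra_buyer_metadata=()):
    """Hypothetical helper: run QUERY, optionally with extra buyer_metadata rows."""
    con = make_db(extra_buyer_metadata)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(run())
# [(789, 10000, 'GHI', 'option a', '123xyz')]
print(run([("ABC", "ID", "123xyz")]))
# [(789, 10000, 'ABC', 'option a', '123xyz'), (789, 10000, 'GHI', 'option a', '123xyz')]
```

In the second call charge 789 appears once per buyer that shares its value, which is the row fan-out the "unique" condition guards against.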

Answer 1 (score: 1)

Let's do this in two steps. First, create composite tables for data and buyer. Composite table for data:

SELECT data.charge, data.amount, data.date,
       data_metadata.key, data_metadata.value 
FROM [data] AS data  
JOIN (SELECT charge, key, value FROM [data_metadata]) AS data_metadata
ON data.charge = data_metadata.charge
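The composite data step can be tried locally on the sample rows. This sketch runs the same query in Python's sqlite3 as a stand-in for BigQuery (an assumption; legacy BigQuery's [table] brackets are replaced with plain table names, and an ORDER BY is added so the output order is deterministic). composite_data is a hypothetical helper name.

```python
import sqlite3

# Sample rows copied from the question.
DATA = [(123, 10000, "2/10/2016"), (456, 10000, "1/28/2016"), (789, 10000, "3/30/2016")]
DATA_METADATA = [(123, "identifier", "trrkfll212"), (456, "code", "test"), (789, "ID", "123xyz")]


def composite_data():
    """Hypothetical helper: data joined to data_metadata on charge."""
    con = sqlite3.connect(":memory:")  # local stand-in for BigQuery
    con.execute("CREATE TABLE data (charge INTEGER, amount INTEGER, date TEXT)")
    con.execute("CREATE TABLE data_metadata (charge INTEGER, key TEXT, value TEXT)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", DATA)
    con.executemany("INSERT INTO data_metadata VALUES (?, ?, ?)", DATA_METADATA)
    rows = con.execute(
        """
        SELECT data.charge, data.amount, data.date,
               data_metadata.key, data_metadata.value
        FROM data AS data
        JOIN (SELECT charge, key, value FROM data_metadata) AS data_metadata
        ON data.charge = data_metadata.charge
        ORDER BY data.charge
        """
    ).fetchall()
    con.close()
    return rows


for row in composite_data():
    print(row)
```

Each charge in the sample has exactly one metadata row, so the composite table has one row per charge.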

Composite table for buyer:

SELECT buyer.id, buyer.date, buyer.discount, buyer.plan,
       buyer_metadata.key, buyer_metadata.value
FROM [buyer] AS buyer
JOIN (SELECT id, key, value FROM [buyer_metadata]) AS buyer_metadata
ON buyer.id = buyer_metadata.id
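The same kind of local check for the buyer step also shows why the subquery has to select id: without it, the ON clause refers to a column the derived table does not have. This sketch again uses Python's sqlite3 as a stand-in for BigQuery (an assumption); composite_buyer and its include_id switch are hypothetical, and SQLite's sqlite3.OperationalError is not the error BigQuery itself would report.

```python
import sqlite3

# Sample rows copied from the question.
BUYER = [("ABC", "2/13/2016", "yes", "option a"), ("DEF", "2/1/2016", "yes", "option a"),
         ("GHI", "1/22/2016", "no", "option a")]
BUYER_METADATA = [("ABC", "migration", "TRUE"), ("DEF", "emid", "foo"), ("GHI", "ID", "123xyz")]


def composite_buyer(include_id=True):
    """Hypothetical helper: buyer joined to buyer_metadata on id."""
    # include_id=False reproduces a subquery that omits id.
    cols = "id, key, value" if include_id else "key, value"
    con = sqlite3.connect(":memory:")  # local stand-in for BigQuery
    con.execute("CREATE TABLE buyer (id TEXT, date TEXT, discount TEXT, plan TEXT)")
    con.execute("CREATE TABLE buyer_metadata (id TEXT, key TEXT, value TEXT)")
    con.executemany("INSERT INTO buyer VALUES (?, ?, ?, ?)", BUYER)
    con.executemany("INSERT INTO buyer_metadata VALUES (?, ?, ?)", BUYER_METADATA)
    try:
        return con.execute(
            f"""
            SELECT buyer.id, buyer.date, buyer.discount, buyer.plan,
                   buyer_metadata.key, buyer_metadata.value
            FROM buyer AS buyer
            JOIN (SELECT {cols} FROM buyer_metadata) AS buyer_metadata
            ON buyer.id = buyer_metadata.id
            ORDER BY buyer.id
            """
        ).fetchall()
    finally:
        con.close()


print(composite_buyer())
try:
    composite_buyer(include_id=False)
except sqlite3.OperationalError as err:
    print("error:", err)  # the derived table has no id column
```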

Then let's join the two composite tables:

SELECT composite_data.*, composite_buyer.*
FROM (
    SELECT data.charge, data.amount, data.date,
           data_metadata.key, data_metadata.value AS value
    FROM [data] AS data
    JOIN (SELECT charge, key, value FROM [data_metadata]) AS data_metadata
    ON data.charge = data_metadata.charge) AS composite_data
JOIN (
    SELECT buyer.id, buyer.date, buyer.discount, buyer.plan,
           buyer_metadata.key, buyer_metadata.value AS value
    FROM [buyer] AS buyer
    JOIN (SELECT id, key, value FROM [buyer_metadata]) AS buyer_metadata
    ON buyer.id = buyer_metadata.id) AS composite_buyer
ON composite_data.value = composite_buyer.value

I haven't tested it, but it's probably close.
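Putting the two steps together, this sketch runs the combined query on the sample rows, again in Python's sqlite3 as a stand-in for BigQuery (an assumption; [table] brackets become plain names). It selects id in the buyer subquery and uses explicit AS aliases so the outer ON clause has a predictable value column. combined is a hypothetical helper name. On this data it yields the same single pairing as the direct four-way join: charge 789 with buyer GHI.

```python
import sqlite3

# Sample rows copied from the question.
DATA = [(123, 10000, "2/10/2016"), (456, 10000, "1/28/2016"), (789, 10000, "3/30/2016")]
DATA_METADATA = [(123, "identifier", "trrkfll212"), (456, "code", "test"), (789, "ID", "123xyz")]
BUYER = [("ABC", "2/13/2016", "yes", "option a"), ("DEF", "2/1/2016", "yes", "option a"),
         ("GHI", "1/22/2016", "no", "option a")]
BUYER_METADATA = [("ABC", "migration", "TRUE"), ("DEF", "emid", "foo"), ("GHI", "ID", "123xyz")]

# Composite data joined to composite buyer on the shared metadata value.
COMBINED = """
SELECT composite_data.*, composite_buyer.*
FROM (
    SELECT data.charge, data.amount, data.date,
           data_metadata.key AS key, data_metadata.value AS value
    FROM data AS data
    JOIN (SELECT charge, key, value FROM data_metadata) AS data_metadata
    ON data.charge = data_metadata.charge) AS composite_data
JOIN (
    SELECT buyer.id, buyer.date, buyer.discount, buyer.plan,
           buyer_metadata.key AS key, buyer_metadata.value AS value
    FROM buyer AS buyer
    JOIN (SELECT id, key, value FROM buyer_metadata) AS buyer_metadata
    ON buyer.id = buyer_metadata.id) AS composite_buyer
ON composite_data.value = composite_buyer.value
"""


def combined():
    """Hypothetical helper: run COMBINED against an in-memory copy of the four tables."""
    con = sqlite3.connect(":memory:")  # local stand-in for BigQuery
    con.executescript("""
        CREATE TABLE data (charge INTEGER, amount INTEGER, date TEXT);
        CREATE TABLE data_metadata (charge INTEGER, key TEXT, value TEXT);
        CREATE TABLE buyer (id TEXT, date TEXT, discount TEXT, plan TEXT);
        CREATE TABLE buyer_metadata (id TEXT, key TEXT, value TEXT);
    """)
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", DATA)
    con.executemany("INSERT INTO data_metadata VALUES (?, ?, ?)", DATA_METADATA)
    con.executemany("INSERT INTO buyer VALUES (?, ?, ?, ?)", BUYER)
    con.executemany("INSERT INTO buyer_metadata VALUES (?, ?, ?)", BUYER_METADATA)
    rows = con.execute(COMBINED).fetchall()
    con.close()
    return rows


print(combined())
```

The result row carries both composite tables side by side (charge, amount, date, key, value, then id, date, discount, plan, key, value), so in practice you would likely select and rename only the columns you need.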

For reference, here is the page on BigQuery JOINs. Have you seen this SO question?