How to get the most recent dataset using a wildcard in Google BigQuery

Time: 2017-07-10 15:29:30

Tags: google-bigquery

If I have a series of fact tables such as:

fact-01012001
fact-01022001
fact-01032001
dim001
dim002

A wildcard will allow me to search all three, for example:

select * from fact-*

Is there a way, using wildcards or otherwise, to get only the most recent fact table? Say, only 01032001?

2 answers:

Answer 0 (score: 2)

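Until the relevant feature request is implemented, you need to use one query to determine the most recent date and then a second query to select from that table. For example: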

#standardSQL
SELECT _TABLE_SUFFIX AS latest_date
FROM `fact-*`
ORDER BY PARSE_DATE('%m%d%Y', _TABLE_SUFFIX) DESC LIMIT 1;

Once you have retrieved the latest date, query that table:

#standardSQL
SELECT *
FROM `fact-01032001`;
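
The first query sorts on PARSE_DATE rather than on the raw suffix because MMDDYYYY strings do not sort chronologically. The following is a minimal Python sketch of that difference, not part of the original answer; the suffix list is hypothetical, and the last line only illustrates how the result of the first step could be fed into the second query.

```python
from datetime import datetime

# Hypothetical table suffixes in MMDDYYYY form (illustration only).
suffixes = ["01032001", "12312000", "01022001"]

# Plain string comparison picks the wrong table: "12312000" compares
# highest even though it is the oldest date in the list.
latest_by_string = max(suffixes)

# Parsing first, which is the role PARSE_DATE('%m%d%Y', ...) plays in
# the query, compares real calendar dates.
latest_by_date = max(suffixes, key=lambda s: datetime.strptime(s, "%m%d%Y"))

# Build the second query of the two-step approach from the result.
second_query = f"SELECT * FROM `fact-{latest_by_date}`"

print(latest_by_string, latest_by_date)
print(second_query)
```

If the suffixes were stored as YYYYMMDD instead, a plain string sort would already be chronological and the parsing step would be unnecessary.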

Answer 1 (score: 1)

Below is a one-step approach for BigQuery Standard SQL:

  
#standardSQL
SELECT *
FROM `yourProject.yourDataset.fact_*`
WHERE _TABLE_SUFFIX IN (
  SELECT 
    FORMAT_DATE('%m%d%Y', MAX(PARSE_DATE('%m%d%Y', SUBSTR(table_id, - 8)))) AS d
  FROM `yourProject.yourDataset.__TABLES_SUMMARY__`
  WHERE SUBSTR(table_id, 1, LENGTH('fact_')) = 'fact_' 
  AND LENGTH(table_id) = LENGTH('fact_') + 8
  GROUP BY SUBSTR(table_id, 1, LENGTH(table_id) - 8)
)  

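Of course, you can replace LENGTH('fact_') with 5; I wrote it this way only to make it easier to follow. The 8 is the length of the expected suffix, so you capture only the expected tables from the list below: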

fact_01012001
fact_01022001
fact_01032001
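
The filtering done by the subquery can be sketched outside BigQuery as well. This is a rough Python illustration under stated assumptions, not the answer's own code: the function name latest_suffix and the extra table fact_backup are hypothetical, added to show that a table with the right prefix but the wrong length is skipped.

```python
from datetime import datetime

def latest_suffix(table_ids, prefix="fact_", suffix_len=8):
    """Return the MMDDYYYY suffix of the newest matching table, or None."""
    dates = []
    for table_id in table_ids:
        # Same two conditions as the WHERE clause of the subquery:
        # the name starts with the prefix and has exactly 8 more characters.
        if table_id.startswith(prefix) and len(table_id) == len(prefix) + suffix_len:
            # Equivalent of PARSE_DATE('%m%d%Y', SUBSTR(table_id, -8)).
            dates.append(datetime.strptime(table_id[-suffix_len:], "%m%d%Y"))
    # Equivalent of FORMAT_DATE('%m%d%Y', MAX(...)).
    return max(dates).strftime("%m%d%Y") if dates else None

# Table names from the question plus one hypothetical non-date table.
tables = ["fact_01012001", "fact_01022001", "fact_01032001",
          "dim001", "dim002", "fact_backup"]
print(latest_suffix(tables))
```

Note that in this sketch an 8-character suffix that is not a valid date would raise a ValueError in strptime, so the length check alone does not guard against every unexpected table name.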