BigQuery query to find the column names of a table

Time: 2012-07-05 06:11:51

Tags: google-bigquery

I need a query that returns the column names of a table in BigQuery, like the following SQL query:

    SELECT column_name, data_type, data_length, data_precision, nullable FROM all_tab_cols WHERE table_name = 'EMP';

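6 answers:

Answer 0: (score: 2)

There is currently no way to retrieve table metadata (i.e., column names and types) through a query, though this isn't the first time it has been requested.

Is there a reason you need this as a query? Table metadata is available through the tables API.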
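
As a rough sketch of what working with that metadata could look like: assuming the response is shaped like the REST `tables.get` table resource (a `schema.fields` list whose entries carry `name`, `type`, an optional `mode`, and nested `fields` for records), the column names can be flattened in Python as below. The sample dictionary and the helper name `flatten_schema` are made up for illustration, and no API call is made here.

```python
# Hypothetical sample shaped like a BigQuery table resource (tables.get);
# the field names and types are invented for illustration.
SAMPLE_TABLE = {
    "schema": {
        "fields": [
            {"name": "empno", "type": "INTEGER", "mode": "REQUIRED"},
            {"name": "ename", "type": "STRING", "mode": "NULLABLE"},
            {
                "name": "address",
                "type": "RECORD",
                "mode": "NULLABLE",
                "fields": [
                    {"name": "city", "type": "STRING", "mode": "NULLABLE"},
                    {"name": "zip", "type": "STRING"},
                ],
            },
        ]
    }
}


def flatten_schema(fields, prefix=""):
    """Yield (field_path, type, mode) for every field, descending into nested records."""
    for field in fields:
        path = prefix + field["name"]
        # A missing mode is treated as NULLABLE.
        yield path, field["type"], field.get("mode", "NULLABLE")
        yield from flatten_schema(field.get("fields", []), prefix=path + ".")


columns = list(flatten_schema(SAMPLE_TABLE["schema"]["fields"]))
for path, field_type, mode in columns:
    print(f"{path}\t{field_type}\t{mode}")
```

Nested fields come out as dotted paths such as `address.city`, which makes the result easy to compare against a flat column list.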

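Answer 1: (score: 1)

This can actually be done with SQL. To do so, query the audit logging table for the most recent log entry of the load job that created the table in question.

For example, assuming the table is loaded/created daily: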

    CREATE TEMP FUNCTION jsonSchemaStringToArray(jsonSchema STRING)
          RETURNS ARRAY<STRING> AS ((
            -- strip the JSON wrapper, then reduce each field object to its "name" value
            SELECT
              SPLIT(
                REGEXP_REPLACE(REPLACE(LTRIM(jsonSchema,'{ '),'"fields": [',''), r'{[^{]+"name": "([^\"]+)"[^}]+}[, ]*', '\\1,')
              ,',')
          ));
    WITH array_output AS (
      SELECT
        jsonSchemaStringToArray(jsonSchema) AS column_names
      FROM (
        SELECT
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.schemaJson AS jsonSchema
          , ROW_NUMBER() OVER (ORDER BY metadata.timestamp DESC) AS record_count
        FROM `realself-main.bigquery_logging.cloudaudit_googleapis_com_data_access_20170101`
        WHERE
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.tableId = '<table_name>'
          AND
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.datasetId = '<schema_name>'
          AND
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.createDisposition = 'CREATE_IF_NEEDED'
      ) AS t
      WHERE
        t.record_count = 1 -- grab the latest entry
    ),
    valid_schema_columns AS (
      -- this is actually what UNNESTS the array into standard rows
      SELECT
        valid_column_name
      FROM array_output
      LEFT JOIN UNNEST(column_names) AS valid_column_name
    )
    -- a WITH clause needs a final SELECT to return rows
    SELECT
      valid_column_name
    FROM valid_schema_columns
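
The UDF above pulls the top-level `name` values out of the logged schema string with a regular expression. For comparison, here is a minimal Python sketch of the same idea, assuming `schemaJson` is a JSON object with a `fields` array. The sample string is invented, and the function name simply mirrors the UDF.

```python
import json


def json_schema_string_to_array(json_schema: str) -> list:
    """Return the top-level column names from a schema JSON string."""
    schema = json.loads(json_schema)
    return [field["name"] for field in schema.get("fields", [])]


# Invented sample; a real log entry would carry the table's actual schema.
sample = (
    '{"fields": ['
    '{"name": "id", "type": "INTEGER", "mode": "REQUIRED"}, '
    '{"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"}'
    ']}'
)

print(json_schema_string_to_array(sample))
```

Parsing the JSON rather than pattern-matching it is likely to be less sensitive to whitespace and to nested record fields, at the cost of doing the work outside the query.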

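Answer 2: (score: 1)

Yes, you can use INFORMATION_SCHEMA to get table metadata.

One of the examples from the previously linked page retrieves metadata for the commits table in the github_repos dataset from the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view:

  1. Open the BigQuery web UI in the GCP Console.

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.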

    SELECT
     *
    FROM
     `bigquery-public-data`.github_repos.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
    WHERE
     table_name="commits"
     AND (column_name="author"
     OR column_name="difference")
    
  

Note: INFORMATION_SCHEMA view names are case-sensitive.

  3. Click Run.

The result should look like this:

  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
  | table_name | column_name |     field_path      |                                                                      data_type                                                                      | description |
  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
  | commits    | author      | author              | STRUCT<name STRING, email STRING, time_sec INT64, tz_offset INT64, date TIMESTAMP>                                                                  | NULL        |
  | commits    | author      | author.name         | STRING                                                                                                                                              | NULL        |
  | commits    | author      | author.email        | STRING                                                                                                                                              | NULL        |
  | commits    | author      | author.time_sec     | INT64                                                                                                                                               | NULL        |
  | commits    | author      | author.tz_offset    | INT64                                                                                                                                               | NULL        |
  | commits    | author      | author.date         | TIMESTAMP                                                                                                                                           | NULL        |
  | commits    | difference  | difference          | ARRAY<STRUCT<old_mode INT64, new_mode INT64, old_path STRING, new_path STRING, old_sha1 STRING, new_sha1 STRING, old_repo STRING, new_repo STRING>> | NULL        |
  | commits    | difference  | difference.old_mode | INT64                                                                                                                                               | NULL        |
  | commits    | difference  | difference.new_mode | INT64                                                                                                                                               | NULL        |
  | commits    | difference  | difference.old_path | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_path | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.old_sha1 | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_sha1 | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.old_repo | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_repo | STRING                                                                                                                                              | NULL        |
  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
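
One thing to watch in filters that mix AND and OR: in SQL, AND binds more tightly than OR, so `a AND b OR c` is evaluated as `(a AND b) OR c`. Python's `and`/`or` have the same relative precedence, so the effect can be illustrated with a small sketch. The rows below are invented; `other_table` is a hypothetical table that also happens to have a `difference` column.

```python
# Invented metadata rows for illustration only.
rows = [
    {"table_name": "commits", "column_name": "author"},
    {"table_name": "commits", "column_name": "difference"},
    {"table_name": "commits", "column_name": "message"},
    {"table_name": "other_table", "column_name": "difference"},
]

# Without parentheses: (table is commits AND column is author) OR column is difference
ungrouped = [
    r for r in rows
    if r["table_name"] == "commits" and r["column_name"] == "author"
    or r["column_name"] == "difference"
]

# With parentheses: table is commits AND (column is author OR column is difference)
grouped = [
    r for r in rows
    if r["table_name"] == "commits"
    and (r["column_name"] == "author" or r["column_name"] == "difference")
]

print(len(ungrouped))  # 3: the other_table row slips through
print(len(grouped))    # 2: only rows from commits
```

Grouping the OR conditions in parentheses keeps the table filter applied to both column names.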

Answer 3: (score: 0)

BigQuery now supports INFORMATION_SCHEMA:

    SELECT column_name
    FROM `bigquery-public-data`.irs_990.INFORMATION_SCHEMA.COLUMNS
    WHERE table_name = 'irs_990_2015'

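Answer 4: (score: 0)

For newbies like me, the syntax above works out as follows (replace project_name, dataset_name, and table_name with your own values; the values in the WHERE clause are string literals and must be quoted):

    SELECT *
    FROM `project_name`.dataset_name.INFORMATION_SCHEMA.COLUMNS
    WHERE table_catalog = 'project_name'
      AND table_schema = 'dataset_name'
      AND table_name = 'table_name'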
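
For scripting, the placeholders in the query above can be filled in by a small helper. The following Python sketch only builds the query text and does not run it. The function name `columns_query` and the identifier check are assumptions made for illustration, and the check is deliberately simpler than BigQuery's actual naming rules.

```python
import re

# Simplified identifier check (an assumption for this sketch): letters, digits,
# underscores and hyphens only. Real BigQuery naming rules differ per resource.
_IDENT = re.compile(r"[A-Za-z0-9_-]+")


def columns_query(project: str, dataset: str, table: str) -> str:
    """Build an INFORMATION_SCHEMA.COLUMNS query for a single table."""
    for value in (project, dataset, table):
        if not _IDENT.fullmatch(value):
            raise ValueError(f"unsupported identifier: {value!r}")
    return (
        "SELECT column_name, data_type, is_nullable\n"
        f"FROM `{project}`.{dataset}.INFORMATION_SCHEMA.COLUMNS\n"
        f"WHERE table_name = '{table}'\n"
        "ORDER BY ordinal_position"
    )


print(columns_query("bigquery-public-data", "irs_990", "irs_990_2015"))
```

Rejecting unexpected characters before interpolating keeps quotes and backticks out of the generated SQL.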

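Answer 5: (score: 0)

To check the columns, you can query the table from the CLI (column_1 is a placeholder for one of your own columns):

    bq query --use_legacy_sql=false 'SELECT Hour, SUM(column_1) AS column_total FROM `project_id.dataset.table_name` WHERE DATE(Hour) = "2020-06-10"'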