Compare Hive table columns against a list of values stored in another table's field?

Time: 2019-03-19 16:23:52

Tags: sql hadoop hive hiveql

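This may sound confusing, but I need to compare the list of table names and columns stored in a Hive lookup table against the actual table columns and produce a comparison result. I cannot query the Hive metastore. Exporting the output of "show columns in db.tbl" to a file and comparing it against the lookup table seems workable, but I would like to know whether the same result can be achieved using Hive queries alone. Any ideas?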

Sample Hive schema lookup table:

+----------+-----------+-----------------------------+------------------+
| tbl_name | col_name  | type                        | desc             |
+----------+-----------+-----------------------------+------------------+
| mario    | issue_id  | timestamp without time zone | blank            |
+----------+-----------+-----------------------------+------------------+
| mario    | create_id | bigint                      | dob              |
+----------+-----------+-----------------------------+------------------+
| mario    | status    | bigint                      | some info        |
+----------+-----------+-----------------------------+------------------+
| mario    | location  | bigint                      | some other info  |
+----------+-----------+-----------------------------+------------------+
| luigi    | issue_id  | character varying(65535)    | some more info   |
+----------+-----------+-----------------------------+------------------+
| luigi    | cust_id   | bigint                      | enough info here |
+----------+-----------+-----------------------------+------------------+
| yoshi    | status    | int                         | blank            |
+----------+-----------+-----------------------------+------------------+
| yoshi    | property  | int                         | blank            |
+----------+-----------+-----------------------------+------------------+

Sample tables and their column names:

mario:              luigi:              yoshi:
issue_id            issue_id            status
create_id           cust_id 
status      
health      
quality 

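I am looking for output that gives an exists / does not exist flag for every col_name in the lookup table. (Note that the actual tables may contain columns that are not in the lookup table, but those can be ignored.)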

So the expected output is as follows:

+----------+-----------+-----------------------------+------------------+--------------+
| tbl_name | col_name  | type                        | desc             | presence     |
+----------+-----------+-----------------------------+------------------+--------------+
| mario    | issue_id  | timestamp without time zone | blank            | exists       |
+----------+-----------+-----------------------------+------------------+--------------+
| mario    | create_id | bigint                      | dob              | exists       |
+----------+-----------+-----------------------------+------------------+--------------+
| mario    | status    | bigint                      | some info        | exists       |
+----------+-----------+-----------------------------+------------------+--------------+
| mario    | location  | bigint                      | some other info  | do not exist |
+----------+-----------+-----------------------------+------------------+--------------+
| luigi    | issue_id  | character varying(65535)    | some more info   | exists       |
+----------+-----------+-----------------------------+------------------+--------------+
| luigi    | cust_id   | bigint                      | enough info here | exists       |
+----------+-----------+-----------------------------+------------------+--------------+
| yoshi    | status    | int                         | blank            | exists       |
+----------+-----------+-----------------------------+------------------+--------------+
| yoshi    | property  | int                         | blank            | do not exist |
+----------+-----------+-----------------------------+------------------+--------------+    
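Once the real column names are available as rows in some table, the presence flag comes down to a LEFT JOIN from the lookup table. The sketch below is a minimal illustration, not a Hive-only solution: it runs the join through Python's sqlite3 module so it can be tried locally, and the table names schema_lookup and actual_cols, as well as the renamed column descr, are assumptions made for this example. The SELECT itself uses only a LEFT JOIN and a CASE expression, so it should carry over to HiveQL with little change, though that has not been verified against a Hive cluster here.

```python
import sqlite3

# Lookup rows copied from the question: (tbl_name, col_name, type, desc).
LOOKUP = [
    ("mario", "issue_id", "timestamp without time zone", "blank"),
    ("mario", "create_id", "bigint", "dob"),
    ("mario", "status", "bigint", "some info"),
    ("mario", "location", "bigint", "some other info"),
    ("luigi", "issue_id", "character varying(65535)", "some more info"),
    ("luigi", "cust_id", "bigint", "enough info here"),
    ("yoshi", "status", "int", "blank"),
    ("yoshi", "property", "int", "blank"),
]

# Columns that really exist in each sample table, as listed in the question.
ACTUAL = [
    ("mario", "issue_id"), ("mario", "create_id"), ("mario", "status"),
    ("mario", "health"), ("mario", "quality"),
    ("luigi", "issue_id"), ("luigi", "cust_id"),
    ("yoshi", "status"),
]

# Every lookup row is kept; rows with no match in actual_cols get a NULL
# on the right side, which the CASE turns into 'do not exist'.
# "desc" is a reserved word in SQL, so the column is called descr here.
PRESENCE_SQL = """
SELECT l.tbl_name, l.col_name, l.type, l.descr,
       CASE WHEN a.col_name IS NULL THEN 'do not exist' ELSE 'exists' END AS presence
FROM schema_lookup l
LEFT JOIN actual_cols a
  ON l.tbl_name = a.tbl_name AND l.col_name = a.col_name
"""


def presence_flags():
    """Load the sample data into an in-memory database and run the join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schema_lookup (tbl_name TEXT, col_name TEXT, type TEXT, descr TEXT)"
    )
    conn.execute("CREATE TABLE actual_cols (tbl_name TEXT, col_name TEXT)")
    conn.executemany("INSERT INTO schema_lookup VALUES (?, ?, ?, ?)", LOOKUP)
    conn.executemany("INSERT INTO actual_cols VALUES (?, ?)", ACTUAL)
    rows = conn.execute(PRESENCE_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in presence_flags():
        print(row)
```

Extra columns in the real tables (health and quality in mario) never appear in the result, because the join starts from the lookup table.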

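Update: I ended up taking the long way round, using a loop to extract the column lists of the existing tables to a file and then loading that file into an external table. (In case anyone is trying to achieve the same thing.)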
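The loop itself was not posted, so the following is only a rough sketch of what such an extraction step might look like, not the author's actual script. It assumes the hive command-line client is on the PATH, that "hive -S -e" prints one column name per line for SHOW COLUMNS, and that a tab-separated file of (tbl_name, col_name) pairs is a suitable source for the external table. The function names show_columns and dump_columns and the run parameter are hypothetical helpers introduced for illustration.

```python
import subprocess


def show_columns(db, tbl, run=subprocess.check_output):
    """Return the column names of one table by parsing SHOW COLUMNS output.

    `run` is injectable so the parsing can be exercised without a Hive
    client; by default it shells out to the hive CLI (an assumption).
    """
    # Documented form: SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name]
    out = run(["hive", "-S", "-e", f"SHOW COLUMNS IN {tbl} IN {db}"])
    text = out.decode() if isinstance(out, bytes) else out
    # Strip padding and skip blank lines in the CLI output.
    return [line.strip() for line in text.splitlines() if line.strip()]


def dump_columns(db, tables, path, run=subprocess.check_output):
    """Write one 'tbl_name<TAB>col_name' line per column of every table."""
    with open(path, "w") as fh:
        for tbl in tables:
            for col in show_columns(db, tbl, run):
                fh.write(f"{tbl}\t{col}\n")
```

The resulting file could then back an external table with two string columns, which is the shape a join against the lookup table would need.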

0 answers:

No answers