Difference between internal and external Hive tables

Asked: 2016-01-19 19:07:42

Tags: hadoop hive

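From this link, I learned that internal and external tables differ in how data is stored and deleted. Can anyone tell me whether there is any difference in query efficiency?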

3 answers:

Answer 0: (score: 2)

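Strictly speaking, Hive has no "internal" tables; it has managed tables and external tables. There is no performance difference between the two.

They differ in two ways from a DDL perspective:

  1. Ownership of the files. Hive controls the files of a managed table: if you drop it, both the data in HDFS and the metadata in the metastore DB are removed. For an external table, only the metadata in the metastore is removed.
  2. Syntax. An external table is declared with the EXTERNAL keyword and normally specifies a LOCATION.

From a query perspective, there is no difference.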
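The drop behaviour in point 1 can be sketched in HiveQL as follows. The table names `managed_demo` and `external_demo` and the path `/data/demo` are hypothetical, and default table properties are assumed.

```sql
-- Hypothetical names and path, for illustration only.

-- Managed table: Hive owns the files under its warehouse directory.
CREATE TABLE managed_demo (id INT, name STRING);

-- External table: Hive only records where the files live.
CREATE EXTERNAL TABLE external_demo (id INT, name STRING)
LOCATION '/data/demo';

-- Removes the metastore entry and the table's files in HDFS.
DROP TABLE managed_demo;

-- Removes only the metastore entry; the files under /data/demo remain.
DROP TABLE external_demo;
```

A SELECT against either table is written the same way, which is why the choice does not show up at query time.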

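Answer 1: (score: 1)

To answer your question:

For an external table, Hive does not move the data into its warehouse directory. If you drop an external table, the table metadata is deleted but the data is not.

For an internal table, Hive moves the data into its warehouse directory. If you drop the table, both the table metadata and the data are deleted. For your reference,

Differences between internal and external tables: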

For external tables:

    An external table's files are stored on HDFS, but the table is not tightly bound to them.

    If you delete an external table, the file still remains on HDFS.

    For example, if you create an external table called “table_test” in Hive using HiveQL and link the table to the file “file”, then deleting “table_test” from Hive will not delete “file” from HDFS.

    External table files are accessible to anyone who has access to the HDFS file structure, so security needs to be managed at the HDFS file/folder level.

    Metadata is maintained on the master node, and deleting an external table from Hive deletes only the metadata, not the data files.

For internal tables:

    Data is stored in a directory determined by hive.metastore.warehouse.dir. By default, internal tables are stored under “/user/hive/warehouse”; you can change this by updating the location in the config file.
    Deleting the table deletes the metadata from the master node and the data from HDFS.
    Internal table file security is controlled solely via Hive. Security needs to be managed within Hive, probably at the schema level (depends on the organization).
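To check which kind of table you have and where its files live, something like the following should work from the Hive CLI or Beeline. `table_test` is the example table name used above; substitute your own.

```sql
-- Print the configured warehouse root used for internal (managed) tables.
SET hive.metastore.warehouse.dir;

-- The output includes a "Table Type:" row (MANAGED_TABLE or EXTERNAL_TABLE)
-- and a "Location:" row with the table's directory.
DESCRIBE FORMATTED table_test;
```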

Hive tables can be internal or external; the choice affects how data is loaded, controlled, and managed.

Use EXTERNAL tables when:

The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn’t lock the files.
Data needs to remain in the underlying location even after a DROP TABLE. This can apply if you are pointing multiple schemas (tables or views) at a single data set, or if you are iterating through various possible schemas.
Hive should not own the data or control settings, directories, etc.; you may have another program or process that does those things.
You are not creating a table based on an existing table (AS SELECT).
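As a sketch of the "multiple schemas over a single data set" case, two external tables can point at the same directory. The table names, columns, and the path `/data/events` are hypothetical, and the files are assumed to be tab-delimited text.

```sql
-- Hypothetical names, columns, and path, for illustration only.

-- First attempt at a schema over the files in /data/events.
CREATE EXTERNAL TABLE events_v1 (id INT, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/events';

-- A second schema over the very same files.
CREATE EXTERNAL TABLE events_v2 (id BIGINT, kind STRING, detail STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/events';

-- Dropping one definition leaves the files in place,
-- so events_v2 can still be queried.
DROP TABLE events_v1;
```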

Use INTERNAL tables when:

The data is temporary.
You want Hive to completely manage the life-cycle of the table and data.
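A temporary, Hive-managed table is often created with AS SELECT. The sketch below assumes a `students` table with `id`, `name`, and `batch` columns already exists; the new table name and the filter value are hypothetical.

```sql
-- Hypothetical table name and filter value, for illustration only.
-- Creates an internal (managed) table populated from a query.
CREATE TABLE students_batch_tmp AS
SELECT id, name
FROM students
WHERE batch = '2016';

-- When the intermediate result is no longer needed, dropping the
-- table removes both its metadata and its data.
DROP TABLE students_batch_tmp;
```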

Sources:

HDInsight: Introduction to Hive Internal and External Tables

Internal & External Tables in Hadoop-HIVE

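Answer 2: (score: 0)

Hive gives us a data warehousing facility on top of an existing Hadoop cluster, and it provides a SQL-like interface.

You can create tables in two different ways.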

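  1. Create an external table

        CREATE EXTERNAL TABLE students (id INT, name STRING, batch STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' -- supply delimiter
        LOCATION '/user/hdfs/students';

     For an external table, Hive does not move the data into its warehouse directory. If you drop an external table, the table metadata is deleted but the data is not.

  2. Create a normal table

        CREATE TABLE students (id INT, name STRING, batch STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' -- supply delimiter
        LOCATION '/user/hddfs/students';

     For a normal table, Hive moves the data into its warehouse directory. If you drop the table, both the table metadata and the data are deleted.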
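The "moves the data" step happens when files are loaded into the table. A minimal sketch, assuming the `students` table exists; both file paths are hypothetical.

```sql
-- Hypothetical file paths, for illustration only.

-- Loading from HDFS moves (does not copy) the file into the
-- table's directory, so it disappears from the staging path.
LOAD DATA INPATH '/user/hdfs/staging/students.tsv' INTO TABLE students;

-- With LOCAL, the file is copied from the local filesystem instead,
-- and the original stays where it was.
LOAD DATA LOCAL INPATH '/tmp/students.tsv' INTO TABLE students;
```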

You can also take a look at this.