How to load data from HDFS into a Hive table

Time: 2017-03-11 19:32:30

Tags: hive hdfs hql tweets

Please tell me how to load data from HDFS into a Hive table. I lost the tweets I downloaded yesterday. To load the data I used:

LOAD DATA LOCAL INPATH '/user/hue/twitter/tweets/2017/03/10'
OVERWRITE INTO TABLE tweets
PARTITION (datehour=20170310);
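Note that the `LOCAL` keyword tells Hive to read from the local filesystem of the machine running the client, not from HDFS. Since `/user/hue/twitter/tweets/...` is an HDFS path, a statement without `LOCAL` should resolve the path correctly (a sketch, reusing the path and partition value above):

```sql
-- Without LOCAL, the INPATH is resolved against HDFS.
-- Caution: LOAD DATA ... INPATH *moves* the files from the source
-- directory into the table/partition location; it does not copy them.
LOAD DATA INPATH '/user/hue/twitter/tweets/2017/03/10'
OVERWRITE INTO TABLE tweets
PARTITION (datehour=20170310);
```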

Please give me a correct query. This is my table (I am posting it in two parts):

CREATE EXTERNAL TABLE twitter.tweets ( 
id BIGINT, 
created_at STRING, 
source STRING, 
favorited BOOLEAN, 
retweeted_status STRUCT< 
                 text:STRING,   
                 user:STRUCT < screen_name:STRING, name:STRING >, 
                 retweet_count:INT
                 >, 
entities STRUCT< 
                 urls:ARRAY<STRUCT<expanded_url:STRING>>, 
                 user_mentions:ARRAY<STRUCT <
                                             screen_name:STRING,    
                                             name:STRING
                                            >
                                     >,
                 hashtags:ARRAY<STRUCT<text:STRING>>
               >, 
text STRING, 
user STRUCT< 
      screen_name:STRING, 
      name:STRING,  
      friends_count:INT, 
      followers_count:INT, 
      statuses_count:INT, 
      verified:BOOLEAN, 
      utc_offset:INT, 
      time_zone:STRING
      >, 
in_reply_to_screen_name STRING )
PARTITIONED BY (datehour INT) 
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe' 
LOCATION '/twitter';
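Since `twitter.tweets` is an EXTERNAL, partitioned table, an alternative worth knowing is to register the existing HDFS directory as a partition instead of loading it, which leaves the files exactly where they are (a sketch, assuming the directory layout from the question):

```sql
-- Points the partition at the existing directory; no files are
-- moved or copied, so the source data stays in place.
ALTER TABLE twitter.tweets
  ADD IF NOT EXISTS PARTITION (datehour=20170310)
  LOCATION '/user/hue/twitter/tweets/2017/03/10';
```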

Sample data (excerpt only):

{"filter_level":"low","retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":840649342042148865,"extended_entities":{"media":[{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":340},"medium":{"w":600,"resize":"fit","h":600},"large":{"w":960,"resize":"fit","h":960}},"source_user_id":15934076,

What I found for loading data from HDFS into a Hive table is:

LOAD DATA INPATH '/user/hue/twitter/tweets/2017/03/10' OVERWRITE INTO TABLE tweets PARTITION (datehour=20170310);

Is this correct, and will I lose my source files? If so, what query solves that?
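On the second point: `LOAD DATA INPATH` (without `LOCAL`) moves the files from the source directory into the partition directory under the table's LOCATION, so `/user/hue/twitter/tweets/2017/03/10` will be emptied afterwards; the data itself is not lost, it just lives under the table now. A quick sanity check after the load (a sketch):

```sql
-- If the load succeeded, the partition should return rows.
SELECT count(*) FROM twitter.tweets WHERE datehour = 20170310;
```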

0 Answers:

No answers yet.