Could you tell me how to load data from HDFS into a Hive table? I lost the tweets I downloaded yesterday. The load statement I used:
LOAD DATA LOCAL INPATH '/user/hue/twitter/tweets/2017/03/10'
OVERWRITE INTO TABLE tweets
PARTITION (datehour=20170310);
Please give me a correct query. Here is my table; I am sending it in two steps.
CREATE EXTERNAL TABLE twitter.tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING, name:STRING>,
    retweet_count:INT
  >,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<
      screen_name:STRING,
      name:STRING
    >>,
    hashtags:ARRAY<STRUCT<text:STRING>>
  >,
  text STRING,
  `user` STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING
  >,
  in_reply_to_screen_name STRING
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/twitter';
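Once a partition holds data, the nested `entities` column defined above can be unnested with `LATERAL VIEW explode`. A sketch, following the column names in the DDL (the query itself is an illustration, not from the original post):

```sql
-- Count hashtag occurrences in one partition, unnesting the
-- entities.hashtags ARRAY<STRUCT<text:STRING>> column from the DDL.
SELECT ht.text AS hashtag, COUNT(*) AS uses
FROM twitter.tweets
     LATERAL VIEW explode(entities.hashtags) h AS ht
WHERE datehour = 20170310
GROUP BY ht.text
ORDER BY uses DESC
LIMIT 10;
```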
Sample data (just attaching an excerpt):
{"filter_level":"low","retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":840649342042148865,"extended_entities":{"media":[{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":340},"medium":{"w":600,"resize":"fit","h":600},"large":{"w":960,"resize":"fit","h":960}},"source_user_id":15934076,
I found that loading data from HDFS into a Hive table is:
LOAD DATA INPATH '/user/hue/twitter/tweets/2017/03/10' OVERWRITE INTO TABLE tweets PARTITION (datehour=20170310);
Is this correct, and will I lose my source files? If so, what is the workaround query?
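To the last question: the statement is valid, but with a non-LOCAL INPATH Hive moves the source directory into the partition's location, so the original path is emptied. Since the table above is EXTERNAL and partitioned, one way to leave the files where they are is to register the directory as the partition instead of loading it. A sketch, assuming no partition is registered yet for that hour:

```sql
-- Register the existing HDFS directory as the partition's location.
-- Nothing is moved or copied, so the source files stay in place.
ALTER TABLE twitter.tweets
  ADD IF NOT EXISTS PARTITION (datehour = 20170310)
  LOCATION '/user/hue/twitter/tweets/2017/03/10';
```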