Apache Pig: Calculate the number of days between a date and the current date

Time: 2017-01-14 07:05:24

Tags: hadoop apache-pig

I have a list of movies in the form (#, title, year, rating, duration):

1,The Nightmare Before Christmas,1993,3.9,4568
2,The Mummy,1932,3.5,4388
3,Orphans of the Storm,1921,3.2,9062
4,The Object of Beauty,1991,2.8,6150
5,Night Tide,1963,2.8,5126
6,One Magic Christmas,1985,3.8,5333
7,Muriel's Wedding,1994,3.5,6323
8,Mother's Boys,1994,3.4,5733
9,Nosferatu: Original Version,1929,3.5,5651
10,Nick of Time,1995,3.4,5333
...

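Each tuple contains a year, which I need to treat as 1st Jan of that year.

I need to calculate the number of days between that date and the current date.

My approach: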

movies = LOAD 'movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
daysbetween_data = foreach movies generate DaysBetween(ToDate(year,'<WHAT FORMAT TO GIVE HERE>'), ToDate(<CURRENT DATE HERE>));

Any idea how to do this?

1 Answer:

Answer 0 (score: 1)

Load the year as a chararray field, use CONCAT to prepend 01-01- to the year so that the value matches the format 'MM-dd-yyyy', then use ToDate and DaysBetween.

movies = LOAD 'movies_data.csv' USING PigStorage(',') as (id:int,name:chararray,year:chararray,rating:double,duration:int);
-- DaysBetween(a, b) returns a minus b in days, so the later date (now) goes first to get a positive count.
daysbetween_data = foreach movies generate DaysBetween(CurrentTime(),ToDate(CONCAT('01-01-',year),'MM-dd-yyyy'));
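
As a minimal sketch of the same idea, the script below keeps the movie title next to the computed value so the output is easier to read. The alias and column names (with_date, movie_age, jan1, days_since_jan1) are illustrative and not from the original post. It puts CurrentTime() first on the understanding that DaysBetween returns the first argument minus the second; if your Pig version behaves differently, swap the two arguments.

```pig
-- Sketch only: alias and column names below are illustrative assumptions.
movies = LOAD 'movies_data.csv' USING PigStorage(',')
         AS (id:int, name:chararray, year:chararray, rating:double, duration:int);

-- Turn the bare year into a datetime for 1st Jan of that year.
with_date = FOREACH movies GENERATE
    name,
    ToDate(CONCAT('01-01-', year), 'MM-dd-yyyy') AS jan1;

-- Later date first, so the day count comes out positive for past years.
movie_age = FOREACH with_date GENERATE
    name,
    jan1,
    DaysBetween(CurrentTime(), jan1) AS days_since_jan1;

DUMP movie_age;
```

Splitting the work into two FOREACH steps is only for readability; the single expression in the answer computes the same value.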