Pig: filtering rows by condition

Time: 2016-11-15 15:56:02

Tags: apache-pig

I want a Pig script that filters rows under different conditions:

i2 = GROUP i1 ALL;

i3 = FOREACH i2 GENERATE
    AVG(i1.user_followers_count) AS avg_user_followers_count,
    AVG(i1.avl_user_total_retweets) AS avg_avl_user_total_retweets,
    AVG(i1.avl_user_total_likes) AS avg_avl_user_total_likes,
    AVG(i1.avl_user_total_replies) AS avg_avl_user_total_replies,
    AVG(i1.avl_user_engagements) AS avg_avl_user_engagements;

top = FILTER i1 BY (user_followers_count > i3.avg_user_followers_count)
    AND (avl_user_engagements > i3.avg_avl_user_engagements)
    AND (avl_user_total_retweets > i3.avg_avl_user_total_retweets)
    AND (avl_user_total_likes > i3.avg_avl_user_total_likes)
    AND (avl_user_total_replies > i3.avg_avl_user_total_replies);

bot = FILTER i1 BY (user_followers_count < i3.avg_user_followers_count)
    AND (avl_user_engagements < i3.avg_avl_user_engagements)
    AND (avl_user_total_retweets < i3.avg_avl_user_total_retweets)
    AND (avl_user_total_likes < i3.avg_avl_user_total_likes)
    AND (avl_user_total_replies < i3.avg_avl_user_total_replies);

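That is, top selects every row that is above average in every aspect, and bot selects every row that is below average in every aspect.

Now, once top and bot have been filtered out, I want the remaining rows (the mixed ones, above average in some aspects and below average in others) in another alias called med. How do I do that?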

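1 Answer:

Answer 0: (score: 0)

Use SPLIT. Rows that match neither the top nor the bot condition go to med through the OTHERWISE branch: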

SPLIT i1 INTO
   top IF ((user_followers_count > i3.avg_user_followers_count)
       AND (avl_user_engagements > i3.avg_avl_user_engagements)
       AND (avl_user_total_retweets > i3.avg_avl_user_total_retweets)
       AND (avl_user_total_likes > i3.avg_avl_user_total_likes)
       AND (avl_user_total_replies > i3.avg_avl_user_total_replies)),
   bot IF ((user_followers_count < i3.avg_user_followers_count)
       AND (avl_user_engagements < i3.avg_avl_user_engagements)
       AND (avl_user_total_retweets < i3.avg_avl_user_total_retweets)
       AND (avl_user_total_likes < i3.avg_avl_user_total_likes)
       AND (avl_user_total_replies < i3.avg_avl_user_total_replies)),
   med OTHERWISE;
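
For reference, here is a minimal end-to-end sketch of the same pattern, reduced to two metrics so the flow is easier to follow. The input file `users.tsv`, its schema, and the alias names `users`, `grouped`, and `avgs` are assumptions made for illustration and do not come from the question. It also assumes a Pig version that supports the OTHERWISE branch of SPLIT.

```pig
-- Hypothetical input file and two-metric schema, for illustration only.
users = LOAD 'users.tsv' USING PigStorage('\t')
        AS (user_id:chararray, followers:long, engagements:long);

-- Put every row into a single group so AVG runs over the whole relation.
grouped = GROUP users ALL;
avgs = FOREACH grouped GENERATE
           AVG(users.followers)   AS avg_followers,
           AVG(users.engagements) AS avg_engagements;

-- avgs holds exactly one row, so avgs.avg_followers can be used as a scalar.
SPLIT users INTO
    top IF (followers > avgs.avg_followers AND engagements > avgs.avg_engagements),
    bot IF (followers < avgs.avg_followers AND engagements < avgs.avg_engagements),
    med OTHERWISE;

DUMP med;
```

Compared with two separate FILTER statements, a single SPLIT states all three buckets in one place, so med does not need to repeat the negation of the top and bot conditions.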