SPLIT operator in Pig

Date: 2019-11-30 09:53:55

Tags: apache-pig

I'm trying to split the data from https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data into 2 groups based on reviews_per_month:

    NY_Airbnb_data = LOAD 'AB_NYC_2019.csv' using PigStorage (',') as (id:int, name:chararray, host_id:int, host_name:chararray, neighbourhood_group:chararray, neighbourhood:chararray, lattitude:double, longitude:double, room_type:chararray, price:int, minimum_night:int, number_of_review:int, last_review:datetime, reviews_per_month:double, calculated_host_listing_count:int, availability_365:int);

    b0 = FOREACH NY_Airbnb_data GENERATE name, neighbourhood_group, neighbourhood, room_type, reviews_per_month; 
    b1 = SPLIT b0 into b2 if reviews_per_month<1, b3 if (reviews_per_month>1.5);
    dump b2;

This is the error I get:

    grunt> b1 = SPLIT b0 into b2 if reviews_per_month<1, b3 if (b0.reviews_per_month>1.5);
    2019-11-30 01:48:12,232 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Syntax error, unexpected symbol at or near 'b1'

1 answer:

Answer 0 (score: 0)

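SPLIT is a standalone statement: it creates the relations named after INTO itself, so it is not assigned to an alias. The syntax is simply: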

    SPLIT b0 into b2 if reviews_per_month<1, b3 if (reviews_per_month>1.5);
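A minimal sketch of how the corrected statement fits into the question's script, assuming `b0` is defined exactly as by the FOREACH in the question. After SPLIT runs, `b2` and `b3` are ordinary relations that can be inspected or dumped by name:

    -- Sketch: assumes b0 is defined by the FOREACH in the question.
    -- SPLIT creates b2 and b3 itself, so there is no alias on the left-hand side.
    SPLIT b0 INTO b2 IF reviews_per_month < 1, b3 IF reviews_per_month > 1.5;

    -- b2 and b3 are now ordinary relations and can be used by name.
    DESCRIBE b2;
    DUMP b2;
    DUMP b3;
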

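There is no `b1 =` at the beginning. Records with reviews_per_month >= 1 and <= 1.5 match neither condition and end up in no output relation. If you want them to go into b1, you have to specify a default relation with OTHERWISE: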

    SPLIT b0 INTO b2 IF reviews_per_month < 1, b3 IF reviews_per_month > 1.5, b1 OTHERWISE;
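For comparison, a sketch of the same three-way partition written with FILTER. Unlike SPLIT, FILTER does return a relation and is therefore assigned, which is the form the `b1 =` in the question resembles. Again `b0` is assumed to come from the question's FOREACH. For non-null values of reviews_per_month, the third FILTER selects the same records that `b1 OTHERWISE` collects; rows with an empty (null) reviews_per_month may not be routed the same way by the two forms, so if your data contains such rows, compare the outputs before relying on the equivalence.

    -- Sketch: FILTER-based equivalent of
    --   SPLIT b0 INTO b2 IF reviews_per_month < 1, b3 IF reviews_per_month > 1.5, b1 OTHERWISE;
    -- Assumes b0 is defined by the FOREACH in the question.
    b2 = FILTER b0 BY reviews_per_month < 1;
    b3 = FILTER b0 BY reviews_per_month > 1.5;
    -- Complement of the two conditions above (what OTHERWISE collects for non-null values).
    b1 = FILTER b0 BY reviews_per_month >= 1 AND reviews_per_month <= 1.5;
    DUMP b1;
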