Question

I am trying to split the data from https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data into 2 groups based on reviews_per_month:

    NY_Airbnb_data = LOAD 'AB_NYC_2019.csv' using PigStorage (',') as (id:int, name:chararray, host_id:int, host_name:chararray, neighbourhood_group:chararray, neighbourhood:chararray, lattitude:double, longitude:double, room_type:chararray, price:int, minimum_night:int, number_of_review:int, last_review:datetime, reviews_per_month:double, calculated_host_listing_count:int, availability_365:int);

    b0 = FOREACH NY_Airbnb_data GENERATE name, neighbourhood_group, neighbourhood, room_type, reviews_per_month; 
    b1 = SPLIT b0 into b2 if reviews_per_month<1, b3 if (reviews_per_month>1.5);
    dump b2;

This is the error I get:

    grunt> b1 = SPLIT b0 into b2 if reviews_per_month<1, b3 if (b0.reviews_per_month>1.5);
    2019-11-30 01:48:12,232 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Syntax error, unexpected symbol at or near 'b1'

Answer 1

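The syntax of SPLIT is simply:

    SPLIT b0 into b2 if reviews_per_month<1, b3 if (reviews_per_month>1.5);

There is no `b1 =` at the beginning. SPLIT is not an assignment: it names its output relations itself, after INTO. If you want the records with reviews_per_month >= 1 and <= 1.5 to go into b1, you have to specify a default relation:

    SPLIT b0 INTO b2 IF reviews_per_month < 1, b3 IF reviews_per_month > 1.5, b1 OTHERWISE;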
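For comparison, roughly the same partitioning can be sketched with separate FILTER statements. This is only an illustration, assuming `b0` is the projected relation from the question. Unlike SPLIT, each FILTER is an ordinary assignment, so here an alias on the left-hand side is required. The middle range is written out explicitly instead of relying on OTHERWISE.

```pig
-- Assumes b0 was defined as in the question (name, neighbourhood_group, ..., reviews_per_month).
-- Each FILTER produces exactly one relation, so it is assigned to an alias.
b2 = FILTER b0 BY reviews_per_month < 1;
b3 = FILTER b0 BY reviews_per_month > 1.5;
-- The remaining range, spelled out as an explicit condition.
b1 = FILTER b0 BY reviews_per_month >= 1 AND reviews_per_month <= 1.5;
DUMP b2;
```

The SPLIT form is more compact because it declares all output relations in a single statement, which is exactly why it takes no `alias =` prefix.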
