I am trying to split the records of the dataset at https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data into 2 groups based on reviews_per_month:
NY_Airbnb_data = LOAD 'AB_NYC_2019.csv' using PigStorage (',') as (id:int, name:chararray, host_id:int, host_name:chararray, neighbourhood_group:chararray, neighbourhood:chararray, lattitude:double, longitude:double, room_type:chararray, price:int, minimum_night:int, number_of_review:int, last_review:datetime, reviews_per_month:double, calculated_host_listing_count:int, availability_365:int);
b0 = FOREACH NY_Airbnb_data GENERATE name, neighbourhood_group, neighbourhood, room_type, reviews_per_month;
b1 = SPLIT b0 into b2 if reviews_per_month<1, b3 if (reviews_per_month>1.5);
dump b2;
This is the error I get:
grunt> b1 = SPLIT b0 into b2 if reviews_per_month<1, b3 if (b0.reviews_per_month>1.5);
2019-11-30 01:48:12,232 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Syntax error, unexpected symbol at or near 'b1'
Answer 0 (score: 0)
SPLIT does not return a relation, so it cannot be assigned to an alias. Its syntax is simply:
SPLIT b0 into b2 if reviews_per_month<1, b3 if (reviews_per_month>1.5);
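Note that there is no b1 = at the beginning; the output relations (b2 and b3) are named inside the statement itself. If you also want the records with reviews_per_month >= 1 and <= 1.5 to go into b1, you must specify a default relation with OTHERWISE: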
SPLIT b0 INTO b2 IF reviews_per_month < 1, b3 IF reviews_per_month > 1.5, b1 OTHERWISE;
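
To check that the split behaves as intended, each output relation can be counted. The following is a minimal sketch, assuming b0 is defined as in the question and that the Pig version in use supports OTHERWISE; the aliases b1_grp, b1_cnt and so on are hypothetical names introduced only for this illustration.

```pig
-- Split b0 into three relations; there is no alias assignment in front of SPLIT.
SPLIT b0 INTO b2 IF reviews_per_month < 1,
              b3 IF reviews_per_month > 1.5,
              b1 OTHERWISE;

-- Count the tuples in each output relation (hypothetical alias names).
-- COUNT_STAR counts every tuple, including those whose first field is null.
b2_grp = GROUP b2 ALL;
b2_cnt = FOREACH b2_grp GENERATE COUNT_STAR(b2);
b3_grp = GROUP b3 ALL;
b3_cnt = FOREACH b3_grp GENERATE COUNT_STAR(b3);
b1_grp = GROUP b1 ALL;
b1_cnt = FOREACH b1_grp GENERATE COUNT_STAR(b1);

DUMP b2_cnt;
DUMP b3_cnt;
DUMP b1_cnt;
```

If only one of the groups is needed, a single FILTER statement such as b2 = FILTER b0 BY reviews_per_month < 1; selects the same records as the b2 branch of the SPLIT.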