猪:每袋元组的最大价值

时间:2016-02-08 07:32:53

标签: apache-pig

所以我有一些如下数据:

grunt> describe aliveevents_patient_id;
aliveevents_patient_id: {group: int,aliveevents: {(events::patientid: int,events::eventid: chararray,events::etimestamp: datetime,events::value: float,mortality::patientid: int,mortality::mtimestamp: datetime,mortality::label: int)}}

我如何能够获得每组etimestamp的最大价值?

基本上我想对以下内容这样做:

patient_id, etimestamp
1, 10
1, 20
2, 30

输出

patient_id, etimestamp
1, 20
2, 30

1 个答案:

答案 0 :(得分:0)

根据你的问题:

让aliveevents_patient_id包含两个字段{patient_id,etimestamp}

然后脚本是:

A = GROUP aliveevents_patient_id BY patient_id;
DUMP A;
(1,{(1,10),(1,20)})
(2,{(2,30)})

B = FOREACH A GENERATE group,MAX(aliveevents_patient_id.etimestamp);
DUMP B;

(1,20)
(2,30)