I have some data that looks like this:
grunt> describe aliveevents_patient_id;
aliveevents_patient_id: {group: int,aliveevents: {(events::patientid: int,events::eventid: chararray,events::etimestamp: datetime,events::value: float,mortality::patientid: int,mortality::mtimestamp: datetime,mortality::label: int)}}
How can I get the maximum etimestamp for each group?
Basically, I want to do this for input like the following:
patient_id, etimestamp
1, 10
1, 20
2, 30
Output:
patient_id, etimestamp
1, 20
2, 30
Answer 0 (score: 0)
Going by the example in your question:
Assume aliveevents_patient_id is a flat relation with two fields, {patient_id, etimestamp}.
Then the script is:
A = GROUP aliveevents_patient_id BY patient_id;
DUMP A;
(1,{(1,10),(1,20)})
(2,{(2,30)})
B = FOREACH A GENERATE group,MAX(aliveevents_patient_id.etimestamp);
DUMP B;
(1,20)
(2,30)
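Note that the relation shown by describe in the question is already grouped: its fields are group and a bag named aliveevents, not flat patient_id and etimestamp columns. A minimal sketch for that schema, assuming a Pig version whose built-in MAX accepts datetime values (the alias latest and the output field names are made up for illustration):

```pig
-- aliveevents_patient_id is already grouped by patient id,
-- so no second GROUP is needed here.
-- The bag field comes from a join, so it is projected by its
-- disambiguated name (events::etimestamp).
latest = FOREACH aliveevents_patient_id GENERATE
    group AS patient_id,
    MAX(aliveevents.events::etimestamp) AS max_etimestamp;
DUMP latest;
```

If MAX does not accept datetime in your version, converting the timestamps to longs with ToUnixTime before aggregating is one possible workaround.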