Question

I have input data that looks something like this, and I want to process it using a Pig script.

USER_ID   CLICK_NO  PAGE_NAME   CLICK_TIME
1         1         PAGE1       <time from epoch as long>
1         2         PAGE2       <time from epoch as long>
1         3         PAGE3       <time from epoch as long>

Here, I have the user ID and the time at which he/she clicked each link on a website. I want to find the total time the user spent on the website. In short, I want to group by user ID and sort by CLICK_NO, which is easy, but I do not know whether I can access the next row to find the difference between two clicks. If I can do that, I can sum all the differences to get the total time spent on the site. Can someone help?
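The approach described above (group by user, order by click number, take the difference between each row and the next, then sum) can be sketched in Python rather than Pig to make the logic concrete. This is a minimal sketch assuming each row is a tuple in the question's column order; the function name `total_time_pairwise` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each row mirrors the question's columns: (USER_ID, CLICK_NO, PAGE_NAME, CLICK_TIME).
Row = Tuple[int, int, str, int]

def total_time_pairwise(rows: Iterable[Row]) -> Dict[int, int]:
    """Group by user, order by click number, and sum the gaps between consecutive clicks."""
    by_user: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)  # group by USER_ID
    totals: Dict[int, int] = {}
    for user_id, clicks in by_user.items():
        clicks.sort(key=lambda r: r[1])  # order by CLICK_NO
        times = [r[3] for r in clicks]
        # Difference between each row and the next one, then the sum of those differences.
        totals[user_id] = sum(b - a for a, b in zip(times, times[1:]))
    return totals

# Hypothetical sample data (epoch seconds), not from the original question.
sample = [
    (1, 1, "PAGE1", 1000),
    (1, 3, "PAGE3", 1250),
    (1, 2, "PAGE2", 1100),
    (2, 1, "PAGE1", 5000),
]
print(total_time_pairwise(sample))  # {1: 250, 2: 0}
```

A user with a single click gets 0, because there is no "next row" to subtract from.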

I can post a code snippet, but it is pretty straightforward to group by USER_ID and order by CLICK_NO.

Answer 1

The sum of the differences is equal to MAX(click_time) - MIN(click_time) after grouping by user_id. Pig has built-in functions for this.

https://pig.apache.org/docs/r0.15.0/func.html#max
https://pig.apache.org/docs/r0.15.0/func.html#min
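Why the shortcut works: assuming CLICK_TIME increases with CLICK_NO, the consecutive differences telescope, (t2 - t1) + (t3 - t2) + ... + (tn - tn-1) = tn - t1, which is the maximum time minus the minimum time. The Python sketch below (not Pig) computes that per-user MAX - MIN, mirroring what the answer suggests doing in Pig by grouping on user_id and applying the built-in MAX and MIN functions. The tuple layout, the function name `total_time_max_min`, and the sample data are assumptions for illustration.

```python
from typing import Dict, Iterable, Tuple

# Assumed row layout: (USER_ID, CLICK_NO, PAGE_NAME, CLICK_TIME).
Row = Tuple[int, int, str, int]

def total_time_max_min(rows: Iterable[Row]) -> Dict[int, int]:
    """Per user: MAX(click_time) - MIN(click_time); no sorting or row-to-row access needed."""
    lo: Dict[int, int] = {}
    hi: Dict[int, int] = {}
    for user_id, _click_no, _page, click_time in rows:
        lo[user_id] = min(lo.get(user_id, click_time), click_time)
        hi[user_id] = max(hi.get(user_id, click_time), click_time)
    return {user_id: hi[user_id] - lo[user_id] for user_id in lo}

# Hypothetical sample data (epoch seconds), deliberately out of click order.
sample = [
    (1, 1, "PAGE1", 1000),
    (1, 3, "PAGE3", 1250),
    (1, 2, "PAGE2", 1100),
    (2, 1, "PAGE1", 5000),
]
print(total_time_max_min(sample))  # {1: 250, 2: 0}
```

For user 1 the gaps are 100 and 150, which sum to 250, the same as 1250 - 1000. Because MAX and MIN are simple aggregates, this avoids ordering rows within each group.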

Find difference between two rows in pig script
