MY DATASET
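My dataset contains the start and finish times of many people (ID) working in different areas (Location) on different days of the week (Day). An example of my dataset is below: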
> head(WeekOne, 15)
Start Finish Day ID Location
1 2017-04-12 00:00:00 2017-04-12 00:02:55 D1 Daniel Office
2 2017-04-12 00:02:55 2017-04-12 00:06:18 D1 Daniel Office
3 2017-04-12 00:06:18 2017-04-12 00:08:20 D1 Daniel OnSite
4 2017-04-12 00:08:20 2017-04-12 00:08:40 D1 Daniel OnSite
5 2017-04-12 00:08:40 2017-04-12 00:10:11 D1 Daniel Travel
6 2017-04-12 00:10:11 2017-04-12 00:10:18 D1 Daniel Travel
7 2017-04-12 00:10:18 2017-04-12 00:17:52 D1 Daniel Travel
8 2017-04-12 00:17:52 2017-04-12 00:19:00 D1 Daniel Travel
9 2017-04-12 00:19:00 2017-04-12 00:19:56 D1 Daniel OnSite
10 2017-04-12 00:19:56 2017-04-12 00:28:48 D1 Daniel OnSite
11 2017-04-12 00:00:00 2017-04-12 00:03:52 D2 Daniel OnSite
12 2017-04-12 00:03:52 2017-04-12 00:04:05 D2 Daniel Office
13 2017-04-12 00:04:05 2017-04-12 00:08:32 D2 Daniel Office
14 2017-04-12 00:08:32 2017-04-12 00:16:01 D2 Daniel Travel
15 2017-04-12 00:16:01 2017-04-12 00:25:35 D2 Daniel OnSite
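I would like to know the total time (in minutes) that each ID spends in each Location per week. The highest level of Day is D7, and I have a separate data.frame for each week, so I only need to iterate over Location and ID.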
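WHAT I HAVE TRIED
I tried the code below, but it returns the minutes in an odd format and does not account for the same location being visited more than once in a day. For example, Daniel visits OnSite twice on D1.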
WeekOne %>%
group_by(ID, Location) %>%
summarise(Duration = max(Finish) - min(Start))
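I did consider creating a new column, WeekOne$Level, that numbers each consecutive run of the same Location, so that a repeat visit or a change of Location starts a new level. I could then iterate over each level and use the code above. For example: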
> head(WeekOne, 15)
Start Finish Day ID Location Level
1 2017-04-12 00:00:00 2017-04-12 00:02:55 D1 Daniel Office 1
2 2017-04-12 00:02:55 2017-04-12 00:06:18 D1 Daniel Office 1
3 2017-04-12 00:06:18 2017-04-12 00:08:20 D1 Daniel OnSite 2
4 2017-04-12 00:08:20 2017-04-12 00:08:40 D1 Daniel OnSite 2
5 2017-04-12 00:08:40 2017-04-12 00:10:11 D1 Daniel Travel 3
6 2017-04-12 00:10:11 2017-04-12 00:10:18 D1 Daniel Travel 3
7 2017-04-12 00:10:18 2017-04-12 00:17:52 D1 Daniel Travel 3
8 2017-04-12 00:17:52 2017-04-12 00:19:00 D1 Daniel Travel 3
9 2017-04-12 00:19:00 2017-04-12 00:19:56 D1 Daniel OnSite 4
10 2017-04-12 00:19:56 2017-04-12 00:28:48 D1 Daniel OnSite 4
11 2017-04-12 00:00:00 2017-04-12 00:03:52 D2 Daniel OnSite 5
12 2017-04-12 00:03:52 2017-04-12 00:04:05 D2 Daniel Office 6
13 2017-04-12 00:04:05 2017-04-12 00:08:32 D2 Daniel Office 6
14 2017-04-12 00:08:32 2017-04-12 00:16:01 D2 Daniel Travel 7
15 2017-04-12 00:16:01 2017-04-12 00:25:35 D2 Daniel OnSite 8
WeekOne %>%
group_by(ID, Level) %>%
summarise(Duration = max(Finish) - min(Start))
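However, I am not sure how to add this column. It also does not take Location into account, seems cumbersome, and does not solve the problem of the minutes being returned in a funny format.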
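One possible way to build such a Level column is sketched below. It is a minimal illustration, not a confirmed solution: it assumes the rows are already sorted by time within each ID and Day, and the names `visits` and `key` are hypothetical stand-ins (`visits` mirrors the Day, ID and Location columns of the 15 sample rows above).

```r
# Stand-in for WeekOne: only the columns needed to number the runs
visits <- data.frame(
  Day      = c(rep("D1", 10), rep("D2", 5)),
  ID       = "Daniel",
  Location = c("Office", "Office", "OnSite", "OnSite", "Travel",
               "Travel", "Travel", "Travel", "OnSite", "OnSite",
               "OnSite", "Office", "Office", "Travel", "OnSite")
)

# One string per row combining ID, Day and Location (hypothetical helper)
key <- paste(visits$ID, visits$Day, visits$Location)

# TRUE wherever the key differs from the previous row (the first row always
# starts a run); the cumulative sum then numbers the runs 1, 2, 3, ...
visits$Level <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

print(visits$Level)
```

Because Day is part of the key, the OnSite run at the end of D1 and the OnSite row at the start of D2 receive different levels, as in the table above.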
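MY QUESTION
How can I quickly and easily calculate the total duration that each ID spends in each Location? I would like the duration in minutes, rounded to the nearest minute. For example: 3 minutes.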
Answer 0 (score: 1)
You want to calculate the duration of each row first, and then sum the durations by ID and Location:
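library(dplyr)

WeekOne %>%
  # Per-row duration in minutes; explicit units avoid difftime picking secs or mins on its own
  mutate(Duration = as.numeric(difftime(Finish, Start, units = "mins"))) %>%
  group_by(ID, Location) %>%
  # Total per person and location, rounded to the nearest minute
  summarize(Total_Duration = round(sum(Duration)))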
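To check the approach on known numbers, here is a small self-contained sketch that rebuilds Daniel's ten D1 rows from the question and sums them by ID and Location. The names `times`, `sample_week` and `totals` are made up for this illustration, the time zone is assumed to be UTC, and the `.groups` argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Boundaries of Daniel's ten consecutive D1 intervals, taken from the question
times <- as.POSIXct(
  paste("2017-04-12", c("00:00:00", "00:02:55", "00:06:18", "00:08:20",
                        "00:08:40", "00:10:11", "00:10:18", "00:17:52",
                        "00:19:00", "00:19:56", "00:28:48")),
  tz = "UTC"  # assumed; the question does not state a time zone
)

# Each interval starts where the previous one finished
sample_week <- data.frame(
  Start    = times[-length(times)],
  Finish   = times[-1],
  Day      = "D1",
  ID       = "Daniel",
  Location = c("Office", "Office", "OnSite", "OnSite", "Travel",
               "Travel", "Travel", "Travel", "OnSite", "OnSite")
)

totals <- sample_week %>%
  # Numeric minutes per row, independent of difftime's automatic units
  mutate(Duration = as.numeric(difftime(Finish, Start, units = "mins"))) %>%
  group_by(ID, Location) %>%
  # Sum first, then round, so rounding error does not accumulate per row
  summarise(Total_Duration = round(sum(Duration)), .groups = "drop")

print(totals)
```

In these rows Daniel spends 378 seconds in Office, 730 in OnSite (across two separate visits) and 620 in Travel, so the rounded totals are 6, 12 and 10 minutes. Summing per-row durations counts both OnSite visits, without including the time spent elsewhere in between.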