Question

This is my data. For each ID, I want to remove all of its rows that come after an event.

ID   Event  time
1      0     1
1      1     2
2      0     3
1      0     4
2      0     5

Because Event is greater than 0 for ID 1, I want to remove all subsequent rows for ID 1. So row 4 is dropped, and my desired output would be

 ID   Event  time
  1     0     1
  1     1     2
  2     0     3
  2     0     5

How can I do this?

 dput(df)
structure(list(ID = c(1L, 1L, 2L, 1L, 2L), Event = c(0L, 1L, 
0L, 0L, 0L), time = 1:5), .Names = c("ID", "Event", "time"), class = "data.frame", row.names = c(NA, 
-5L))

Answer 1

With dplyr, group by ID and filter to the rows where time is less than or equal to the smallest time at which Event is 1:

library(dplyr)

df %>% group_by(ID) %>% filter(time <= min(time[Event == 1]))

## Source: local data frame [4 x 3]
## Groups: ID [2]
## 
##      ID Event  time
##   <int> <int> <int>
## 1     1     0     1
## 2     1     1     2
## 3     2     0     3
## 4     2     0     5

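Instead of using time, you could use which together with seq or row_number. In base R, ave can handle the grouping, but it accepts only one input vector, so the seq approach is simpler than one based on time: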

df[as.logical(ave(df$Event, df$ID, FUN = function(x) {
    seq_along(x) <= min(which(x == 1))
})), ]

##   ID Event time
## 1  1     0    1
## 2  1     1    2
## 3  2     0    3
## 5  2     0    5

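Both approaches rely on the fact that min(integer(0)) returns Inf (with a warning) when an ID has no Event value of 1. You can add an if condition to handle that case explicitly if you prefer.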
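
As a rough sketch of what such an explicit condition might look like in base R (the helper name keep_until_first_event is made up for illustration and is not part of any package), the check for "no event in this ID" can be written out instead of relying on min(integer(0)):

```r
# Sample data from the question
df <- data.frame(ID = c(1L, 1L, 2L, 1L, 2L),
                 Event = c(0L, 1L, 0L, 0L, 0L),
                 time = 1:5)

# Hypothetical helper: for one ID's Event vector, flag the rows to keep.
# If the ID never has Event == 1, every row is kept explicitly.
keep_until_first_event <- function(x) {
    first <- which(x == 1)
    if (length(first) == 0) {
        rep(TRUE, length(x))
    } else {
        seq_along(x) <= first[1]
    }
}

# ave() returns integers here (0/1), so convert back to logical
keep <- as.logical(ave(df$Event, df$ID, FUN = keep_until_first_event))
res <- df[keep, ]   # keeps rows 1, 2, 3 and 5 of df
```

This avoids the warning that min() emits on an empty vector, at the cost of a few more lines.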

Answer 2

Here is an option using match with data.table:

library(data.table)
setDT(df)[, .SD[seq_len(match(1, Event, nomatch = .N))], ID]
#   ID Event time
#1:  1     0    1
#2:  1     1    2
#3:  2     0    3
#4:  2     0    5
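
To see why this works, it may help to look at the row index on its own, outside data.table. The sketch below uses a made-up helper name (rows_to_keep) and passes length(event) where the data.table call uses .N, the number of rows in the group:

```r
# Hypothetical helper mirroring seq_len(match(1, Event, nomatch = .N))
rows_to_keep <- function(event) {
    # match() gives the position of the first 1; if there is none,
    # nomatch makes it fall back to the last row, so all rows are kept
    seq_len(match(1, event, nomatch = length(event)))
}

rows_to_keep(c(0L, 1L, 0L))  # 1 2  (ID 1: stop at the event row)
rows_to_keep(c(0L, 0L))      # 1 2  (ID 2: no event, keep everything)
```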
