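Question: Suppose I have the data.table object below. I only want to keep the entries that satisfy the following conditions:
For a given CURRENT_DATE and IID, if there is a row with state = final_e and there is already a row with state = init_e on that date, keep only the final_e row for that IID.
For a given CURRENT_DATE and IID, if there is a row with state = e, it is left untouched and stays in the data.
Any suggestions on how to do this so that I get the desired object? Many thanks!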
library(data.table)
dt <- data.table(
CURRENT_DATE = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"),
IID = c(1, 1, 2, 1, 2, 2),
state = c("init_e", "final_e", "e", "e", "init_e", "final_e"),
vals = c(10, 20, 30, 22, 9, 7),
text = c("some_text1", "some_text2", "some_text3", "some_text4", "some_text5", "some_text6")
)
## Output:
CURRENT_DATE IID state vals text
1: 2020-01-01 1 init_e 10 some_text1
2: 2020-01-01 1 final_e 20 some_text2
3: 2020-01-01 2 e 30 some_text3
4: 2020-01-02 1 e 22 some_text4
5: 2020-01-02 2 init_e 9 some_text5
6: 2020-01-02 2 final_e 7 some_text6
## Desired Output:
CURRENT_DATE IID state vals text
1: 2020-01-01 1 final_e 20 some_text2
2: 2020-01-01 2 e 30 some_text3
3: 2020-01-02 1 e 22 some_text4
4: 2020-01-02 2 final_e 7 some_text6
Edit:
library(data.table)
dt2 <- data.table(
CURRENT_DATE = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"),
IID = c(1, 1, 2, 1, 2),
state = c("init_e", "final_e", "e", "e", "final_e"),
vals = c(10, 20, 30, 22, 7),
text = c("some_text1", "some_text2", "some_text3", "some_text4", "some_text5")
)
## Output:
CURRENT_DATE IID state vals text
1: 2020-01-01 1 init_e 10 some_text1
2: 2020-01-01 1 final_e 20 some_text2
3: 2020-01-01 2 e 30 some_text3
4: 2020-01-02 1 e 22 some_text4
5: 2020-01-02 2 final_e 7 some_text5
With this data, one of the answers results in:
setorder(dt2[, rn := .I], CURRENT_DATE, IID, state)
dt2[sort(c(dt2[state=="e", which=TRUE],
unique(dt2[state %chin% c("final_e","init_e")], by=c("CURRENT_DATE","IID"))$rn))]
## Output:
CURRENT_DATE IID state vals text rn
1: 2020-01-01 1 init_e 10 some_text1 1
2: 2020-01-01 2 e 30 some_text3 3
3: 2020-01-02 1 e 22 some_text4 4
4: 2020-01-02 2 final_e 7 some_text5 5
## Desired Output:
CURRENT_DATE IID state vals text
1: 2020-01-01 1 final_e 20 some_text2
3: 2020-01-01 2 e 30 some_text3
4: 2020-01-02 1 e 22 some_text4
5: 2020-01-02 2 final_e 7 some_text5
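The mismatch above appears to come from the order of operations: `dt2[, rn := .I]` numbers the rows before `setorder()` reorders them, so `rn` no longer matches the row positions used for the final subset. A minimal sketch, assuming the same `dt2` as above and relying on "final_e" sorting before "init_e" alphabetically (the names `keep_e`, `keep_first`, and `res` are only illustrative):

```r
library(data.table)

dt2 <- data.table(
  CURRENT_DATE = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"),
  IID   = c(1, 1, 2, 1, 2),
  state = c("init_e", "final_e", "e", "e", "final_e"),
  vals  = c(10, 20, 30, 22, 7),
  text  = c("some_text1", "some_text2", "some_text3", "some_text4", "some_text5")
)

# Sort first, then number the rows, so rn matches the sorted row positions
setorder(dt2, CURRENT_DATE, IID, state)
dt2[, rn := .I]

# Row positions of all "e" rows
keep_e <- dt2[state == "e", which = TRUE]

# First init_e/final_e row per (CURRENT_DATE, IID); after sorting this is final_e when present
keep_first <- unique(dt2[state %chin% c("final_e", "init_e")],
                     by = c("CURRENT_DATE", "IID"))$rn

res <- dt2[sort(c(keep_e, keep_first))]
```

With the row numbers assigned after sorting, `res` should keep the final_e row for IID 1 on 2020-01-01 instead of the init_e row.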
Answer 0 (score: 2)
Here is another option:
setkey(dt, CURRENT_DATE, IID, state)[, rn := .I]
dt[sort(c(dt[state=="e", which=TRUE],
unique(dt[state %chin% c("final_e","init_e")], by=c("CURRENT_DATE","IID"))$rn))]
Or, based only on the small sample dataset:
dt[state!="init_e"]
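Answer 1 (score: 0)
We can write a custom function:
check_condition <- function(state) {
  # Group has init_e: keep only its final_e row(s).
  # Otherwise, if the group has e rows, keep those; any() keeps the if() condition length-one.
  if (any(state == "init_e")) which(state == "final_e")
  else if (any(state == "e")) which(state == "e")
}
and apply it to each group.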
library(data.table)
dt[, .SD[check_condition(state)], .(CURRENT_DATE, IID)]
# CURRENT_DATE IID state vals text
#1: 2020-01-01 1 final_e 20 some_text2
#2: 2020-01-01 2 e 30 some_text3
#3: 2020-01-02 1 e 22 some_text4
#4: 2020-01-02 2 final_e 7 some_text6
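If a group can contain final_e without a matching init_e, as in dt2 from the question's edit, a per-group logical filter may be easier to reason about than a helper that returns indices. The following is only a sketch, assuming the rule is "drop an init_e row only when the same (CURRENT_DATE, IID) group also has a final_e row"; `res` is an illustrative name:

```r
library(data.table)

dt2 <- data.table(
  CURRENT_DATE = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"),
  IID   = c(1, 1, 2, 1, 2),
  state = c("init_e", "final_e", "e", "e", "final_e"),
  vals  = c(10, 20, 30, 22, 7),
  text  = c("some_text1", "some_text2", "some_text3", "some_text4", "some_text5")
)

# Within each group, remove init_e rows only if a final_e row exists;
# e rows and lone final_e rows pass through unchanged
res <- dt2[, .SD[!(state == "init_e" & any(state == "final_e"))],
           by = .(CURRENT_DATE, IID)]
```

Under this assumed rule, an init_e row without a final_e partner would also be kept, which the question does not specify either way.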
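Answer 2 (score: 0)
Let me also answer my own question, since I found a neat (and obvious) solution: encode the state variable as an ordered factor, sort by (CURRENT_DATE, IID) and state, and keep the first row of each (CURRENT_DATE, IID) group:
dt2[, state := factor(state, levels = c("final_e", "init_e", "e"),
ordered = TRUE)]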
sorted_frame <- dt2[order(CURRENT_DATE, IID, state)]
u_frame <- unique(sorted_frame, by = c("CURRENT_DATE", "IID"))
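Note that `state := factor(...)` changes the column in dt2 by reference. A possible variant, sketched here under the same level ordering, builds the factor only inside `order()` so that state stays a character column; `priority`, `sorted_dt`, and `res` are illustrative names, not part of the original answer:

```r
library(data.table)

dt2 <- data.table(
  CURRENT_DATE = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"),
  IID   = c(1, 1, 2, 1, 2),
  state = c("init_e", "final_e", "e", "e", "final_e"),
  vals  = c(10, 20, 30, 22, 7),
  text  = c("some_text1", "some_text2", "some_text3", "some_text4", "some_text5")
)

# Priority of states within a group: final_e first, then init_e, then e
priority <- c("final_e", "init_e", "e")

# Sort by group and by state priority without modifying the state column
sorted_dt <- dt2[order(CURRENT_DATE, IID, factor(state, levels = priority))]

# Keep the first (highest-priority) row of each (CURRENT_DATE, IID) group
res <- unique(sorted_dt, by = c("CURRENT_DATE", "IID"))
```

As with the original approach, this keeps exactly one row per group, so it assumes an e row never shares a group with init_e or final_e rows.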