Question

I need help filtering the following data frame (this is a simplified example):

mx = as.data.frame(cbind(c("-", "-", "-", "-", "mutation", "+", "+", "+", "+") ,
                         c(F, T, F, F, F, F, T, F,T)) )
colnames(mx) = c("mutation", "distance")
mx
  mutation distance
1        -    FALSE
2        -     TRUE
3        -    FALSE
4        -    FALSE
5 mutation    FALSE
6        +    FALSE
7        +     TRUE
8        +    FALSE
9        +     TRUE

I need to filter on the second column (distance) so that the result looks like this:

  mutation distance
3        -    FALSE
4        -    FALSE
5 mutation    FALSE
6        +    FALSE

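I need to remove every row up to and including the last TRUE that occurs before the row where mx$mutation == "mutation" (so rows 1 and 2), and every row from the first TRUE that occurs after that row onward (so row 7 and later).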

Answer 1

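We can create a grouping variable by taking the cumulative sum of the logical column ('distance') and then apply filter: keep only the group that contains the "mutation" row, and within it only the rows where distance is FALSE.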

library(dplyr)
mx %>%
  group_by(grp = cumsum(distance)) %>% 
  filter(any(mutation == "mutation") & !distance) %>%
  ungroup %>% 
  select(-grp)
# A tibble: 4 x 2
# mutation distance
#  <fctr>   <lgl>   
#1 -        F       
#2 -        F       
#3 mutation F       
#4 +        F
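
To see why the cumulative sum works as a grouping key, it may help to look at the group ids on their own. This is a small base-R sketch using the same nine values as the example; the names distance, mutation, grp and keep are only illustrative.

```r
# Same values as the example data; distance must be a real logical vector.
distance <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
mutation <- c("-", "-", "-", "-", "mutation", "+", "+", "+", "+")

# Every TRUE starts a new group, so the id increases by one at rows 2, 7 and 9.
grp <- cumsum(distance)
grp
# 0 1 1 1 1 1 2 2 3

# The group holding the "mutation" row is group 1 (rows 2 to 6);
# dropping its leading TRUE row leaves rows 3 to 6.
keep <- grp == grp[mutation == "mutation"] & !distance
which(keep)
# 3 4 5 6
```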

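Note: we can build the data.frame directly with data.frame. There is no need for cbind, and it actually harms the column types, because cbind converts its arguments to a matrix and a matrix can hold only one type.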
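
A quick way to check the coercion described in the note is to compare the column types produced by each construction. This is a minimal sketch on a shortened version of the data; the object names via_cbind and direct are made up for the illustration.

```r
# cbind() on a character vector and a logical vector yields a character matrix,
# so the logical values become the strings "FALSE" and "TRUE".
via_cbind <- cbind(c("-", "mutation", "+"), c(FALSE, FALSE, TRUE))
is.character(via_cbind)
# TRUE

# data.frame() keeps each column's own type.
direct <- data.frame(mutation = c("-", "mutation", "+"),
                     distance = c(FALSE, FALSE, TRUE))
is.logical(direct$distance)
# TRUE
```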

Data

mx = data.frame(mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+"),
                distance = c(F, T, F, F, F, F, T, F, T))

Answer 2

Hope this helps!

https://

The output is:

host = 'kbckjsdkcdn.us-east-1.es.amazonaws.com'

Answer 3

You can use which() to identify the rows:

# row number of the last TRUE before the row where mx$mutation == 'mutation'
last_true_before_mutation <- max(which(mx$distance == 'TRUE')[which(mx$distance == 'TRUE') < which(mx$mutation == 'mutation')])

# row number of the first TRUE after the row where mx$mutation == 'mutation'
first_true_after_mutation <- min(which(mx$distance == 'TRUE')[which(mx$distance == 'TRUE') > which(mx$mutation == 'mutation')])

# all rows to remove
rem_rows <- c(seq_len(last_true_before_mutation), seq(first_true_after_mutation, nrow(mx)))

# remove the appropriate rows
mx[-rem_rows, ]

Here is a general-purpose function you can use:

before_after_mutation <- function(df) {
    true_rows <- which(df$distance == 'TRUE')         # rows where distance is TRUE
    mutation_row <- which(df$mutation == 'mutation')  # row holding the mutation marker
    last_true_before_mutation <- max(true_rows[true_rows < mutation_row])
    first_true_after_mutation <- min(true_rows[true_rows > mutation_row])
    # drop everything up to the last TRUE before it and from the first TRUE after it
    rem_rows <- c(seq_len(last_true_before_mutation), seq(first_true_after_mutation, nrow(df)))
    df[-rem_rows, ]
}

Usage:

before_after_mutation(mx)
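
The function above relies on there being a TRUE on both sides of the "mutation" row; when one side has none, max() or min() is called on an empty vector. The following is a possible variant, not part of the original answer, that falls back to the start or end of the data frame in that case. The name between_true is hypothetical, and the sketch assumes the data frame contains exactly one "mutation" row.

```r
# Hypothetical helper: keep the rows strictly between the nearest TRUE on
# each side of the "mutation" row. Assumes exactly one "mutation" row.
between_true <- function(df) {
  # as.character() first, so logical, character and factor columns all work
  is_true <- as.logical(as.character(df$distance))
  m <- which(df$mutation == "mutation")
  t_idx <- which(is_true)
  lo <- max(c(0, t_idx[t_idx < m]))             # 0 when no TRUE precedes
  hi <- min(c(nrow(df) + 1, t_idx[t_idx > m]))  # nrow + 1 when no TRUE follows
  rows <- seq_len(nrow(df))
  df[rows > lo & rows < hi, , drop = FALSE]
}

mx <- data.frame(
  mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+"),
  distance = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
)
res <- between_true(mx)   # rows 3 to 6 of the example data
```

With the example data this should return the same four rows as before_after_mutation(mx).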

How to filter rows between two specific values
