Conditionally remove rows from a data frame

Time: 2016-11-08 05:17:26

Tags: r

How do I conditionally remove rows from a data frame?

For example, I have:

Apple, 2001
Apple, 2002
Apple, 2003
Apple, 2004
Banana, 2001
Banana, 2002
Banana, 2003
Candy, 2001
Candy, 2002
Candy, 2003
Candy, 2004
Dog, 2001
Dog, 2002
Dog, 2004
Water, 2002
Water, 2003
Water, 2004

Then I want to keep only the groups that have a row for every year from 2001 to 2004, i.e.:

Apple, 2001
Apple, 2002
Apple, 2003
Apple, 2004
Candy, 2001
Candy, 2002
Candy, 2003
Candy, 2004

3 answers:

Answer 0: (score: 3)

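Using data.table: grouped by 'Col1', check with if whether all of 2001:2004 are present (%in%) in the 'year' column, and if so return the Subset of Data.table (.SD) for that group. Groups that fail the check return nothing and are dropped.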

library(data.table)
setDT(df1)[, if(all(2001:2004 %in% year)) .SD, by = Col1]
#    Col1 year
#1: Apple 2001
#2: Apple 2002
#3: Apple 2003
#4: Apple 2004
#5: Candy 2001
#6: Candy 2002
#7: Candy 2003
#8: Candy 2004
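
To make the group-wise condition concrete, here is a minimal base R sketch that evaluates `all(2001:2004 %in% year)` once per group. It uses the same rows as `df1`, rebuilt here so the block runs on its own. The names `required_years` and `has_all_years` are introduced only for illustration and are not part of the original answer.

```r
# Rebuild the example data so this block is self-contained.
df1 <- data.frame(
  Col1 = rep(c("Apple", "Banana", "Candy", "Dog", "Water"),
             times = c(4, 3, 4, 3, 3)),
  year = c(2001:2004, 2001:2003, 2001:2004, 2001L, 2002L, 2004L, 2002:2004),
  stringsAsFactors = FALSE
)

required_years <- 2001:2004  # illustrative name, not from the original answer

# One TRUE/FALSE per group: does the group contain every required year?
has_all_years <- vapply(
  split(df1$year, df1$Col1),
  function(y) all(required_years %in% y),
  logical(1)
)
print(has_all_years)

# Names of the groups that pass the check; only their rows are kept.
print(names(has_all_years)[has_all_years])
```

Only Apple and Candy pass, which matches the eight rows returned by the data.table call.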

Data

df1 <- structure(list(Col1 = c("Apple", "Apple", "Apple", "Apple", "Banana", 
"Banana", "Banana", "Candy", "Candy", "Candy", "Candy", "Dog", 
"Dog", "Dog", "Water", "Water", "Water"), year = c(2001L, 2002L, 
 2003L, 2004L, 2001L, 2002L, 2003L, 2001L, 2002L, 2003L, 2004L, 
 2001L, 2002L, 2004L, 2002L, 2003L, 2004L)), .Names = c("Col1", 
 "year"), class = "data.frame", row.names = c(NA, -17L))

Answer 1: (score: 2)

Using base R, we can use ave to get the desired result (df1 is the data frame from the question):

df1[ave(df1$year, df1$Col1, FUN = function(x) all(2001:2004 %in% x)) == 1, ]

#   Col1 year
#1  Apple 2001
#2  Apple 2002
#3  Apple 2003
#4  Apple 2004
#8  Candy 2001
#9  Candy 2002
#10 Candy 2003
#11 Candy 2004
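
A note on why the comparison is `== 1`: `ave` writes the per-group result back into a copy of `df1$year`, so here the TRUE/FALSE from `all()` comes back as 1 or 0 on every row of the group. The sketch below rebuilds the example data so it runs on its own and prints that intermediate vector; `flag` and `result` are illustrative names.

```r
# Rebuild the example data so this block is self-contained.
df1 <- data.frame(
  Col1 = rep(c("Apple", "Banana", "Candy", "Dog", "Water"),
             times = c(4, 3, 4, 3, 3)),
  year = c(2001:2004, 2001:2003, 2001:2004, 2001L, 2002L, 2004L, 2002:2004),
  stringsAsFactors = FALSE
)

# One value per row: 1 if the row's group has all of 2001:2004, otherwise 0.
flag <- ave(df1$year, df1$Col1, FUN = function(x) all(2001:2004 %in% x))
print(flag)

# Keep only the rows whose group passed the check.
result <- df1[flag == 1, ]
print(result)
```

Because the subset keeps the original row names, the result shows rows 1 to 4 and 8 to 11, as in the output above.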

Answer 2: (score: 2)

A dplyr approach:

library(dplyr) # or library(tidyverse)
df1 %>% 
    group_by(Col1) %>% 
    filter(all(2001:2004 %in% year))

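. %>% filter(TRUE) returns all rows, whereas . %>% filter(FALSE) drops all rows. Because all() yields a single TRUE or FALSE per group, each group is kept or dropped as a whole.

Output: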

Source: local data frame [8 x 2]
Groups: Col1 [2]

   Col1  year
  <chr> <int>
1 Apple  2001
2 Apple  2002
3 Apple  2003
4 Apple  2004
5 Candy  2001
6 Candy  2002
7 Candy  2003
8 Candy  2004
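
The remark about filter(TRUE) and filter(FALSE) can be checked directly. The following is a small sketch, assuming dplyr is installed. It rebuilds the example data so it runs on its own, and the names `all_rows`, `no_rows` and `kept` are illustrative only.

```r
library(dplyr)

# Rebuild the example data so this block is self-contained.
df1 <- data.frame(
  Col1 = rep(c("Apple", "Banana", "Candy", "Dog", "Water"),
             times = c(4, 3, 4, 3, 3)),
  year = c(2001:2004, 2001:2003, 2001:2004, 2001L, 2002L, 2004L, 2002:2004),
  stringsAsFactors = FALSE
)

# A single logical value is recycled across all rows.
all_rows <- df1 %>% filter(TRUE)   # keeps every row
no_rows  <- df1 %>% filter(FALSE)  # keeps no rows
print(nrow(all_rows))
print(nrow(no_rows))

# With group_by(), the single value is computed and recycled per group,
# so each group is kept or dropped as a whole.
kept <- df1 %>%
  group_by(Col1) %>%
  filter(all(2001:2004 %in% year)) %>%
  ungroup()
print(kept)
```

The trailing ungroup() is optional. It only removes the grouping from the result so that later operations are not applied per group by accident.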