Question

I have a data frame, dat:

dat<-data.frame(col1=rep(1:4,3),
                col2=rep(letters[24:26],4),
                col3=letters[1:12])

I want to filter dat on two different columns, keeping only the combinations given by the rows of a second data frame, filter:

filter<-data.frame(col1=1:3,col2=NA)
lists<-list(list("x","y"),list("y","z"),list("x","z"))
filter$col2<-lists

So, for example, rows containing (1, x) and (1, y) would be selected, but not (1, z), (2, x), or (3, y).

I know how to do this with a for loop:

#create a frame to drop results in
results<-dat[0,]
for(f in 1:nrow(filter)){
  temp_filter<-filter[f,]
  temp_dat<-dat[dat$col1==temp_filter[1,1] &
                dat$col2%in%unlist(temp_filter[1,2]),]
  results<-rbind(results,temp_dat)
}

Or, if you prefer dplyr style:

library(dplyr)
results<-dat[0,]
for(f in 1:nrow(filter)){
  temp_filter<-filter[f,]
  # dplyr::filter is spelled out because a data frame is also named filter
  temp_dat<-dplyr::filter(dat,col1==temp_filter[1,1] &
                          col2%in%unlist(temp_filter[1,2]))
  results<-rbind(results,temp_dat)
}

The result should be:

  col1 col2 col3
1    1    x    a
5    1    y    e
2    2    y    b
6    2    z    f
3    3    z    c
7    3    x    g

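I would normally filter with a merge, but I can't here because col2 has to be checked against a list rather than a single value. The for loop works, but I suspect there is a more efficient way to do this, probably using some variant of apply or do.call.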

Answer 1

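A solution using the tidyverse. dat2 is the final output. The idea is to extract the values from the list column of the filter data frame and reshape filter into filter2, whose col1 and col2 columns hold the same kind of values as the corresponding columns in dat. Finally, semi_join filters dat to create dat2.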

By the way, filter is a predefined function in the dplyr package. Since your example uses dplyr, it is best to avoid naming a data frame filter.

library(tidyverse)

filter2 <- filter %>%
  mutate(col2_a = map_chr(col2, 1),
         col2_b = map_chr(col2, 2)) %>%
  select(-col2) %>%
  gather(group, col2, -col1)

dat2 <- dat %>%
  semi_join(filter2, by = c("col1", "col2")) %>%
  arrange(col1)
dat2
  col1 col2 col3
1    1    x    a
2    1    y    e
3    2    y    b
4    2    z    f
5    3    z    c
6    3    x    g

Update

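Here is another way to prepare filter2 that does not require knowing how many elements each list contains. The rest is the same as in the previous solution.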

library(tidyverse)

filter2 <- filter %>%
  rowwise() %>%
  do(tibble(col1 = .$col1, col2 = flatten_chr(.$col2)))

dat2 <- dat %>%
  semi_join(filter2, by = c("col1", "col2")) %>%
  arrange(col1)
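
The flattening step above can also be sketched with tidyr's unnest(), assuming tidyr 1.0 or later. This is only an illustration, not part of the original answer: the names filter_df and filter_long are hypothetical, chosen so that nothing is called filter, and each inner list is first collapsed to a character vector so that unnest() returns one row per value.

```r
library(dplyr)
library(tidyr)

# Data from the question; filter_df is a hypothetical name for the
# data frame that the question calls filter.
dat <- data.frame(col1 = rep(1:4, 3),
                  col2 = rep(letters[24:26], 4),
                  col3 = letters[1:12],
                  stringsAsFactors = FALSE)
filter_df <- data.frame(col1 = 1:3, col2 = NA)
filter_df$col2 <- list(list("x", "y"), list("y", "z"), list("x", "z"))

# Turn each inner list into a character vector, then expand the list
# column so that every (col1, col2) pair gets its own row.
filter_long <- filter_df %>%
  mutate(col2 = lapply(col2, unlist)) %>%
  unnest(col2)

# Keep only the rows of dat whose (col1, col2) pair appears in filter_long.
dat2 <- dat %>%
  semi_join(filter_long, by = c("col1", "col2")) %>%
  arrange(col1)
```

Like the rowwise() version, this does not depend on how many elements each list holds.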

Answer 2

Once the list column in filter is converted back into a standard data.frame, this can be done with a direct join:

merge(
  dat,
  with(filter, data.frame(col1=rep(col1, lengths(col2)), col2=unlist(col2)))
)

#  col1 col2 col3
#1    1    x    a
#2    1    y    e
#3    2    y    b
#4    2    z    f
#5    3    x    g
#6    3    z    c
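
If the original row order and row names of dat matter (merge sorts by the join keys and renumbers the rows), the same flattened table can be used as a lookup instead. The following is a minimal base R sketch under stated assumptions: filter_df, lookup, key and kept are hypothetical names, and the "|" separator is assumed not to occur in either column.

```r
# Data from the question; filter_df is a hypothetical name for the
# data frame that the question calls filter.
dat <- data.frame(col1 = rep(1:4, 3),
                  col2 = rep(letters[24:26], 4),
                  col3 = letters[1:12],
                  stringsAsFactors = FALSE)
filter_df <- data.frame(col1 = 1:3, col2 = NA)
filter_df$col2 <- list(list("x", "y"), list("y", "z"), list("x", "z"))

# Flatten the list column: repeat each col1 once per element of its list.
lookup <- with(filter_df,
               data.frame(col1 = rep(col1, lengths(col2)),
                          col2 = unlist(col2),
                          stringsAsFactors = FALSE))

# Build one string key per row (assumes "|" never appears in the data)
# and keep the rows of dat whose key occurs in the lookup table.
key <- function(d) paste(d$col1, d$col2, sep = "|")
kept <- dat[key(dat) %in% key(lookup), ]
# kept holds rows 1, 2, 3, 5, 6 and 7 of dat, in their original order
```

Because this is plain logical indexing, no join is involved and dat keeps its original row names.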

Arguably, I would start by getting rid of whatever process creates these nested lists in the first place.