Find the elements of vector a that are in the same list as the element in vector b

Time: 2018-04-12 12:29:28

Tags: r vector intersection

I have 3 lists that classify things as fruits, vehicles, and flowers.

category <-
  structure(
    list(
      fruits = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
      vehicles = c("car", "bike", "motorbike", "train", "plane"),
      flowers <- list("rose", "tulip", "sunflower")
    ),
    .Names = c(
      "fruits", "vehicles", "flowers"
    )
  )

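Then I have a data frame with 2 vectors whose values are elements of those lists. Vector a can hold any number of items per cell; vector b holds exactly one element per cell.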

a <- I(list(c("apple", "car"), 
        c("motorbike", "banana", "tulip"), 
        c("rose", "kiwi", "apple"), 
        c("bike", "sunflower", "lemon"), 
        c("orange"), 
        c("tulip", "pear")))
b <- c("motorbike", "pear", "sunflower", "orange", "car", "apple")
funnydata <- data.frame(a, b)

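I want to create a third vector that gives the element of vector a that is in the same list/category as the element of vector b. So the desired result would be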

             a         b      c
1   apple, car motorbike    car
2 motorbik....      pear banana
3 rose, ki.... sunflower   rose
4 bike, su....    orange  lemon
5       orange       car     NA
6  tulip, pear     apple   pear

I managed to get the elements of vector a that are in one specific list, as long as I keep the list fixed:

funnydata$c <- sapply(funnydata$a, function(x) intersect(category$fruits, unlist(x))) # fixed list

funnydata$c
[[1]]
[1] "apple"

[[2]]
[1] "banana"

[[3]]
[1] "apple" "kiwi" 

[[4]]
[1] "lemon"

[[5]]
[1] "orange"

[[6]]
[1] "pear"

I can also find which list each element of b is in:

sapply(funnydata$b, function(y) names(category[grep(y, category) ]))

[1] "vehicles" "fruits"   "flowers"  "fruits"   "vehicles" "fruits"

But I am stuck on combining the two. If I try

funnydata$c <- sapply(funnydata$a, function(x) intersect(sapply(funnydata$b, function(y) 
  category[grep(y, category) ]), unlist(x)))

I get character(0) for every row.

Can anyone help?

Edit

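I found an error in my original post: the objects in category should all be of the same type (vectors or lists, whichever fits the need better). So it should be: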

category <-
  structure(
    list(
      fruits = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
      vehicles = c("car", "bike", "motorbike", "train", "plane"),
      flowers = c("rose", "tulip", "sunflower")
    ),
    .Names = c(
      "fruits", "vehicles", "flowers"
    )
  )

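I don't know whether this changes the existing answers. I'm still trying to wrap my head around them. Sorry if this copy-paste error made things more complicated than they needed to be.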
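
A note on the combined attempt, offered as a sketch rather than a definitive fix: the inner sapply runs over every element of funnydata$b for each x, so x is never paired with its own row of b, and intersect ends up comparing a list of whole category vectors against single strings. One way around this is to walk a and b in parallel with mapply. The helper name find_match and the choice to keep only the first match are assumptions made for illustration; %in% replaces grep here so that only exact matches count.

```r
category <- list(
  fruits   = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
  vehicles = c("car", "bike", "motorbike", "train", "plane"),
  flowers  = c("rose", "tulip", "sunflower")
)
a <- list(c("apple", "car"),
          c("motorbike", "banana", "tulip"),
          c("rose", "kiwi", "apple"),
          c("bike", "sunflower", "lemon"),
          c("orange"),
          c("tulip", "pear"))
b <- c("motorbike", "pear", "sunflower", "orange", "car", "apple")

# find_match is a hypothetical helper:
# x is one cell of a, y is the cell of b from the same row
find_match <- function(x, y) {
  # which categories contain y (exact match, unlike grep)
  in_cat <- vapply(category, function(members) y %in% members, logical(1))
  # elements of x that belong to the same category as y
  hits <- intersect(unlist(category[in_cat], use.names = FALSE), x)
  # assumption: keep the first match, NA when there is none
  if (length(hits) == 0) NA_character_ else hits[1]
}

# iterate over a and b in parallel, one pair per row
c_col <- mapply(find_match, a, as.character(b), USE.NAMES = FALSE)
c_col
```

With the data frame from the question this would be funnydata$c <- mapply(find_match, funnydata$a, as.character(funnydata$b), USE.NAMES = FALSE); the as.character call guards against b having been stored as a factor.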

2 Answers:

Answer 0: (score: 2)

We can do this with a join:

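library(tidyverse)
# keep the original row number so the result can be joined back later
dat <-  rownames_to_column(funnydata, 'rn')
# long lookup table: values = member, ind = category name
catdat <- stack(category)  
dat %>% 
   unnest(a) %>%                                # one row per element of a
   left_join(catdat, by = c(a = "values")) %>%  # category of a becomes ind.x
   left_join(catdat, by = c(b = "values")) %>%  # category of b becomes ind.y
   filter(ind.x == ind.y) %>%                   # keep pairs in the same category
   select(rn, c=a) %>% 
   right_join(dat, by = "rn") %>%               # restore rows without a match
   select(names(funnydata), c)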
#            a         b      c
#1   apple, car motorbike    car
#2 motorbik....      pear banana
#3 rose, ki.... sunflower   rose
#4 bike, su....    orange  lemon
#5       orange       car   <NA>
#6  tulip, pear     apple   pear

Answer 1: (score: 2)

Most problems involving data.frames with list-columns can be solved by converting those list-columns to "flat" vectors.

So we convert both of your original objects into longer versions:

category_df <- data.frame(
  group  = rep(names(category), times = lengths(category)),
  member = unlist(category)
)

category_df
#              group    member
# fruits1     fruits     apple
# fruits2     fruits    banana
# fruits3     fruits      pear
# fruits4     fruits     lemon
# fruits5     fruits      kiwi
# fruits6     fruits    orange
# vehicles1 vehicles       car
# vehicles2 vehicles      bike
# vehicles3 vehicles motorbike
# vehicles4 vehicles     train
# vehicles5 vehicles     plane
# flowers1   flowers      rose
# flowers2   flowers     tulip
# flowers3   flowers sunflower
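
As an aside, base R's stack() builds an equivalent lookup table in one call (it is what the other answer uses). A small sketch, assuming category holds plain character vectors as in the corrected question; note that the columns come out named values and ind rather than member and group.

```r
category <- list(
  fruits   = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
  vehicles = c("car", "bike", "motorbike", "train", "plane"),
  flowers  = c("rose", "tulip", "sunflower")
)

# the lookup table as built above
category_df <- data.frame(
  group  = rep(names(category), times = lengths(category)),
  member = unlist(category)
)

# stack() returns the same pairs: values = member, ind = category name (a factor)
stacked <- stack(category)

# both tables list the same member/category pairs in the same order
same_groups  <- all(as.character(stacked$ind) == as.character(category_df$group))
same_members <- all(as.character(stacked$values) == as.character(category_df$member))
```

If you go this route, the later merges would need by.y = "values" instead of by.y = "member".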

funnydata[["index"]] <- seq_len(nrow(funnydata))
funny_flat <- data.frame(
  a     = unlist(funnydata[["a"]]),
  b     = rep(funnydata[["b"]], times = lengths(funnydata[["a"]])),
  index = rep(funnydata[["index"]], times = lengths(funnydata[["a"]]))
)

funny_flat
#            a         b index
# 1      apple motorbike     1
# 2        car motorbike     1
# 3  motorbike      pear     2
# 4     banana      pear     2
# 5      tulip      pear     2
# 6       rose sunflower     3
# 7       kiwi sunflower     3
# 8      apple sunflower     3
# 9       bike    orange     4
# 10 sunflower    orange     4
# 11     lemon    orange     4
# 12    orange       car     5
# 13     tulip     apple     6
# 14      pear     apple     6

I also added an index so we know which values came from which original row. Now it just takes a few simple merges and some renaming.

funny_flat <- merge(funny_flat, category_df, by.x = "a", by.y = "member")
names(funny_flat)[names(funny_flat) == "group"] <- "group_a"

funny_flat <- merge(funny_flat, category_df, by.x = "b", by.y = "member")
names(funny_flat)[names(funny_flat) == "group"] <- "group_b"

funny_flat
#            b         a index  group_a  group_b
# 1      apple      pear     6   fruits   fruits
# 2      apple     tulip     6  flowers   fruits
# 3        car    orange     5   fruits vehicles
# 4  motorbike     apple     1   fruits vehicles
# 5  motorbike       car     1 vehicles vehicles
# 6     orange      bike     4 vehicles   fruits
# 7     orange     lemon     4   fruits   fruits
# 8     orange sunflower     4  flowers   fruits
# 9       pear motorbike     2 vehicles   fruits
# 10      pear    banana     2   fruits   fruits
# 11      pear     tulip     2  flowers   fruits
# 12 sunflower     apple     3   fruits  flowers
# 13 sunflower      rose     3  flowers  flowers
# 14 sunflower      kiwi     3   fruits  flowers

Now we encode your original goal: find the values where a and b share a category. c will be the value of a, so that is just a rename as well.

funny_matching <- funny_flat[funny_flat[["group_a"]] == funny_flat[["group_b"]], ]
names(funny_matching)[names(funny_matching) == "a"] <- "c"
funny_matching
#            b      c index  group_a  group_b
# 1      apple   pear     6   fruits   fruits
# 5  motorbike    car     1 vehicles vehicles
# 7     orange  lemon     4   fruits   fruits
# 10      pear banana     2   fruits   fruits
# 13 sunflower   rose     3  flowers  flowers

Once more, merge using the index from before.

merge(
  funnydata,
  funny_matching[, c("c", "index")],
  by = "index",
  all.x = TRUE
)
#   index            a         b      c
# 1     1   apple, car motorbike    car
# 2     2 motorbik....      pear banana
# 3     3 rose, ki.... sunflower   rose
# 4     4 bike, su....    orange  lemon
# 5     5       orange       car   <NA>
# 6     6  tulip, pear     apple   pear
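
One caveat worth checking on your own data: if a row of a has more than one element in b's category, funny_matching will have several rows for that index and the final merge will repeat the row. A possible way to collapse them first, sketched on a made-up stand-in for funny_matching (the values below are hypothetical, not taken from the question):

```r
# hypothetical stand-in for funny_matching in which index 3 has two matches
funny_matching <- data.frame(
  index = c(1, 3, 3, 6),
  c     = c("car", "rose", "tulip", "pear"),
  stringsAsFactors = FALSE
)

# one row per index, with multiple matches pasted into a single string
collapsed <- aggregate(c ~ index, data = funny_matching,
                       FUN = paste, collapse = ", ")
collapsed
```

collapsed can then take the place of funny_matching[, c("c", "index")] in the merge above. Whether pasting the matches together is the right behaviour depends on what you want c to hold.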