Count the frequency of all pairwise combinations by group

Time: 2019-04-09 13:31:22

Tags: r dplyr

I want to count, for every pairwise combination of item, the number of distinct group values in which both items occur.

library(dplyr)

have <- data.frame(group=c("a", "a", "a", 
                           "b", "b", 
                           "c",
                           "d", "d",
                           "e", "e",
                           "f", "f", "f"),
                   item=c("apple", "banana", "black cherry",
                          "apple", "black cherry",
                          "orange",
                          "banana", "black cherry",
                          "banana", "black cherry",
                          "apple", "banana", "black cherry"))

have
#    group           item
# 1      a          apple
# 2      a         banana
# 3      a   black cherry
# 4      b          apple
# 5      b   black cherry
# 6      c         orange
# 7      d         banana
# 8      d   black cherry
# 9      e         banana
# 10     e   black cherry
# 11     f          apple
# 12     f         banana
# 13     f   black cherry

# almost what I want...
# cons: repeats pairs and does not include zeros
# https://stackoverflow.com/a/38335011/841405
have %>% 
  full_join(have, by="group") %>% 
  group_by(item.x, item.y) %>% 
  summarise(length(unique(group))) %>% 
  filter(item.x!=item.y) %>%
  mutate(item = paste(item.x, item.y, sep=", "))

#         item.x       item.y  `length(unique(group))`                item                
# 1 apple        banana                             2 apple, banana       
# 2 apple        black cherry                       3 apple, black cherry 
# 3 banana       apple                              2 banana, apple       
# 4 banana       black cherry                       4 banana, black cherry
# 5 black cherry apple                              3 black cherry, apple 
# 6 black cherry banana                             4 black cherry, banana

# what I really want

#         item.x       item.y  `length(unique(group))`                item                
# 1 apple        banana                             2 apple, banana       
# 2 apple        black cherry                       3 apple, black cherry 
# 3 apple        orange                             0 apple, orange
# 4 banana       black cherry                       4 banana, black cherry
# 5 banana       orange                             0 banana, orange
# 6 black cherry orange                             0 black cherry, orange

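1 answer:

Answer 0 (score: 3)

My approach is to use expand.grid to build every combination of items, join the counts you have already computed onto it, and then fill the unmatched rows with zeros. I also renamed your count column to n.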

# Count, for every ordered pair of items, the number of groups containing both
have2 = have %>% 
  full_join(have, by="group") %>% 
  group_by(item.x, item.y) %>% 
  summarise(n = length(unique(group))) %>% 
  filter(item.x!=item.y) %>%
  mutate(item = paste(item.x, item.y, sep=", "))

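# Every ordered pair of items; expand.grid turns the items into factors by default
combos = expand.grid(item.x = unique(have$item),
                    item.y = unique(have$item)) %>% 
  # keep each unordered pair once by comparing the factor level codes
  filter(as.numeric(item.x) < as.numeric(item.y)) %>% 
  mutate(item = paste(item.x, item.y, sep = ', ')) %>% 
  arrange(item.x, item.y) %>% 
  # attach the observed counts; pairs that never co-occur get NA
  left_join(have2, by = c("item.x", "item.y", "item")) %>% 
  # replace the NA counts with zero
  mutate(n = replace(n, is.na(n), 0))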
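
As a cross-check on the result above, the same counts can be sketched in base R with a co-occurrence matrix. This is a different technique from the expand.grid join, not part of the original answer: it builds a group-by-item incidence table and multiplies it with itself via crossprod, so pairs that never share a group come out as zero without any filling step. The names incidence, cooc, pairs and want are made up for this illustration, and the sketch assumes the pairs should be listed in alphabetical order of item.

```r
# Same data as in the question
have <- data.frame(
  group = c("a", "a", "a", "b", "b", "c", "d", "d", "e", "e", "f", "f", "f"),
  item  = c("apple", "banana", "black cherry",
            "apple", "black cherry",
            "orange",
            "banana", "black cherry",
            "banana", "black cherry",
            "apple", "banana", "black cherry")
)

# Group-by-item incidence matrix: 1 if the item occurs in the group, 0 otherwise.
# unique() guards against a (group, item) row being listed more than once.
incidence <- unclass(table(unique(have[c("group", "item")])))

# t(incidence) %*% incidence: entry [i, j] is the number of groups
# that contain both item i and item j
cooc <- crossprod(incidence)

# Each unordered pair of items once, in alphabetical order
pairs <- t(combn(sort(colnames(cooc)), 2))

# Look up each pair's count by indexing the matrix with the two-column
# character matrix of (row name, column name)
want <- data.frame(
  item.x = pairs[, 1],
  item.y = pairs[, 2],
  n      = cooc[pairs]
)
want$item <- paste(want$item.x, want$item.y, sep = ", ")

want
```

For the sample data this gives six rows, with n equal to 2, 3, 0, 4, 0, 0, which matches the desired output in the question. The matrix grows with the square of the number of distinct items, so for a very large item set the join-based approach may be the more practical one.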