Count unique values of a column by pairwise combinations of another column, grouped by a third column, in R

Time: 2017-02-13 22:33:14

Tags: r dataframe data.table

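To be honest, this is a fairly complicated task. It is basically an extension of a question I asked earlier: Count unique values of a column by pairwise combinations of another column in R

Let's say that this time I have the following data frame in R: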

data.frame(Reg.ID = c(1,1,2,2,2,3,3), Location = c("X","X","Y","Y","Y","X","X"), Product = c("A","B","A","B","C","B","A"))

The data looks like this:

      Reg.ID Location Product
1      1        X       A
2      1        X       B
3      2        Y       A
4      2        Y       B
5      2        Y       C
6      3        X       B
7      3        X       A

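I want to count the unique values of the "Reg.ID" column for each pairwise combination of values in the "Product" column, grouped by the "Location" column. The result should look like this: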

  Location Prod.Comb Count
1        X       A,B     2
2        Y       A,B     1
3        Y       A,C     1
4        Y       B,C     1

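I tried to get this output using base R functions but had no success. I am guessing there is a fairly simple solution using the data.table package in R?

Any help is much appreciated. Thanks!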

2 answers:

Answer 0: (score: 6)

Not a thoroughly tested idea, but a data.table approach is the first thing that comes to mind.
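The code from this answer is not reproduced above, so what follows is only a minimal sketch of one way to approach the problem with data.table, not necessarily the answerer's method. It assumes the question's data is stored in a data.table named dt (the names dt, pairs and result are chosen here for illustration). It builds the product pairs for each Location and Reg.ID with combn(), then counts distinct Reg.IDs per pair with uniqueN():

```r
library(data.table)

# Sample data from the question (dt is an illustrative name)
dt <- data.table(
  Reg.ID   = c(1, 1, 2, 2, 2, 3, 3),
  Location = c("X", "X", "Y", "Y", "Y", "X", "X"),
  Product  = c("A", "B", "A", "B", "C", "B", "A")
)

# For every (Location, Reg.ID) group, list each unordered pair of its
# distinct products; sorting first makes "B","A" and "A","B" the same pair.
pairs <- dt[, {
  p <- sort(unique(Product))
  combs <- if (length(p) < 2L) character(0) else combn(p, 2L, FUN = paste, collapse = ",")
  .(Prod.Comb = combs)
}, by = .(Location, Reg.ID)]

# Count the distinct Reg.IDs that contain each pair within each Location
result <- pairs[, .(Count = uniqueN(Reg.ID)), keyby = .(Location, Prod.Comb)]
print(result)
```

On the sample data this should reproduce the four rows the question asks for. A Reg.ID with fewer than two distinct products contributes no pairs, which is why the length check guards the call to combn().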

Answer 1: (score: 2)

A dplyr solution, borrowed from the question you referenced:

library(dplyr)

df <- data.frame(Reg.ID = c(1,1,2,2,2,3,3), 
                 Location = c("X","X","Y","Y","Y","X","X"), 
                 Product = c("A","B","A","B","C","B","A"),
                 stringsAsFactors = FALSE)

df %>%
  full_join(df, by="Location") %>%
  filter(Product.x < Product.y) %>%
  group_by(Location, Product.x, Product.y) %>%
  summarise(Count = length(unique(Reg.ID.x))) %>%
  mutate(Prod.Comb = paste(Product.x, Product.y, sep=",")) %>%
  ungroup %>%
  select(Location, Prod.Comb, Count) %>%
  arrange(Location, Prod.Comb)

# # A tibble: 4 × 3
#   Location Prod.Comb Count
#      <chr>     <chr> <int>
# 1        X       A,B     2
# 2        Y       A,B     1
# 3        Y       A,C     1
# 4        Y       B,C     1
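One caveat worth checking: because the join above is on Location only, a product from one Reg.ID is also paired with products from other Reg.IDs at the same location. If the goal is to count only registrations that themselves contain both products (an assumption about the intent, not something the question states outright), a variant that joins on both Reg.ID and Location might look like the sketch below. The name result is introduced here for illustration.

```r
library(dplyr)

df <- data.frame(Reg.ID = c(1, 1, 2, 2, 2, 3, 3),
                 Location = c("X", "X", "Y", "Y", "Y", "X", "X"),
                 Product = c("A", "B", "A", "B", "C", "B", "A"),
                 stringsAsFactors = FALSE)

result <- df %>%
  # Self-join within the same registration and location, so both products
  # of a pair must belong to the same Reg.ID
  inner_join(df, by = c("Reg.ID", "Location")) %>%
  # Keep each unordered pair once and drop self-pairs
  filter(Product.x < Product.y) %>%
  group_by(Location, Prod.Comb = paste(Product.x, Product.y, sep = ",")) %>%
  summarise(Count = n_distinct(Reg.ID)) %>%
  ungroup()

print(result)
```

On the sample data this should give the same four rows as the output above, since every pair here comes from a single Reg.ID. Recent dplyr versions may warn about a many-to-many relationship in this self-join; that is expected here.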