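How to calculate common values between different groups?

Time: 2018-12-13 06:20:53

Tags: r dplyr igraph

I am trying to build a data frame for drawing a network graph with the igraph package. I have the sample data "mydata_data" and I want to create "expected_data".

I can easily count the number of customers who visited a particular store, but how do I count the customers that two stores, such as x1 and x2, have in common?

I have more than 500 stores, so I do not want to create the columns manually. Reproducible sample data is given below: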

mydata_data<-data.frame(
  Customer_Name=c("A","A","C","D","D","B"),
  Store_Name=c("x1","x2","x2","x2","x3","x1"))

expected_data<-data.frame(
 Store_Name=c("x1","x2","x3","x1_x2","x2_x3","x1_x3"), 
 Customers_Visited=c(2,3,1,1,1,0))

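5 Answers:

Answer 0: (score: 2)

Another possible solution via dplyr is to create a list with all combinations for each customer, unnest that list, count, and merge with a data frame holding all combinations, i.e.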

library(tidyverse)

# df is the question's sample data
df <- mydata_data

df %>%
    group_by(Customer_Name) %>%
    summarise(combos = list(unique(c(unique(Store_Name), paste(unique(Store_Name), collapse = '_'))))) %>%
    unnest(combos) %>%
    group_by(combos) %>%
    count() %>%
    right_join(data.frame(combos = c(unique(df$Store_Name), combn(unique(df$Store_Name), 2, paste, collapse = '_'))), by = "combos")

which gives,

# A tibble: 6 x 2
# Groups:   combos [?]
  combos     n
  <chr>  <int>
1 x1         2
2 x2         3
3 x3         1
4 x1_x2      1
5 x1_x3     NA
6 x2_x3      1

NOTE: Make sure that your Store_Name variable is a character, NOT a factor, otherwise combn() will fail.
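A minimal sketch of that conversion, assuming the question's mydata_data. Before R 4.0.0, data.frame() created factors by default, so stringsAsFactors = TRUE is set here only to reproduce that situation for illustration:

```r
mydata_data <- data.frame(
  Customer_Name = c("A", "A", "C", "D", "D", "B"),
  Store_Name = c("x1", "x2", "x2", "x2", "x3", "x1"),
  stringsAsFactors = TRUE  # mimic the pre-4.0 default for illustration
)

# Find the factor columns and convert each of them to character
is_fac <- vapply(mydata_data, is.factor, logical(1))
mydata_data[is_fac] <- lapply(mydata_data[is_fac], as.character)

str(mydata_data)
```

Alternatively, passing stringsAsFactors = FALSE when the data frame is created avoids the conversion step altogether.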

Answer 1: (score: 2)

Here is an igraph approach:

library(igraph)

A <- as.matrix(as_adj(graph_from_edgelist(as.matrix(mydata_data), directed = FALSE)))
stores <- as.character(unique(mydata_data$Store_Name))
storeCombs <- t(combn(stores, 2))

data.frame(Store_Name = c(stores, apply(storeCombs, 1, paste, collapse = "_")),
           Customers_Visited = c(colSums(A)[stores], (A %*% A)[storeCombs]))
#   Store_Name Customers_Visited
# 1         x1                 2
# 2         x2                 3
# 3         x3                 1
# 4      x1_x2                 1
# 5      x1_x3                 0
# 6      x2_x3                 1

Explanation: A is the adjacency matrix of the corresponding undirected graph. stores is simply

stores
# [1] "x1" "x2" "x3"

while

storeCombs
#      [,1] [,2]
# [1,] "x1" "x2"
# [2,] "x1" "x3"
# [3,] "x2" "x3"

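The main trick is then how to obtain Customers_Visited: the first three numbers are simply the numbers of neighbors of each store in stores, while the common customers are obtained as common graph neighbors (which come from the square of A).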
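The same idea can be sketched without igraph, as an illustration rather than as this answer's method. In a graph that only links customers to stores, the store-by-store block of the squared adjacency matrix equals t(M) %*% M, where M is the customer-by-store 0/1 incidence matrix. The names M, co and pairs below are chosen for this sketch only:

```r
mydata_data <- data.frame(
  Customer_Name = c("A", "A", "C", "D", "D", "B"),
  Store_Name = c("x1", "x2", "x2", "x2", "x3", "x1")
)

# Customer-by-store incidence matrix (1 = customer visited the store).
# unique() drops repeated visits so every entry is 0 or 1.
tab <- table(unique(mydata_data[c("Customer_Name", "Store_Name")]))
M <- matrix(as.integer(tab), nrow = nrow(tab), dimnames = dimnames(tab))

# Store-by-store matrix: the diagonal holds customers per store,
# the off-diagonal entries hold customers shared by two stores
co <- crossprod(M)

# All store pairs, used as a two-column character index into co
pairs <- t(combn(colnames(co), 2))

result <- data.frame(
  Store_Name = c(colnames(co), paste(pairs[, 1], pairs[, 2], sep = "_")),
  Customers_Visited = c(unname(diag(co)), co[pairs])
)
result
```

For several hundred stores the dense co matrix stays small (stores by stores), although M itself grows with the number of customers.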

Answer 2: (score: 1)

Here is one possible way to get the data.

This is a function adapted from: Generate all combinations, of all lengths, in R, from a vector

comball <- function(x) do.call("c", lapply(seq_along(x), function(i) combn(as.character(x), i, FUN = list)))
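As a quick illustration of what comball() returns, here is a small usage sketch. The labels variable and the three-store input are chosen for this example only:

```r
comball <- function(x) do.call("c", lapply(seq_along(x), function(i) combn(as.character(x), i, FUN = list)))

# Every non-empty subset of three stores, one character vector per subset
combos <- comball(c("x1", "x2", "x3"))

# Collapse each subset into a single label such as "x1_x2"
labels <- vapply(combos, paste, character(1), collapse = "_")
labels

# A customer with n stores yields 2^n - 1 subsets
length(combos)
```

Because the number of subsets grows as 2^n - 1 with the number of stores a customer visited, this enumerates combinations of every size, not only pairs.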

Then you can use it with a few tidyverse functions:

library(dplyr)
library(purrr)
library(tidyr)

mydata_data %>% 
  group_by(Customer_Name) %>% 
  summarize(visits = list(comball(Store_Name))) %>% 
  mutate(visits = map(visits, ~map_chr(., ~paste(., collapse="_")))) %>% 
  unnest(visits) %>% 
  count(visits)

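Answer 3: (score: 1)

Another option, with base R:

Get the list of all possible stores:

all_stores <- as.character(unique(mydata_data$Store_Name))

Find the different combinations of 1 or 2 stores:

all_comb_store <- lapply(1:2, function(n) combn(all_stores, n))

For each number of stores combined, get the number of customers who visited all of those stores, then combine this value with the store names in a data.frame.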
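The code for that last step did not survive in this copy of the answer. The following is only one way it could be written in base R, not the answer's original code; visits, count_customers and result are hypothetical names introduced for this sketch:

```r
mydata_data <- data.frame(
  Customer_Name = c("A", "A", "C", "D", "D", "B"),
  Store_Name = c("x1", "x2", "x2", "x2", "x3", "x1")
)

all_stores <- as.character(unique(mydata_data$Store_Name))
all_comb_store <- lapply(1:2, function(n) combn(all_stores, n))

# Stores visited by each customer, as a list of character vectors
visits <- split(as.character(mydata_data$Store_Name), mydata_data$Customer_Name)

# Hypothetical helper: number of customers whose visits include every store in combo
count_customers <- function(combo) {
  sum(vapply(visits, function(v) all(combo %in% v), logical(1)))
}

# Each column of a combn() matrix is one combination of stores
result <- do.call(rbind, lapply(all_comb_store, function(m) {
  data.frame(
    Store_Name = apply(m, 2, paste, collapse = "_"),
    Customers_Visited = apply(m, 2, count_customers)
  )
}))
result
```

This scans every customer for every combination, so it is simple rather than fast.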

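Answer 4: (score: 1)

Using dplyr: do a self-join, then group and take the distinct count. This should be much faster than the other answers, which consider all combinations.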

Note: it does not show pairs that do not exist. Also, x1_x1 here is of course just x1.

This uses the question's data with character columns, not factors:

library(dplyr)

left_join(mydata_data, mydata_data, by = "Customer_Name")  %>%
  transmute(Customer_Name,
            grp = paste(pmin(Store_Name.x, Store_Name.y),
                        pmax(Store_Name.x, Store_Name.y), sep = "_")) %>% 
  group_by(grp) %>% 
  summarise(n = n_distinct(Customer_Name))

# # A tibble: 5 x 2
#   grp       n
#   <chr> <int>
# 1 x1_x1     2
# 2 x1_x2     1
# 3 x2_x2     3
# 4 x2_x3     1
# 5 x3_x3     1
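
If the output should match expected_data, with x1 instead of x1_x1 and a 0 for pairs nobody shares, the self-join result can be post-processed. Below is a base R sketch of that idea, using merge() in place of left_join(); the names pairs, lo, hi, uniq and all_labels are assumptions made for this illustration:

```r
mydata_data <- data.frame(
  Customer_Name = c("A", "A", "C", "D", "D", "B"),
  Store_Name = c("x1", "x2", "x2", "x2", "x3", "x1"),
  stringsAsFactors = FALSE
)

# Self-join on customer: every pair of stores visited by the same customer
pairs <- merge(mydata_data, mydata_data, by = "Customer_Name")

# Order the two stores alphabetically so each pair gets a single label
lo <- pmin(pairs$Store_Name.x, pairs$Store_Name.y)
hi <- pmax(pairs$Store_Name.x, pairs$Store_Name.y)

# A store paired with itself is labelled by its own name ("x1", not "x1_x1")
pairs$grp <- ifelse(lo == hi, lo, paste(lo, hi, sep = "_"))

# One row per distinct customer and label
uniq <- unique(pairs[c("Customer_Name", "grp")])

# All single stores and all pairs, so absent pairs are counted as 0
stores <- sort(unique(mydata_data$Store_Name))
all_labels <- c(stores, combn(stores, 2, paste, collapse = "_"))

result <- data.frame(
  Store_Name = all_labels,
  Customers_Visited = as.vector(table(factor(uniq$grp, levels = all_labels)))
)
result
```

Listing every pair gives up part of the speed advantage of the self-join, since the number of labels grows quadratically with the number of stores.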