Expand all unique author-author combinations in tidyr

Time: 2018-05-19 00:53:36

Tags: r dplyr combinations tidyr


library(dplyr)
library(tidyr)

df <- data.frame(author = c(2,4,8,16,32,64,128,256,512,1024),
                 topic = c(101,101,101,101,301,301,501,501,501,501),
                 # parse as date-times so min(), max() and subtraction work on `time`
                 time = as.POSIXct(c("2014-08-16 20:20:11", "2014-08-16 21:10:00", "2014-08-17 06:30:10",
                                     "2014-08-17 10:08:32", "2014-08-20 22:23:01", "2014-08-20 23:03:03",
                                     "2014-08-25 17:05:01", "2014-08-25 19:15:10", "2014-08-25 20:07:11",
                                     "2014-08-25 23:59:59")))

# all ordered author pairs within each topic
test <- df %>% group_by(topic) %>% expand(nesting(author), author)
print(test, n = 20)

# A tibble: 36 x 3
# Groups:   topic [3]
   topic author author1
   <dbl>  <dbl>   <dbl>
 1  101.     2.      2.
 2  101.     2.      4.
 3  101.     2.      8.
 4  101.     2.     16.
 5  101.     4.      2.
 6  101.     4.      4.
 7  101.     4.      8.
 8  101.     4.     16.
 9  101.     8.      2.
10  101.     8.      4.
11  101.     8.      8.
12  101.     8.     16.
13  101.    16.      2.
14  101.    16.      4.
15  101.    16.      8.
16  101.    16.     16.
17  301.    32.     32.
18  301.    32.     64.
19  301.    64.     32.
20  301.    64.     64.


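  1. How can I remove the swapped combinations (e.g. rows 2 and 5)?
  2. For each combination, I would like to have these attributes:
    • start = the earliest post on the topic (using mutate, start = min(time))
    • duration of the topic (time of the last post on the topic minus time of the first post on the topic, using mutate, duration = max(time) - min(time))
    • the count of posts (using summarise)?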
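For reference, here is a minimal sketch that covers both points. It assumes dplyr >= 1.1.0 (for the `relationship` argument) and that `time` is parsed with as.POSIXct(). Instead of expand(), it uses a self-join on topic: keeping only author.x < author.y drops the self-pairs and the mirrored pairs in one step. The names topic_stats, edges, from and to are illustrative, not taken from the question.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0 (needed for `relationship =` below)

# Question data, with `time` parsed as date-times (UTC chosen for the sketch)
df <- data.frame(
  author = c(2, 4, 8, 16, 32, 64, 128, 256, 512, 1024),
  topic  = c(101, 101, 101, 101, 301, 301, 501, 501, 501, 501),
  time   = as.POSIXct(c("2014-08-16 20:20:11", "2014-08-16 21:10:00", "2014-08-17 06:30:10",
                        "2014-08-17 10:08:32", "2014-08-20 22:23:01", "2014-08-20 23:03:03",
                        "2014-08-25 17:05:01", "2014-08-25 19:15:10", "2014-08-25 20:07:11",
                        "2014-08-25 23:59:59"), tz = "UTC")
)

# One row per topic: first post, duration in hours, number of posts
topic_stats <- df %>%
  group_by(topic) %>%
  summarise(
    start    = min(time),
    duration = as.numeric(difftime(max(time), min(time), units = "hours")),
    posts    = n(),
    .groups  = "drop"
  )

# Self-join on topic yields every ordered author pair within a topic;
# author.x < author.y keeps each unordered pair once and drops self-pairs
edges <- df %>%
  select(topic, author) %>%
  inner_join(select(df, topic, author), by = "topic",
             relationship = "many-to-many") %>%
  filter(author.x < author.y) %>%
  rename(from = author.x, to = author.y) %>%
  left_join(topic_stats, by = "topic")

print(edges)
```

A topic with k authors contributes choose(k, 2) rows. For this sample, that gives 6 + 1 + 6 = 13 rows.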

4 answers:

Answer 0 (score: 0)


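test <- df %>% group_by(topic) %>%
  mutate(posts = n(), start = min(time),
         # explicit units, so the duration is always in hours
         duration = as.numeric(difftime(max(time), min(time), units = "hours"))) %>%
  expand(nesting(author), author, posts, start, duration) %>%
  # removes self-pairs only; mirrored pairs such as (2, 4) and (4, 2) remain
  filter(author != author1)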
# A tibble: 36 x 6
# Groups:   topic [3]
   topic author author1 posts start               duration
   <dbl>  <dbl>   <dbl> <int> <dttm>                 <dbl>
 2  101.     2.      4.     4 2014-08-16 20:20:11     13.8
 3  101.     2.      8.     4 2014-08-16 20:20:11     13.8
 4  101.     2.     16.     4 2014-08-16 20:20:11     13.8
 5  101.     4.      2.     4 2014-08-16 20:20:11     13.8
 7  101.     4.      8.     4 2014-08-16 20:20:11     13.8
 8  101.     4.     16.     4 2014-08-16 20:20:11     13.8
 9  101.     8.      2.     4 2014-08-16 20:20:11     13.8
10  101.     8.      4.     4 2014-08-16 20:20:11     13.8
# ... with 26 more rows
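filter(author != author1) removes only the self-pairs, which is why both (2, 4) and (4, 2) still appear. Every remaining pair appears once in each order, so keeping only one orientation with `<` removes the swapped duplicates. Here is a small base R illustration on the four topic-101 authors. expand.grid() stands in for expand() in this sketch.

```r
# Ordered pairs of the topic-101 authors, as expand() would produce them
pairs <- expand.grid(author = c(2, 4, 8, 16), author1 = c(2, 4, 8, 16))

n_all      <- nrow(pairs)                              # 16 ordered pairs
n_distinct <- nrow(subset(pairs, author != author1))   # 12: mirrored pairs still present
n_unique   <- nrow(subset(pairs, author < author1))    # 6 = choose(4, 2)
```

In the pipeline above, the change would be filter(author < author1). This relies on author being numeric, or otherwise orderable.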


Answer 1 (score: 0)



There are many existing questions about dplyr group_by with combn; a simple search will turn them up.

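I tried to post working code, but I don't know tidyr well enough; everything I tried either didn't work or was a syntax error. expand takes a data frame and then refers to its variables, so %>% expand(author, author) again gives you all ordered pairs (permutations, including self-pairs), not just the combinations. %>% complete(...) doesn't seem to help either. I think you need to call combn on author at each grouping level, using tidyr syntax. For each grouping level, this probably has to be a nested sub-statement, i.e. the tidyr equivalent of do.call.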
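The per-group combn() call described above can also be written in base R with split() and do.call(rbind, ...), without tidyr. This is a sketch on the question's author and topic values; pair_rows is a hypothetical helper name.

```r
authors <- c(2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)
topics  <- c(101, 101, 101, 101, 301, 301, 501, 501, 501, 501)

# Hypothetical helper: all unordered author pairs within one topic
pair_rows <- function(a, topic) {
  a <- sort(unique(a))
  if (length(a) < 2) return(NULL)   # a lone author has no partner
  m <- combn(a, 2)                  # 2 x k matrix, one pair per column
  data.frame(topic = topic, from = m[1, ], to = m[2, ])
}

groups <- split(authors, topics)
author_pairs <- do.call(rbind, Map(pair_rows, groups, as.numeric(names(groups))))
rownames(author_pairs) <- NULL
print(author_pairs)
```

do.call(rbind, ...) silently skips the NULL returned for single-author topics, so those topics simply contribute no rows.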

Answer 2 (score: 0)


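# per-topic attributes (named topic_stats so it does not shadow the `time` column)
topic_stats <- df %>% group_by(topic) %>%
  mutate(posts = n(), start = min(time),
         duration = difftime(max(time), min(time), units = "hours")) %>%
  distinct(topic, start, duration)
# all unordered author pairs within each topic, one pair per row (X1, X2)
combo <- df %>% group_by(topic) %>% do(data.frame(t(combn(.$author, 2))))
edges <- right_join(combo, topic_stats, by = "topic")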

# A tibble: 13 x 5
# Groups:   topic [?]
   topic    X1    X2 start               duration         
   <dbl> <dbl> <dbl> <dttm>              <time>           
 1  101.    2.    4. 2014-08-16 20:20:11 13.8058333333333 
 2  101.    2.    8. 2014-08-16 20:20:11 13.8058333333333 
 3  101.    2.   16. 2014-08-16 20:20:11 13.8058333333333 
 4  101.    4.    8. 2014-08-16 20:20:11 13.8058333333333 
 5  101.    4.   16. 2014-08-16 20:20:11 13.8058333333333 
 6  101.    8.   16. 2014-08-16 20:20:11 13.8058333333333 
 7  301.   32.   64. 2014-08-20 22:23:01 0.667222222222222
 8  501.  128.  256. 2014-08-25 17:05:01 6.91611111111111 
 9  501.  128.  512. 2014-08-25 17:05:01 6.91611111111111 
10  501.  128. 1024. 2014-08-25 17:05:01 6.91611111111111 
11  501.  256.  512. 2014-08-25 17:05:01 6.91611111111111 
12  501.  256. 1024. 2014-08-25 17:05:01 6.91611111111111 
13  501.  512. 1024. 2014-08-25 17:05:01 6.91611111111111
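combn(.$author, 2) works for this sample because every topic has at least two authors. combn() treats a single positive whole number n as seq_len(n), though. A topic whose only author is 32 would therefore not fail; it would silently produce every pair drawn from 1:32. Below is a sketch of one guard; safe_pairs is a hypothetical helper that combines positions rather than values.

```r
# combn() treats a single positive whole number n as seq_len(n)
ncol(combn(c(32, 64), 2))   # 1: the pair (32, 64)
ncol(combn(32, 2))          # 496 = choose(32, 2): pairs drawn from 1:32

# Hypothetical helper: combine positions rather than values, so a
# one-author topic yields an empty two-column matrix instead
safe_pairs <- function(x) {
  if (length(x) < 2) return(matrix(x[0], nrow = 0, ncol = 2))
  idx <- combn(length(x), 2)   # here seq_len(length(x)) is exactly what we want
  cbind(x[idx[1, ]], x[idx[2, ]])
}

nrow(safe_pairs(32))          # 0
nrow(safe_pairs(c(2, 4, 8)))  # 3
```

In the pipeline above, this would read do(data.frame(safe_pairs(.$author))). data.frame() still names the columns X1 and X2.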

Answer 3 (score: 0)


library(dplyr)
library(iterpc)   # iterpc(), getall()
library(igraph)   # graph_from_data_frame()

df <- data.frame(author_id = c(2,4,8,16,32,16,128,256,512,8),
                 topic_id = c(101,101,101,101,301,301,501,501,501,501),
                 time = as.POSIXct(c("2014-08-16 20:20:11", "2014-08-16 21:10:00", "2014-08-17 06:30:10",
                                     "2014-08-17 10:08:32", "2014-08-20 22:23:01", "2014-08-20 23:03:03",
                                     "2014-08-25 17:05:01", "2014-08-25 19:15:10", "2014-08-25 20:07:11",
                                     "2014-08-25 23:59:59")))

# one vertex per distinct author (the original also kept a `vendor` column,
# which this sample data frame does not have)
node <- df %>% distinct(author_id) %>% rename(id = author_id)

# per topic: 2-element combinations (with replacement) of the authors, minus self-pairs
edge <- df %>% group_by(topic_id) %>%
  do(data.frame(getall(iterpc(table(.$author_id), 2, replace = TRUE)))) %>%
  filter(X1 != X2) %>% rename(from = X1, to = X2) %>%
  ungroup() %>% select(to, from, topic_id)

test_net <- graph_from_data_frame(d = edge, directed = FALSE, vertices = node)