Grouping two columns based on a condition in dplyr

Time: 2017-11-07 18:51:53

Tags: r dplyr

    Success    modified_date user_id                 description
     <int>            <chr>     <int>                        <chr>
 1       0 10/15/2015 13:12    158236                   Phone Live
 2       0 10/15/2015 13:21    158236                   Phone Live
 3       1 10/25/2015 20:11    240497                   Phone Live
 4       1 11/24/2015 17:05    240497                   Phone Live
 5       1  6/23/2015 10:40    240497                   Phone Live
 6       1    7/7/2015 8:59    240497                   Phone Live
 7       0   5/1/2015 11:00    243412                   Phone Live
 8       0   5/1/2015 11:00    243412                   Phone Live
 9       0   6/11/2016 9:19    289273                      Webform
10       1   6/11/2016 9:23    289273                      Webform

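I want to group by the Success and user_id columns, with one condition: if the Success value for a user_id changes from 0 to 1, the description for that user_id needs to show the transition. If the Success value for a user_id does not change, nothing should change for that user's rows.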
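To make the condition concrete, here is a minimal base R sketch (not part of the original question) that lists which user_id values have Success go from 0 to 1. It uses the values from the sample data in this question; the vector names `success`, `user_id`, and `changed` are illustrative, and it assumes the rows are already in the intended order within each user.

```r
# Sample values copied from the data in the question
success <- c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L)
user_id <- c(158236L, 158236L, 240497L, 240497L, 240497L,
             240497L, 243412L, 243412L, 289273L, 289273L)

# For each user, check whether any consecutive pair of rows goes from 0 to 1
# (diff() of a 0 -> 1 step is exactly 1)
changed <- tapply(success, user_id, function(s) any(diff(s) == 1))

# Users whose description should show the transition
names(changed)[changed]
# "289273"
```

Only user 289273 meets the condition, which matches the single collapsed row in the desired output.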

Desired output:

    Success    modified_date user_id                 description
     <int>            <chr>     <int>                        <chr>
 1       0 10/15/2015 13:12    158236                   Phone Live
 2       0 10/15/2015 13:21    158236                   Phone Live
 3       1 10/25/2015 20:11    240497                   Phone Live
 4       1 11/24/2015 17:05    240497                   Phone Live
 5       1  6/23/2015 10:40    240497                   Phone Live
 6       1    7/7/2015 8:59    240497                   Phone Live
 7       0   5/1/2015 11:00    243412                   Phone Live
 8       0   5/1/2015 11:00    243412                   Phone Live
 9       0   6/11/2016 9:19    289273                   Webform;Webform

Here is the code:

time_data3 = time_data2 %>% arrange(user_id, modified_date, Success) %>% 
  filter(user_id != 0) %>% group_by(Success, user_id)%>%
  summarize(sequence = paste(description, collapse = ";"))

dput

structure(list(Success = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 
1L), modified_date = c("10/15/2015 13:12", "10/15/2015 13:21", 
"10/25/2015 20:11", "11/24/2015 17:05", "6/23/2015 10:40", "7/7/2015 8:59", 
"5/1/2015 11:00", "5/1/2015 11:00", "6/11/2016 9:19", "6/11/2016 9:23"
), user_id = c(158236L, 158236L, 240497L, 240497L, 240497L, 
240497L, 243412L, 243412L, 289273L, 289273L), description = c("Phone Live", 
"Phone Live", "Phone Live", "Phone Live", "Phone Live", "Phone Live", 
"Phone Live", "Phone Live", "Webform", "Webform")), .Names = c("Success", 
"modified_date", "user_id", "description"), row.names = c(NA, 
-10L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars = "user_id", drop = TRUE, indices = list(
    0:1, 2:5, 6:7, 8:9), group_sizes = c(2L, 4L, 2L, 2L), biggest_group_size = 4L, labels = structure(list(
    user_id = c(158236L, 240497L, 243412L, 289273L)), row.names = c(NA, 
-4L), class = "data.frame", vars = "user_id", drop = TRUE, .Names = "user_id"))

1 answer:

Answer 0: (score: 1)

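df %>%
  group_by(user_id) %>%
  # x goes up by one on the first row and on every row whose Success equals
  # the previous row's value, so two consecutive rows share the same x only
  # when Success changed between them
  group_by(user_id, x = cumsum(c(TRUE, Success[-1] == Success[-length(Success)]))) %>%
  summarize(Success = Success[1],
            modified_date = modified_date[1],
            description = paste(description, collapse = ";")) %>%
  select(-x)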

# # A tibble: 9 x 4
# # Groups:   user_id [4]
#   user_id Success    modified_date     description
#     <int>   <int>            <chr>           <chr>
# 1  158236       0 10/15/2015 13:12      Phone Live
# 2  158236       0 10/15/2015 13:21      Phone Live
# 3  240497       1 10/25/2015 20:11      Phone Live
# 4  240497       1 11/24/2015 17:05      Phone Live
# 5  240497       1  6/23/2015 10:40      Phone Live
# 6  240497       1    7/7/2015 8:59      Phone Live
# 7  243412       0   5/1/2015 11:00      Phone Live
# 8  243412       0   5/1/2015 11:00      Phone Live
# 9  289273       0   6/11/2016 9:19 Webform;Webform

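Within each user_id group, we compute a vector that is TRUE wherever Success is stable, that is, equal to the previous row's value. When we take the cumulative sum of that vector, the value stays constant only where Success changed, so we group by this value and summarize.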
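The grouping key described above can be inspected on its own. The following is a small base R sketch; `run_key` is a hypothetical helper name introduced here for illustration, wrapping the same `cumsum` expression the answer passes to `group_by`.

```r
# Hypothetical helper: same expression as in the answer's group_by() call.
# The first element is always TRUE; each later element is TRUE when Success
# equals the previous value, so the running sum stalls only on a change.
run_key <- function(s) cumsum(c(TRUE, s[-1] == s[-length(s)]))

run_key(c(0L, 1L))          # user 289273: 1 1     (shared key, rows collapse)
run_key(c(0L, 0L))          # user 158236: 1 2     (distinct keys, rows kept)
run_key(c(1L, 1L, 1L, 1L))  # user 240497: 1 2 3 4 (distinct keys, rows kept)
```

Rows that share a key within a user_id end up in the same group, which is why only user 289273 is summarized into a single `Webform;Webform` row.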