Summarizing data with a date and a factor variable

Time: 2019-05-13 19:44:35

Tags: r

I have a long-format table with three variables: id, date, and a factor variable.

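# Month-end dates from 2015-01-31 to 2015-12-31 (first of each month minus one day)
dates <- (seq.Date(from = as.Date("2015-02-01"),
                   to = as.Date("2016-01-01"),
                   by = "month") - 1)

# Two ids with 12 monthly rows each; a grade can recur in separate runs
data <- data.frame("date" = rep(dates, 2),
                   "id" = rep(c(1, 2), each = 12),
                   "grade" = c(rep("Z", 4), rep("T", 3), rep("R", 5),
                               rep("T", 2), rep("R", 3), rep("T", 7)))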

I would like to get a table like this:

id     start date    fin date      grade
1      2015-01-31    2015-04-30      Z
1      2015-05-31    2015-07-31      T
1      2015-08-31    2015-12-31      R
2      2015-01-31    2015-02-28      T
2      2015-03-31    2015-05-31      R
2      2015-06-30    2015-12-31      T

I tried the following code, using the dplyr package as well as base R functions, but neither attempt produced the result I want.

1st attempt

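library(dplyr)
# Groups by id and grade only, so separate runs of the same grade within an id are merged
data %>% group_by(id, grade) %>%
        summarize(Min_val = min(date), Max_val = max(date))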

2nd attempt

first <- with(data, by(data,  list(id, grade), head, n=1))
last <- with(data, by(data,  list(id, grade), tail, n=1))

highestd <- do.call("rbind", as.list(first))
lowestd <- do.call("rbind", as.list(last))

data.f <- cbind(highestd[, c("id", "date")], lowestd[, c("date", "grade")])
colnames(data.f) <- c("id", "start.date", "fin.date", "grade")
data.f <- data.f[order(data.f$id, data.f$start.date),]
data.f

1 Answer:

Answer 0 (score: 1)

One dplyr possibility could be:

library(dplyr)

data %>%
 # as.character() keeps rle() working if grade is stored as a factor
 group_by(id, grade, rleid = with(rle(as.character(grade)), rep(seq_along(lengths), lengths))) %>%
 summarise(start_date = min(date),
           fin_date = max(date)) %>%
 arrange(rleid) %>%
 ungroup() %>%
 select(-rleid)

     id grade start_date fin_date  
  <dbl> <chr> <date>     <date>    
1     1 Z     2015-01-31 2015-04-30
2     1 T     2015-05-31 2015-07-31
3     1 R     2015-08-31 2015-12-31
4     2 T     2015-01-31 2015-02-28
5     2 R     2015-03-31 2015-05-31
6     2 T     2015-06-30 2015-12-31

It simply creates a run-length type group ID around the "grade" column.
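To see what that run-length ID looks like in isolation, here is a minimal base R sketch on a short made-up vector. The vector `x` and the name `run_id` are illustrative assumptions and are not part of the original data.

```r
# Hypothetical example vector in which "T" occurs in two separate runs
x <- c("T", "T", "R", "R", "R", "T")

# rle() compresses consecutive repeats into run values and run lengths
runs <- rle(x)

# Number the runs 1, 2, 3, ... and repeat each number for the length of its run
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# One row per element, with the run it belongs to
print(data.frame(x, run_id))
```

Because the second run of "T" gets its own ID, grouping by this ID keeps it separate from the first run, which grouping by the value alone would not do.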

The same with rleid() from data.table:

library(dplyr)
library(data.table)  # provides rleid()

data %>%
 group_by(id, grade, rleid = rleid(grade)) %>%
 summarise(start_date = min(date),
           fin_date = max(date)) %>%
 arrange(rleid) %>%
 ungroup() %>%
 select(-rleid)
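If neither package is available, the same run-based grouping can be sketched in base R. This is only one possible approach and not the answer's method: the helper names `new_run`, `run`, `pieces` and `result` are assumptions made for illustration, and `stringsAsFactors = FALSE` is set so that the grade stays a character column.

```r
# Rebuild the example data from the question (grade kept as character)
dates <- seq.Date(as.Date("2015-02-01"), as.Date("2016-01-01"), by = "month") - 1
data <- data.frame(date = rep(dates, 2),
                   id = rep(c(1, 2), each = 12),
                   grade = c(rep("Z", 4), rep("T", 3), rep("R", 5),
                             rep("T", 2), rep("R", 3), rep("T", 7)),
                   stringsAsFactors = FALSE)

# A new run starts on the first row and whenever id or grade
# differs from the previous row
n <- nrow(data)
new_run <- c(TRUE, data$id[-1] != data$id[-n] | data$grade[-1] != data$grade[-n])
data$run <- cumsum(new_run)

# Collapse every run to a single row with its first and last date
pieces <- lapply(split(data, data$run), function(d) {
  data.frame(id = d$id[1],
             start_date = min(d$date),
             fin_date = max(d$date),
             grade = d$grade[1],
             stringsAsFactors = FALSE)
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

Here the run counter also restarts when the id changes, so a grade that ends one id and begins the next is not treated as a single run.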