I have a data frame of the following form:
Val, Date, ID
1, 1-JAN-2019, X
2, 2-JAN-2019, X
3, 3-JAN-2019, X
4, 2-JAN-2019, A
5, 3-JAN-2019, A
6, 4-JAN-2019, A
7, 5-JAN-2019, B
I need to reshape it into the following form:
Date, X, A, B
1-JAN-2019, 1, NA, NA
2-JAN-2019, 2, 4, NA
3-JAN-2019, 3, 5, NA
4-JAN-2019, NA, 6, NA
5-JAN-2019, NA, NA, 7
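I have tried various group_by statements such as df %>% group_by(ID, Date) %>% summarise(col = mean(Val)), and I can get the desired result with a for loop, but I know that for loops are not time-efficient. I am looking for some kind of lapply-style statement that is as fast as possible.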
Answer 0 (score: 3)
Here is a base R option using tapply:
t(with(df, tapply(Val, list(ID, Date), mean)))
# A B X
# 1-JAN-2019 NA NA 1
# 2-JAN-2019 4 NA 2
# 3-JAN-2019 5 NA 3
# 4-JAN-2019 6 NA NA
# 5-JAN-2019 NA 7 NA
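If a data frame with an explicit Date column is preferred over the matrix that tapply returns, base R's reshape() is another possibility. The following is only a sketch: it assumes each ID/Date pair occurs at most once (reshape() does not aggregate, and with duplicates it keeps the first match and warns), and the name wide is chosen here for illustration.

```r
# Sample data copied from the question.
df <- data.frame(
  Val  = 1:7,
  Date = c("1-JAN-2019", "2-JAN-2019", "3-JAN-2019", "2-JAN-2019",
           "3-JAN-2019", "4-JAN-2019", "5-JAN-2019"),
  ID   = c("X", "X", "X", "A", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# One row per Date and one Val.<ID> column per ID; missing pairs become NA.
wide <- reshape(df, idvar = "Date", timevar = "ID", direction = "wide")

# Drop the "Val." prefix that reshape() adds to the new column names.
names(wide) <- sub("^Val\\.", "", names(wide))
wide
```

With this approach the columns follow the order in which each ID first appears in the data (X, A, B), which matches the layout requested in the question.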
Answer 1 (score: 2)
One option is to apply pivot_wider after getting the summarised output:
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
group_by(ID, Date) %>%
summarise(col = mean(Val)) %>%
ungroup %>%
pivot_wider(names_from = ID, values_from = col) %>%
arrange(dmy(Date))
# A tibble: 5 x 4
# Date A B X
# <chr> <dbl> <dbl> <dbl>
#1 1-JAN-2019 NA NA 1
#2 2-JAN-2019 4 NA 2
#3 3-JAN-2019 5 NA 3
#4 4-JAN-2019 6 NA NA
#5 5-JAN-2019 NA 7 NA
Or use dcast from data.table, where we can also specify fun.aggregate:
library(data.table)
dcast(setDT(df), Date ~ ID, value.var = 'Val', mean)
The data used:
df <- structure(list(Val = 1:7, Date = c("1-JAN-2019", "2-JAN-2019",
"3-JAN-2019", "2-JAN-2019", "3-JAN-2019", "4-JAN-2019", "5-JAN-2019"
), ID = c("X", "X", "X", "A", "A", "A", "B")), row.names = c(NA,
-7L), class = "data.frame")
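The summarise and pivot_wider steps in this answer can probably be collapsed into one call, because pivot_wider accepts an aggregation function through its values_fn argument. This is a sketch rather than part of the original answer, and it assumes tidyr 1.1.0 or later, where values_fn may be a single function.

```r
library(tidyr)

# Sample data copied from the question.
df <- data.frame(
  Val  = 1:7,
  Date = c("1-JAN-2019", "2-JAN-2019", "3-JAN-2019", "2-JAN-2019",
           "3-JAN-2019", "4-JAN-2019", "5-JAN-2019"),
  ID   = c("X", "X", "X", "A", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# Spread ID into columns; values_fn = mean averages Val whenever an
# ID/Date pair occurs more than once, so no separate summarise is needed.
wide <- pivot_wider(df, names_from = ID, values_from = Val, values_fn = mean)
wide
```

Because there is no group_by step to sort the IDs, the new columns appear in order of first appearance (X, A, B) instead of alphabetically.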
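Answer 2 (score: 2)
From the example input and output, it is not clear that you need any summarise at all. If your data is generally clean, with at most one Val per ID and Date, then pivot_wider alone is enough: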
library(tidyr)
df %>% pivot_wider(names_from = ID, values_from = Val)
#> # A tibble: 5 x 4
#> Date ` X` ` A` ` B`
#> <fct> <int> <int> <int>
#> 1 " 1-JAN-2019" 1 NA NA
#> 2 " 2-JAN-2019" 2 4 NA
#> 3 " 3-JAN-2019" 3 5 NA
#> 4 " 4-JAN-2019" NA 6 NA
#> 5 " 5-JAN-2019" NA NA 7
In case anyone wants to have a go, I am using this data:
txt <- "Val, Date, ID
1, 1-JAN-2019, X
2, 2-JAN-2019, X
3, 3-JAN-2019, X
4, 2-JAN-2019, A
5, 3-JAN-2019, A
6, 4-JAN-2019, A
7, 5-JAN-2019, B"
df <- read.table(text = txt, header = TRUE, sep = ",")
Created on 2019-12-10 by the reprex package (v0.2.1)
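Whether a summarise step is needed, as raised at the start of this answer, comes down to whether any ID/Date pair occurs more than once. A quick base R check is sketched below; the names first_dup and needs_summarise are made up for illustration.

```r
# Sample data copied from the question.
df <- data.frame(
  Val  = 1:7,
  Date = c("1-JAN-2019", "2-JAN-2019", "3-JAN-2019", "2-JAN-2019",
           "3-JAN-2019", "4-JAN-2019", "5-JAN-2019"),
  ID   = c("X", "X", "X", "A", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns the row index of the first repeated ID/Date pair,
# or 0 when every pair is unique.
first_dup <- anyDuplicated(df[c("ID", "Date")])

# Aggregation is only needed when at least one pair repeats.
needs_summarise <- first_dup > 0
needs_summarise
```

For the sample data every pair is unique, so a plain pivot_wider is sufficient; on real data a non-zero result would suggest aggregating first.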