R data frame: a time-efficient way to reshape data

Time: 2019-12-10 20:11:37

Tags: r dataframe

I have a data frame of the following form:

Val,  Date,   ID
1, 1-JAN-2019, X
2, 2-JAN-2019, X
3, 3-JAN-2019, X
4, 2-JAN-2019, A
5, 3-JAN-2019, A
6, 4-JAN-2019, A
7, 5-JAN-2019, B

I need it in the following form:

   Date,    X,  A,  B
1-JAN-2019, 1, NA, NA
2-JAN-2019, 2,  4, NA
3-JAN-2019, 3,  5, NA
4-JAN-2019, NA, 6, NA
5-JAN-2019, NA, NA, 7

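I have tried various group_by statements along the lines of df %>% group_by(ID, Date) %>% summarise(col = mean(Val)), and I can get the desired result with a for loop, but I know for loops are not time-efficient. I am looking for some lapply-style statement that maximizes time efficiency.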

3 answers:

Answer 0 (score: 3)

Here is a base R option using tapply

t(with(df, tapply(Val, list(ID, Date), mean)))

#             A  B  X
# 1-JAN-2019 NA NA  1
# 2-JAN-2019  4 NA  2
# 3-JAN-2019  5 NA  3
# 4-JAN-2019  6 NA NA
# 5-JAN-2019 NA  7 NA
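
As a variation on the call above, listing Date first in the index list makes the rows dates directly, so t() is not needed, and the columns can then be reordered to X, A, B as in the question. This is only a minimal sketch on the question's sample data (rebuilt here so it runs on its own); the name `wide` is illustrative.

```r
# Sample data from the question, rebuilt so the sketch is self-contained
df <- data.frame(
  Val  = 1:7,
  Date = c("1-JAN-2019", "2-JAN-2019", "3-JAN-2019", "2-JAN-2019",
           "3-JAN-2019", "4-JAN-2019", "5-JAN-2019"),
  ID   = c("X", "X", "X", "A", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# Date first in the index list: rows are dates, columns are IDs
wide <- with(df, tapply(Val, list(Date, ID), mean))

# Reorder columns by first appearance of each ID in the data (X, A, B)
wide <- wide[, unique(df$ID), drop = FALSE]
wide
```

Note that the rows are ordered by sorting the Date strings, which happens to be chronological for this sample but would not be in general (for example, "10-JAN-2019" sorts before "2-JAN-2019").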

Answer 1 (score: 2)

One option is to pivot_wider after getting the summarised output

library(dplyr)
library(tidyr)
library(lubridate)
df %>% 
    group_by(ID, Date) %>%
    summarise(col = mean(Val)) %>%
    ungroup() %>%
    pivot_wider(names_from = ID, values_from = col) %>%
    arrange(dmy(Date))
# A tibble: 5 x 4
#  Date           A     B     X
#  <chr>      <dbl> <dbl> <dbl>
#1 1-JAN-2019    NA    NA     1
#2 2-JAN-2019     4    NA     2
#3 3-JAN-2019     5    NA     3
#4 4-JAN-2019     6    NA    NA
#5 5-JAN-2019    NA     7    NA

Or use dcast from data.table, where we can also specify fun.aggregate

library(data.table)
dcast(setDT(df), Date ~ ID, value.var = 'Val', fun.aggregate = mean)

Data

df <- structure(list(Val = 1:7, Date = c("1-JAN-2019", "2-JAN-2019", 
"3-JAN-2019", "2-JAN-2019", "3-JAN-2019", "4-JAN-2019", "5-JAN-2019"
), ID = c("X", "X", "X", "A", "A", "A", "B")), row.names = c(NA, 
-7L), class = "data.frame")

Answer 2 (score: 2)

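From the example input and output, it is not clear that you need any summarise at all. If your data generally looks like the sample, then pivot_wider alone is enough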

library(tidyr)

df %>% pivot_wider(names_from = ID, values_from = Val)
#> # A tibble: 5 x 4
#>   Date              ` X`  ` A`  ` B`
#>   <fct>            <int> <int> <int>
#> 1 "    1-JAN-2019"     1    NA    NA
#> 2 "    2-JAN-2019"     2     4    NA
#> 3 "    3-JAN-2019"     3     5    NA
#> 4 "    4-JAN-2019"    NA     6    NA
#> 5 "    5-JAN-2019"    NA    NA     7
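
Whether an aggregating step is needed comes down to whether any (ID, Date) pair occurs more than once. One way to check this, sketched in base R on the question's sample data (rebuilt here so it runs standalone; `has_dups` is an illustrative name):

```r
# Sample data from the question, rebuilt so the sketch is self-contained
df <- data.frame(
  Val  = 1:7,
  Date = c("1-JAN-2019", "2-JAN-2019", "3-JAN-2019", "2-JAN-2019",
           "3-JAN-2019", "4-JAN-2019", "5-JAN-2019"),
  ID   = c("X", "X", "X", "A", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns the index of the first duplicated row, or 0 if none
has_dups <- anyDuplicated(df[c("ID", "Date")]) > 0
has_dups
# FALSE for the sample data: each (ID, Date) pair appears only once
```

If this is FALSE, a plain pivot is sufficient. If it is TRUE, the values for the repeated pairs would need to be combined first (for example with mean, as in the other answers).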

In case anyone wants to have a go, this is the data I am using:

txt <- "Val,  Date,       ID
        1,    1-JAN-2019, X
        2,    2-JAN-2019, X
        3,    3-JAN-2019, X
        4,    2-JAN-2019, A
        5,    3-JAN-2019, A
        6,    4-JAN-2019, A
        7,    5-JAN-2019, B"

df <- read.table(text = txt, header = TRUE, sep = ",")

Created on 2019-12-10 by the reprex package (v0.2.1)
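
The padded values in the pivot_wider output of this answer (for example ` X` and `    1-JAN-2019`) come from the spaces that follow each comma in `txt`. If that padding is unwanted, one option is the strip.white argument of read.table. The following is a sketch on the same text, with `df_clean` as an illustrative name:

```r
txt <- "Val,  Date,       ID
        1,    1-JAN-2019, X
        2,    2-JAN-2019, X
        3,    3-JAN-2019, X
        4,    2-JAN-2019, A
        5,    3-JAN-2019, A
        6,    4-JAN-2019, A
        7,    5-JAN-2019, B"

# strip.white = TRUE trims leading/trailing blanks from unquoted fields
# (it only takes effect when sep is given); stringsAsFactors = FALSE keeps
# Date and ID as character regardless of the R version's default
df_clean <- read.table(text = txt, header = TRUE, sep = ",",
                       strip.white = TRUE, stringsAsFactors = FALSE)

unique(df_clean$ID)
```

With the padding removed, the ID values become plain "X", "A" and "B", so the pivoted column names no longer need backticks.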
