Summarize multiple variables of a data.frame in R?

Time: 2020-03-31 16:08:44

Tags: r dataframe tidyverse quantile

I am trying to compute the upper and lower quartiles of the variables in my data.frame over the period I'm interested in (March to May). The code below gives me a single number for the upper and lower quantile.

    library(dplyr)
    library(tidyr)
    library(lubridate)

    set.seed(50)
    FakeData <- data.frame(seq(as.Date("2001-01-01"), to = as.Date("2003-12-31"), by = "day"),
                           A = runif(1095, 0, 10),
                           D = runif(1095, 5, 15))
    colnames(FakeData) <- c("Date", "A", "D")
    statistics <- FakeData %>%
      gather(-Date, key = "Variable", value = "Value") %>%
      mutate(Year = year(Date), Month = month(Date)) %>%
      filter(between(Month, 3, 5)) %>%
      mutate(NewDate = ymd(paste("2020", Month, day(Date), sep = "-"))) %>%
      group_by(Variable, NewDate) %>%
      summarise(Upper = quantile(Value, 0.75, na.rm = TRUE),
                Lower = quantile(Value, 0.25, na.rm = TRUE))
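
As a quick sanity check on what this pipeline computes: grouping by `Variable` and `NewDate` takes each quartile across years for a single calendar day, so with three years of data every group holds only three values. A minimal sketch, assuming the same `FakeData`; it swaps in `pivot_longer()` (tidyr's newer replacement for `gather()`) and lubridate's `make_date()` for the `paste()`/`ymd()` step, and `long` and `group_sizes` are illustrative names only:

```r
library(dplyr)
library(tidyr)
library(lubridate)

set.seed(50)
FakeData <- data.frame(
  Date = seq(as.Date("2001-01-01"), to = as.Date("2003-12-31"), by = "day"),
  A = runif(1095, 0, 10),
  D = runif(1095, 5, 15)
)

# Same reshaping and filtering as the question, stopping before summarise()
long <- FakeData %>%
  pivot_longer(-Date, names_to = "Variable", values_to = "Value") %>%
  mutate(Month = month(Date)) %>%
  filter(between(Month, 3, 5)) %>%
  mutate(NewDate = make_date(2020, Month, day(Date)))

# Number of rows that feed each quantile() call
group_sizes <- long %>% count(Variable, NewDate)
print(table(group_sizes$n))
```

Every (Variable, day) group has one value per year, which is worth keeping in mind when reading the quartiles.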

I want output similar to the following (`Final_Output` is what I'm interested in):

    library(dplyr)

    Output1 <- data.frame(seq(as.Date("2000-03-01"), to = as.Date("2000-05-31"), by = "day"),
                          Upper = runif(92, 0, 10), lower = runif(92, 5, 15), Variable = rep("A", 92))
    colnames(Output1)[1] <- "Date"
    Output2 <- data.frame(seq(as.Date("2000-03-01"), to = as.Date("2000-05-31"), by = "day"),
                          Upper = runif(92, 2, 10), lower = runif(92, 5, 15), Variable = rep("D", 92))
    colnames(Output2)[1] <- "Date"
    Final_Output <- bind_rows(Output1, Output2)
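
For comparison, the question's pipeline already returns one row per `Variable` and day with an upper and a lower quartile, which is the layout of `Final_Output` apart from the column names (`NewDate` vs `Date`, `Lower` vs `lower`) and the placeholder year (2020 instead of 2000). A sketch, assuming dplyr, tidyr and lubridate are installed, that renames and reorders the summary to mirror `Final_Output`; `result` is a hypothetical name:

```r
library(dplyr)
library(tidyr)
library(lubridate)

set.seed(50)
FakeData <- data.frame(
  Date = seq(as.Date("2001-01-01"), to = as.Date("2003-12-31"), by = "day"),
  A = runif(1095, 0, 10),
  D = runif(1095, 5, 15)
)

statistics <- FakeData %>%
  gather(-Date, key = "Variable", value = "Value") %>%
  mutate(Month = month(Date)) %>%
  filter(between(Month, 3, 5)) %>%
  mutate(NewDate = ymd(paste("2020", Month, day(Date), sep = "-"))) %>%
  group_by(Variable, NewDate) %>%
  summarise(Upper = quantile(Value, 0.75, na.rm = TRUE),
            Lower = quantile(Value, 0.25, na.rm = TRUE)) %>%
  ungroup()

# Rename and reorder so the columns match Final_Output: Date, Upper, lower, Variable
result <- statistics %>%
  select(Date = NewDate, Upper, lower = Lower, Variable)
print(head(result))
```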

1 Answer:

Answer 0 (score: 1)

I can suggest a data.table solution. There are actually several ways to do this.

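Assuming `statistics` holds the result of your pipeline up to, but not including, the `summarise()` step, the final step (computing the quartiles of `Value` by group) can be written like this, with two columns as in your example:

    library(data.table)
    setDT(statistics)

    statistics[, .(p25 = quantile(Value, probs = 0.25),
                   p75 = quantile(Value, probs = 0.75)),
               by = .(Variable, NewDate)]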
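
To see what the two-column version returns, here is a sketch on a small hypothetical table (`toy`) with three values per group, standing in for three years of one calendar day. With R's default `quantile()` method (type 7), the quartiles of 1, 2, 3 are 1.5 and 2.5:

```r
library(data.table)

# Hypothetical toy data in the long layout the answer assumes:
# one row per (Variable, NewDate, year)
toy <- data.table(
  Variable = rep(c("A", "D"), each = 3),
  NewDate  = as.Date("2020-03-01"),
  Value    = c(1, 2, 3, 10, 20, 30)
)

# One row per group, with the two quartiles as separate columns
wide <- toy[, .(p25 = quantile(Value, probs = 0.25),
                p75 = quantile(Value, probs = 0.75)),
            by = .(Variable, NewDate)]
print(wide)
```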

If you prefer the output in long format:

    library(data.table)
    setDT(statistics)

    # One row per group and probability
    statistics[, .(prob = c(0.25, 0.75),
                   value = quantile(Value, probs = c(0.25, 0.75))),
               by = .(Variable, NewDate)]
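
A variant of the long format, sketched on the same kind of hypothetical `toy` table: compute both quartiles once per group and label the rows with the names `quantile()` attaches (`"25%"`, `"75%"`). `dcast()` then turns the long result back into one row per group:

```r
library(data.table)

# Hypothetical toy data in the long layout the answer assumes
toy <- data.table(
  Variable = rep(c("A", "D"), each = 3),
  NewDate  = as.Date("2020-03-01"),
  Value    = c(1, 2, 3, 10, 20, 30)
)

# Compute both quartiles in one quantile() call per group
long_q <- toy[, {
  q <- quantile(Value, probs = c(0.25, 0.75))
  list(stat = names(q), value = unname(q))
}, by = .(Variable, NewDate)]

# Casting back to wide recovers one row per (Variable, NewDate)
wide_again <- dcast(long_q, Variable + NewDate ~ stat, value.var = "value")
print(long_q)
print(wide_again)
```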

All steps together

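It is probably cleaner to do all of the steps with data.table verbs. I'll assume your data has a structure similar to the data frame you generated and reshaped, i.e.

    statistics <- FakeData %>%
      gather(-Date, key = "Variable", value = "Value")
    setDT(statistics)  # convert to a data.table in place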
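
If you want the reshaping itself done with data.table as well, `melt()` can replace `gather()`. A minimal sketch, assuming the `FakeData` from the question; `variable.factor = FALSE` keeps `Variable` as character, as `gather()` does by default:

```r
library(data.table)

set.seed(50)
FakeData <- data.frame(
  Date = seq(as.Date("2001-01-01"), to = as.Date("2003-12-31"), by = "day"),
  A = runif(1095, 0, 10),
  D = runif(1095, 5, 15)
)

# Long format: one row per (Date, Variable), all non-id columns are measured
statistics <- melt(as.data.table(FakeData), id.vars = "Date",
                   variable.name = "Variable", value.name = "Value",
                   variable.factor = FALSE)
print(head(statistics))
```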

In that case, the `mutate` and `filter` steps become:

    library(lubridate)  # for day() and ymd()

    statistics[, `:=`(Year = year(Date), Month = month(Date))]
    statistics <- statistics[Month %between% c(3, 5)]
    statistics[, NewDate := ymd(paste("2020", Month, day(Date), sep = "-"))]
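
One detail in these steps: `:=` adds columns by reference, so those lines need no assignment, whereas filtering rows returns a new table and must be assigned back. A small sketch on hypothetical data (`dt`); note that `%between%` includes both bounds by default:

```r
library(data.table)

# Hypothetical dates straddling the March-May window
dt <- data.table(
  Date  = as.Date(c("2001-02-28", "2001-03-01", "2001-05-31", "2001-06-01")),
  Value = 1:4
)

dt[, Month := month(Date)]         # modifies dt in place; no `<-` needed
dt <- dt[Month %between% c(3, 5)]  # row subset is a new table, so reassign
print(dt)
```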

Then pick whichever final step you prefer, for example:

    statistics[, .(p25 = quantile(Value, probs = 0.25),
                   p75 = quantile(Value, probs = 0.75)),
               by = .(Variable, NewDate)]
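
Putting the data.table steps together, here is a minimal end-to-end sketch assuming the `FakeData` from the question. To avoid loading lubridate alongside data.table, it swaps in data.table's `month()`, `mday()` and `as.IDate()` for lubridate's `month()`, `day()` and `ymd()`, and it spot-checks one group against base R (`march1_A` is an illustrative name):

```r
library(data.table)

set.seed(50)
FakeData <- data.frame(
  Date = seq(as.Date("2001-01-01"), to = as.Date("2003-12-31"), by = "day"),
  A = runif(1095, 0, 10),
  D = runif(1095, 5, 15)
)

# Reshape, then add columns by reference and filter to March-May
statistics <- melt(as.data.table(FakeData), id.vars = "Date",
                   variable.name = "Variable", value.name = "Value",
                   variable.factor = FALSE)
statistics[, Month := month(Date)]
statistics <- statistics[Month %between% c(3, 5)]
statistics[, NewDate := as.IDate(sprintf("2020-%02d-%02d", Month, mday(Date)))]

# Quartiles per variable and calendar day
result <- statistics[, .(p25 = quantile(Value, probs = 0.25),
                         p75 = quantile(Value, probs = 0.75)),
                     by = .(Variable, NewDate)]

# Base R check: the A values observed on March 1 across the three years
march1_A <- FakeData$A[format(FakeData$Date, "%m-%d") == "03-01"]
print(result[Variable == "A" & NewDate == as.IDate("2020-03-01")])
print(quantile(march1_A, c(0.25, 0.75)))
```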