R data transformation - columns to rows and aggregation

Time: 2017-03-13 11:42:12

Tags: r aggregate multiple-columns rows split-apply-combine

I am struggling with a data transformation in R. The data I receive looks like this:

input <- data.frame(AF = sample(0:1, 100, replace=TRUE),
                CAD = sample(0:1, 100, replace=TRUE),
                CHF = sample(0:1, 100, replace=TRUE),
                DEM = sample(0:1, 100, replace=TRUE),
                DIAB = sample(0:1, 100, replace=TRUE))
input$Counts <- rowSums(input)

The output I would like to achieve is:

output <- data.frame(Condition = c('AF', 'CAD', 'CHF', 'DEM', 'DIAB'),
                 '1' = sample(11:20, 5, replace=TRUE),
                 '2' = sample(11:20, 5, replace=TRUE),
                 '3' = sample(11:20, 5, replace=TRUE),
                 '4' = sample(11:20, 5, replace=TRUE),
                 '5' = sample(11:20, 5, replace=TRUE))

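Each cell at an intersection is the number of observations that have the given condition (now listed in the first column) and the given row sum (now one column per value).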

My solution is below, but I wonder whether there is a more elegant one?

data.frame(Condition = colnames(input[ ,1:5]),
       "One" = c(nrow(input[input$AF==1 & input$Counts==1,]),
                 nrow(input[input$CAD==1 & input$Counts==1,]),
                 nrow(input[input$CHF==1 & input$Counts==1,]),
                 nrow(input[input$DEM==1 & input$Counts==1,]),
                 nrow(input[input$DIAB==1 & input$Counts==1,])),
       "Two" = c(nrow(input[input$AF==1 & input$Counts==2,]),
                 nrow(input[input$CAD==1 & input$Counts==2,]),
                 nrow(input[input$CHF==1 & input$Counts==2,]),
                 nrow(input[input$DEM==1 & input$Counts==2,]),
                 nrow(input[input$DIAB==1 & input$Counts==2,])),
       "Three" = c(nrow(input[input$AF==1 & input$Counts==3,]),
                 nrow(input[input$CAD==1 & input$Counts==3,]),
                 nrow(input[input$CHF==1 & input$Counts==3,]),
                 nrow(input[input$DEM==1 & input$Counts==3,]),
                 nrow(input[input$DIAB==1 & input$Counts==3,])),
       "Four" = c(nrow(input[input$AF==1 & input$Counts==4,]),
                 nrow(input[input$CAD==1 & input$Counts==4,]),
                 nrow(input[input$CHF==1 & input$Counts==4,]),
                 nrow(input[input$DEM==1 & input$Counts==4,]),
                 nrow(input[input$DIAB==1 & input$Counts==4,])),
       "Five" = c(nrow(input[input$AF==1 & input$Counts==5,]),
                 nrow(input[input$CAD==1 & input$Counts==5,]),
                 nrow(input[input$CHF==1 & input$Counts==5,]),
                 nrow(input[input$DEM==1 & input$Counts==5,]),
                 nrow(input[input$DIAB==1 & input$Counts==5,])),
       "Six" = c(nrow(input[input$AF==1 & input$Counts==6,]),
                 nrow(input[input$CAD==1 & input$Counts==6,]),
                 nrow(input[input$CHF==1 & input$Counts==6,]),
                 nrow(input[input$DEM==1 & input$Counts==6,]),
                 nrow(input[input$DIAB==1 & input$Counts==6,]))
)
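The repeated nrow() calls above follow one pattern, so they can be collapsed into a loop over the row-sum values. The following is only a minimal sketch of that idea, not a method taken from the answers: the seed, the helper vector conditions, and the names counts_by_k and result are hypothetical, and it relies on the columns holding only 0 and 1 so that a column sum equals a count of matching rows.

```r
# Hypothetical input with the same layout as in the question.
set.seed(42)  # arbitrary seed, only to make the illustration repeatable
conditions <- c("AF", "CAD", "CHF", "DEM", "DIAB")
input <- as.data.frame(matrix(sample(0:1, 500, replace = TRUE), ncol = 5,
                              dimnames = list(NULL, conditions)))
input$Counts <- rowSums(input)

# For each row sum k, count the rows in which each condition equals 1.
# With 0/1 columns, colSums() over the matching rows gives that count.
counts_by_k <- sapply(1:5, function(k) {
  colSums(input[input$Counts == k, conditions, drop = FALSE])
})
colnames(counts_by_k) <- c("One", "Two", "Three", "Four", "Five")

# Move the condition names from the row names into a regular column.
result <- data.frame(Condition = rownames(counts_by_k), counts_by_k,
                     row.names = NULL)
result
```

Only the values 1 to 5 are looped over, because a row sum across five 0/1 columns cannot exceed 5.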

2 answers:

Answer 0: (score: 1)

Perhaps you are looking for aggregate. Here is a solution:

myMat <- t(aggregate(. ~ Counts, data = input, FUN = sum)[-1, -1])
myMat
      2  3  4  5 6
AF    3 10 15 15 2
CAD   2 14 16 18 2
CHF   2 14 18 16 2
DEM   4  8 16 18 2
DIAB  5 14 22 17 2

The first argument to aggregate, . ~ Counts, is a formula that says to do something to each column, grouped by Counts. The second argument specifies the data set, and the third says that the desired operation is sum. The first row and the first column are removed from the output with [-1, -1], because they are not relevant to the desired result. This output is then transposed with t. The labels 2 to 6 in the printout are row names carried over from the aggregate result, not Counts values. To change the column names, you could use colnames<-, like this:

colnames(myMat) <- c("One", "Two", "Three", "Four", "Five")
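One point to watch in the approach above: dropping the first row with [-1, ] removes the Counts == 0 group only if such a group exists in the data. The sketch below is a hedged variant, not the answerer's code, that filters out the zero rows explicitly and labels the columns with the actual Counts values. The seed and the name agg are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only to make the illustration repeatable
input <- data.frame(AF   = sample(0:1, 100, replace = TRUE),
                    CAD  = sample(0:1, 100, replace = TRUE),
                    CHF  = sample(0:1, 100, replace = TRUE),
                    DEM  = sample(0:1, 100, replace = TRUE),
                    DIAB = sample(0:1, 100, replace = TRUE))
input$Counts <- rowSums(input)

# Keep only rows with at least one condition, then sum each column by Counts.
agg <- aggregate(. ~ Counts, data = input[input$Counts > 0, ], FUN = sum)

# Drop the Counts column, transpose, and label columns by the real row sums.
myMat <- t(agg[, -1])
colnames(myMat) <- agg$Counts
myMat
```

Labelling by agg$Counts also keeps the columns correct if some row-sum value never occurs in the data.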

Answer 1: (score: 0)

You can also use dplyr and tidyr to switch between long and wide formats (although in this particular case, using aggregate is easier):

library(dplyr)
library(tidyr)

# take the input dataset
input %>%
        # transform to long format
        gather(condition, measurement,AF:DIAB) %>%
        # summarise by Counts and condition
        group_by(Counts, condition) %>%
        summarise(measure = sum(measurement)) %>%
        # transform back to the desired wide format
        spread(Counts, measure)
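In newer tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. The sketch below shows the same pipeline written with those functions. It assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the .groups argument); the seed and the name wide are hypothetical.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # arbitrary seed, only to make the illustration repeatable
input <- data.frame(AF   = sample(0:1, 100, replace = TRUE),
                    CAD  = sample(0:1, 100, replace = TRUE),
                    CHF  = sample(0:1, 100, replace = TRUE),
                    DEM  = sample(0:1, 100, replace = TRUE),
                    DIAB = sample(0:1, 100, replace = TRUE))
input$Counts <- rowSums(input)

wide <- input %>%
  # long format: one row per (observation, condition) pair
  pivot_longer(AF:DIAB, names_to = "condition", values_to = "measurement") %>%
  # sum the 0/1 indicators within each Counts/condition group
  group_by(Counts, condition) %>%
  summarise(measure = sum(measurement), .groups = "drop") %>%
  # wide format: one row per condition, one column per Counts value
  pivot_wider(names_from = Counts, values_from = measure)
wide
```

The result keeps a column for Counts equal to 0 if such rows exist; it can be dropped afterwards if it is not needed.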