Long to wide format with several duplicates across columns

Time: 2018-04-18 22:28:04

Tags: r dataframe multiple-columns

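I have a dataset similar to this one (the real one is much bigger). It is in long format and I need to change it to wide format, with one row per id. My problem is that there are many different combinations of time, drug, unit and admin. Only the combination of time, drug, unit and admin is unique, and each combination should occur only once per id. I can't find a solution. I would like R to create one column per unique combination so that the data can be reshaped to wide format. I tried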

melt.data.table(df, id.vars=c(id,time,drug,unit,admin), measure.vars = c(dose), na.rm=F)

and also in combination with
%>% expand(nesting(time, drug, unit, admin, dose), id)

but it does not work. Here is the mock data:

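library(data.table)

id <- c(1492, 1492, 1492, 1492, 1493, 1493)
time <- c("Pre-bypass", "Post-bypass", "Total", "Post-bypass", "Pre-OP", "Pre-OP")
drug <- c("ACE", "LEVO", "LEVO", "MIL", "BB", "BC")
unit <- c(NA, "ml/hr", "ml", "mg", NA, NA)
admin <- c(NA, "IV", "IV", "Inhale", NA, NA)
dose <- c(NA, 50, 40, 5, NA, NA)
# Build the table column by column so id and dose stay numeric
# (rbind() followed by t() would coerce every column to character)
df <- data.table(id, time, drug, unit, admin, dose)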
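Before reshaping, it can help to check the assumption stated above, that each time/drug/unit/admin combination occurs at most once per id; if it does not, a wide table would need some aggregation rule. A minimal sketch using data.table's anyDuplicated() and uniqueN() with a by argument, rebuilding the mock data so the snippet runs on its own (the names key_cols, n_dup and n_wide_cols are just for illustration):

```r
library(data.table)

# Mock data from the question, built column-wise so id and dose stay numeric
df <- data.table(
  id    = c(1492, 1492, 1492, 1492, 1493, 1493),
  time  = c("Pre-bypass", "Post-bypass", "Total", "Post-bypass", "Pre-OP", "Pre-OP"),
  drug  = c("ACE", "LEVO", "LEVO", "MIL", "BB", "BC"),
  unit  = c(NA, "ml/hr", "ml", "mg", NA, NA),
  admin = c(NA, "IV", "IV", "Inhale", NA, NA),
  dose  = c(NA, 50, 40, 5, NA, NA)
)

key_cols <- c("id", "time", "drug", "unit", "admin")

# anyDuplicated() returns 0 when no row repeats an earlier row on key_cols
n_dup <- anyDuplicated(df, by = key_cols)
n_dup  # 0

# Each distinct time/drug/unit/admin combination (ignoring id) becomes one wide column
n_wide_cols <- uniqueN(df, by = c("time", "drug", "unit", "admin"))
n_wide_cols  # 6
```

A non-zero n_dup would point at the first duplicated row, which is where a reshape without an aggregation function would run into trouble.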

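I would like my output to look like this (the TRUE in the Pre.bypass.Ace.unitNA.adminNA and Pre.OP columns is there because dose and unit are missing for those rows, but since the drug is listed, it was given at its standard dose and unit):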

# Desired result: one row per id, one column per time/drug/unit/admin combination.
# data.frame() keeps each column's own type, so TRUE stays logical
# (rbind() followed by t() would turn it into 1).
df.new <- data.frame(
  id.new = c(1492, 1493),
  Post.bypass.MIL.mg.Inhale = c(5, NA),
  Pre.OP.BB.unitNA.adminNA = c(NA, TRUE),
  Pre.OP.BC.unitNA.adminNA = c(NA, TRUE),
  Total.LEVO.ml.IV = c(40, NA),
  Pre.bypass.Ace.unitNA.adminNA = c(TRUE, NA),
  Post.bypass.LEVO.ml.h.IV = c(50, NA)
)

2 answers:

Answer 0 (score: 1)

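I agree with the comments that long format is usually the better approach. If you must use wide format, you can do the following with the tidyr package: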

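library(tidyr)  # tidyr also re-exports the %>% pipe
df %>% 
  # paste time, drug, unit and admin into a single key column
  unite(combination, time, drug, unit, admin) %>% 
  # one column per key, filled with the matching dose
  spread(key = combination, value = dose)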
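spread() still works but is superseded in current tidyr. A minimal sketch of the same reshape with pivot_wider(), assuming tidyr 1.0.0 or later: names_from accepts several columns and joins their values with names_sep, so the separate unite() step is not needed. The mock data is rebuilt here as a plain data frame so the snippet runs on its own.

```r
library(tidyr)

# Mock data from the question as a plain data frame
df <- data.frame(
  id    = c(1492, 1492, 1492, 1492, 1493, 1493),
  time  = c("Pre-bypass", "Post-bypass", "Total", "Post-bypass", "Pre-OP", "Pre-OP"),
  drug  = c("ACE", "LEVO", "LEVO", "MIL", "BB", "BC"),
  unit  = c(NA, "ml/hr", "ml", "mg", NA, NA),
  admin = c(NA, "IV", "IV", "Inhale", NA, NA),
  dose  = c(NA, 50, 40, 5, NA, NA),
  stringsAsFactors = FALSE
)

wide <- pivot_wider(
  df,
  id_cols     = id,                          # one row per id
  names_from  = c(time, drug, unit, admin),  # one column per combination
  values_from = dose,                        # cells hold the dose
  names_sep   = "_"                          # e.g. "Total_LEVO_ml_IV"
)
wide
```

As with spread(), a combination that is listed without a dose and a combination that never occurred both end up as NA.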

Answer 1 (score: 1)

library(data.table)
id <- c(1492, 1492, 1492, 1492, 1493, 1493)
time <- c("Pre-bypass", "Post-bypass", "Total", "Post-bypass", "Pre-OP", "Pre-OP")
drug <- c("ACE", "LEVO", "LEVO", "MIL", "BB", "BC")
unit <- c(NA, "ml/hr", "ml", "mg", NA, NA)
admin <- c(NA, "IV", "IV", "Inhale", NA, NA)
dose <- c(NA, 50, 40, 5, NA, NA)
df <- rbind(id, time, drug, unit, admin, dose)
df <- t(df)
df <- as.data.table(df)
df
#>      id        time drug  unit  admin dose
#> 1: 1492  Pre-bypass  ACE    NA     NA   NA
#> 2: 1492 Post-bypass LEVO ml/hr     IV   50
#> 3: 1492       Total LEVO    ml     IV   40
#> 4: 1492 Post-bypass  MIL    mg Inhale    5
#> 5: 1493      Pre-OP   BB    NA     NA   NA
#> 6: 1493      Pre-OP   BC    NA     NA   NA

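Use dcast() from the data.table package to convert to wide format. In the formula id ~ ..., the ... stands for all remaining columns other than value.var, so each id gets one row and each combination of time, drug, unit and admin gets one column: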

data.table::dcast(df, id ~ ..., value.var = "dose")
#>      id Post-bypass_LEVO_ml/hr_IV Post-bypass_MIL_mg_Inhale
#> 1: 1492                        50                         5
#> 2: 1493                        NA                        NA
#>    Pre-OP_BB_NA_NA Pre-OP_BC_NA_NA Pre-bypass_ACE_NA_NA Total_LEVO_ml_IV
#> 1:              NA              NA                   NA               40
#> 2:              NA              NA                   NA               NA
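Neither answer produces the TRUE the question asks for where a drug is listed without a dose: with value.var = "dose" those cells are NA, exactly like a drug that was never given. One possible workaround, a sketch that assumes a column holding either doses or TRUE flags is acceptable: build a helper column (called value here, a hypothetical name) containing the dose as text, or "TRUE" when the dose is missing, cast that column, and let type.convert() turn each resulting column back into numbers or logicals.

```r
library(data.table)

# Mock data from the question, built column-wise so dose stays numeric
df <- data.table(
  id    = c(1492, 1492, 1492, 1492, 1493, 1493),
  time  = c("Pre-bypass", "Post-bypass", "Total", "Post-bypass", "Pre-OP", "Pre-OP"),
  drug  = c("ACE", "LEVO", "LEVO", "MIL", "BB", "BC"),
  unit  = c(NA, "ml/hr", "ml", "mg", NA, NA),
  admin = c(NA, "IV", "IV", "Inhale", NA, NA),
  dose  = c(NA, 50, 40, 5, NA, NA)
)

# Hypothetical helper column: the dose as text, or "TRUE" when a drug is
# listed without a dose (the question's "standard dose and unit" case)
df[, value := ifelse(is.na(dose), "TRUE", as.character(dose))]

# One row per id, one column per time/drug/unit/admin combination;
# combinations an id never had are filled with NA
wide <- dcast(df, id ~ time + drug + unit + admin, value.var = "value")

# Convert every combination column back to a natural type: columns holding
# numbers and NA become numeric, columns holding "TRUE" and NA become logical
cols <- setdiff(names(wide), "id")
wide[, (cols) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = cols]
wide
```

Keeping the flag and the dose in one cell only works because, in this data, each combination either always has a dose or never has one; if that is not true of the real data, two separate wide tables (one for doses, one for presence) would be the safer design.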