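How can I manipulate and merge data.frames in nested lists more efficiently?

Time: 2016-11-04 16:34:57

Tags: r dataframe nested-lists

I have two lists of data.frames as the output of a custom function. I now want to split each data.frame in these lists, which gives me nested lists. However, I then want to manipulate these nested lists in order to combine them. Working with nested lists is a bit tricky, and I cannot manipulate them the way I expected. Does anyone know any useful tricks for doing this more easily and efficiently? How can I get the desired output? Thanks in advance.

Mini example: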

myList_keep <- list(
  hola.keep= data.frame( from=seq(1, by=4, len=15), to=seq(3, by=4, len=15), value=sample(30, 15)),
  boo.keep = data.frame( from=seq(3, by=7, len=20), to=seq(6, by=7, len=20), value=sample(30, 20)),
  meh.keep = data.frame( from=seq(4, by=8, len=25), to=seq(7, by=8, len=25), value=sample(30, 25))
)

myList_drop <- list(
  hola.drop= data.frame( from=seq(11, by=7, len=10), to=seq(23, by=7, len=10), value=sample(15, 10)),
  boo.drop = data.frame( from=seq(18, by=5, len=12), to=seq(26, by=5, len=12), value=sample(18, 12)),
  meh.drop = data.frame( from=seq(24, by=8, len=15), to=seq(37, by=8, len=15), value=sample(30, 15))
)

I tried splitting each data.frame as follows:

splt_keep <- lapply(myList_keep, function(ele_) {
  res <- split(ele_, ifelse(ele_$value >=10, "above", "below"))
})

splt_drop <- lapply(myList_drop, function(ele_) {
  res <- split(ele_, ifelse(ele_$value >=10, "above", "below"))
})
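
For reference, here is a minimal sketch of what a single split() call of this kind returns. It assumes a tiny data.frame with fixed values; `toy` and `toy_split` are hypothetical names used only for illustration and are not part of the data above.

```r
# Hypothetical toy data.frame with fixed values (not the sampled data above).
toy <- data.frame(from = c(1, 5, 9), to = c(3, 7, 11), value = c(12, 4, 10))

# split() returns a named list with one data.frame per group label.
toy_split <- split(toy, ifelse(toy$value >= 10, "above", "below"))

names(toy_split)        # "above" "below"
toy_split$above$value   # 12 10
```

Applying this inside lapply() over a list of data.frames is what produces the two-level nesting: the outer names come from the list, the inner names from the group labels.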

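I intend to manipulate the nested lists in this way:

For example, if I could manipulate splt_keep and splt_drop efficiently, I could get this skeleton of a nested list: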

$hola.above
 $hola.keep$above
 $hola.drop$above

$hola.below
 $hola.keep$below
 $hola.drop$below

Then, once I have this format, I intend to merge them accordingly, so the final output format is:

$hola
 $hola.above
 $hola.below

$boo
 $boo.above
 $boo.below

$meh
 $meh.above
 $meh.below

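How can I get the desired output easily? Is there a more comfortable way to manipulate nested lists? Can anyone point me to how to achieve this?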

1 Answer:

Answer 0: (score: 2)

Lists are a very inefficient structure for split/bind operations on well-structured data. Here is an option using data.table:

## Stack each list into a single data.table.
## Setting idcol=TRUE adds a ".id" column holding the name of the
## list element each row came from.
library(data.table)
keep_dt <- rbindlist(myList_keep,idcol=TRUE)
drop_dt <- rbindlist(myList_drop,idcol=TRUE)
DT <- rbind(keep_dt,drop_dt)
## Then create the grouping column (same threshold as in the question)
DT[, gr := ifelse(value >= 10, "above", "below")]
## To get the "hola" part, filter the whole table first,
## then split the filtered rows by their own grouping column
hola <- DT[grepl("hola", .id)]
split(hola, hola$gr)

Update

To get the expected output:

DT[,.id:= gsub("[.](keep|drop)","",.id)]
by(DT,DT$.id,FUN = function(x)split(x,x$gr))
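
If data.table is not available, the same stack-then-split idea can be sketched in base R. This is only a sketch under stated assumptions, not part of the approach above: the toy inputs use fixed values instead of sample(), the threshold is the `>= 10` from the question, and the helper `stack_with_id` and the columns `id` and `sample` are hypothetical names.

```r
# Toy inputs with fixed values (hypothetical; the question uses sample()).
myList_keep <- list(
  hola.keep = data.frame(from = c(1, 5),   to = c(3, 7),   value = c(12, 4)),
  boo.keep  = data.frame(from = c(3, 10),  to = c(6, 13),  value = c(25, 9))
)
myList_drop <- list(
  hola.drop = data.frame(from = c(11, 18), to = c(23, 30), value = c(3, 15)),
  boo.drop  = data.frame(from = c(18, 23), to = c(26, 31), value = c(10, 2))
)

# Stack one list into a single data.frame, recording each element's name
# in an `id` column (the role played by `.id` in the data.table version).
stack_with_id <- function(lst) {
  parts <- Map(function(df, nm) cbind(id = nm, df), lst, names(lst))
  do.call(rbind, unname(parts))
}

all_rows <- rbind(stack_with_id(myList_keep), stack_with_id(myList_drop))
rownames(all_rows) <- NULL

# Strip the ".keep"/".drop" suffix so both sources share one sample name.
all_rows$sample <- sub("[.](keep|drop)$", "", all_rows$id)
all_rows$gr <- ifelse(all_rows$value >= 10, "above", "below")

# Outer split by sample, inner split by threshold group.
result <- lapply(split(all_rows, all_rows$sample),
                 function(d) split(d, d$gr))

names(result)        # one element per sample
names(result$hola)   # one element per threshold group
```

Each inner data.frame keeps the `id` column, so rows that came from the keep list can still be told apart from rows that came from the drop list after merging.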