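Flattening nested sublists into a data.frame

Time: 2014-01-13 19:36:43

Tags: r

I often receive data in the form of nested lists, and I end up writing all sorts of code to flatten them into data.frames. I would like a more general solution so that I am not writing separate code for each individual list. Here is some sample data to illustrate the problem.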

data_list <- list(structure(list(local_date_time = "2010-01-05T13:30:00", 
    value = -9999, data_quality = list(structure(list(qualifierid = 19, 
        qualifier_description = "Passed sanity check; see incident report IR_8", 
        valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T14:00:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
        valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T14:30:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
        valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T15:00:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
        valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T15:30:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
        valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T16:00:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
        valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T16:30:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
        valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T17:00:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
        valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T17:30:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
        valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T18:00:00", value = -9999, 
    data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8", 
        valid = FALSE), .Names = c("qualifierid", "qualifier_description", 
    "valid")))), .Names = c("local_date_time", "value", "data_quality")))

The simplest approach is of course to rbind the lists. data.table's rbindlist is fast on larger lists, like so:

library(data.table)
rbindlist(data_list)

But this returns:

        local_date_time value data_quality
 1: 2010-01-05T13:30:00 -9999       <list>
 2: 2010-01-05T14:00:00 -9999       <list>
 3: 2010-01-05T14:30:00 -9999       <list>
 4: 2010-01-05T15:00:00 -9999       <list>
 5: 2010-01-05T15:30:00 -9999       <list>
 6: 2010-01-05T16:00:00 -9999       <list>
 7: 2010-01-05T16:30:00 -9999       <list>
 8: 2010-01-05T17:00:00 -9999       <list>
 9: 2010-01-05T17:30:00 -9999       <list>
10: 2010-01-05T18:00:00 -9999       <list>

This is not ideal, because the last column is really a nested list of 3 items. I can handle that with plyr:

library(plyr)
result <- ldply(data_list, function(x) {
  cbind(data.frame(t(unlist(x[1:2]))), data.frame(t(unlist(x[3]))))
})

This works nicely. Is there a way to generalize this approach to lists whose nested sublists may have different formats? If the list has only a single level, a simple row-bind should be enough. In this case I know that the 3rd element contains a sublist, but often I don't know that in advance. Writing a custom wrapper for each case would be rather tedious.
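When it is not known in advance which elements hold sublists, one option is to ask R directly. The snippet below is a minimal sketch rather than part of the original question; `record` is a hypothetical single entry shaped like the sample data above.

```r
# Hypothetical single record with the same shape as one element of data_list
record <- list(
  local_date_time = "2010-01-05T13:30:00",
  value = -9999,
  data_quality = list(list(
    qualifierid = 19,
    qualifier_description = "Passed sanity check; see incident report IR_8",
    valid = FALSE
  ))
)

# TRUE for every top-level element that is itself a list
is_nested <- vapply(record, is.list, logical(1))

# Names of the elements that would need flattening
nested_fields <- names(record)[is_nested]
print(nested_fields)
```

For this record, only `data_quality` is flagged, which matches the hand-written `x[3]` in the plyr call.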

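2 Answers:

Answer 0 (score: 2)

I came across a function called LinearizeNestedList by Akhil S Bhel (some time ago, here on SO). It "flattens" nested lists.

In your case, you want to "flatten" the sublists rather than the main list itself.

Perhaps it can be used in your case as follows: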

library(devtools)
source_gist("https://gist.github.com/mrdwab/4205477")
# Sourcing https://gist.github.com/mrdwab/4205477/raw/1bd86c697b89de9941834882f1085c8312076e38/LinearizeNestedList.R
# SHA-1 hash of file is dde479195258dbad9367274ceedbd5a68251478a
x <- do.call(rbind.data.frame, lapply(data_list, LinearizeNestedList))
x
#        local_date_time value data_quality.1.qualifierid
# 2  2010-01-05T13:30:00 -9999                         19
# 21 2010-01-05T14:00:00 -9999                         19
# 3  2010-01-05T14:30:00 -9999                         19
# 4  2010-01-05T15:00:00 -9999                         19
# 5  2010-01-05T15:30:00 -9999                         19
# 6  2010-01-05T16:00:00 -9999                         19
# 7  2010-01-05T16:30:00 -9999                         19
# 8  2010-01-05T17:00:00 -9999                         19
# 9  2010-01-05T17:30:00 -9999                         19
# 10 2010-01-05T18:00:00 -9999                         19
#             data_quality.1.qualifier_description data_quality.1.valid
# 2  Passed sanity check; see incident report IR_8                FALSE
# 21 Passed sanity check; see incident report IR_8                FALSE
# 3  Passed sanity check; see incident report IR_8                FALSE
# 4  Passed sanity check; see incident report IR_8                FALSE
# 5  Passed sanity check; see incident report IR_8                FALSE
# 6  Passed sanity check; see incident report IR_8                FALSE
# 7  Passed sanity check; see incident report IR_8                FALSE
# 8  Passed sanity check; see incident report IR_8                FALSE
# 9  Passed sanity check; see incident report IR_8                FALSE
# 10 Passed sanity check; see incident report IR_8                FALSE
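For readers who would rather not source a gist, the same idea can be sketched in base R. This is not the code from the gist: `flatten_record` is a hypothetical helper written for illustration, `sample_list` is a two-record stand-in for `data_list`, and the sketch assumes every leaf is a length-one value.

```r
# Hypothetical helper: recursively flatten one record into a flat named list.
# Unnamed sublists get their position as the name, joined with ".".
flatten_record <- function(x, prefix = NULL) {
  out <- list()
  nms <- names(x)
  for (i in seq_along(x)) {
    nm <- if (!is.null(nms) && nzchar(nms[i])) nms[i] else as.character(i)
    key <- if (is.null(prefix)) nm else paste(prefix, nm, sep = ".")
    if (is.list(x[[i]])) {
      # Descend into the sublist, carrying the path so far as the prefix
      out <- c(out, flatten_record(x[[i]], key))
    } else {
      # Leaf value: keep it with its original type
      out[[key]] <- x[[i]]
    }
  }
  out
}

# Two-record stand-in for data_list (assumption: same shape as the question's data)
make_record <- function(ts) {
  list(
    local_date_time = ts,
    value = -9999,
    data_quality = list(list(
      qualifierid = 19,
      qualifier_description = "Passed sanity check; see incident report IR_8",
      valid = FALSE
    ))
  )
}
sample_list <- list(make_record("2010-01-05T13:30:00"),
                    make_record("2010-01-05T14:00:00"))

# Flatten each record, turn it into a one-row data.frame, then stack the rows
flat_rows <- lapply(sample_list, function(rec) {
  as.data.frame(flatten_record(rec), stringsAsFactors = FALSE)
})
flat <- do.call(rbind, flat_rows)
str(flat)
```

Because each leaf is stored separately instead of going through `unlist()`, numeric and logical columns keep their types. Records with differing sets of fields would need a row-bind that fills missing columns, which this sketch does not attempt.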

Answer 1 (score: 0)

A simple lapply with as.data.frame will also do it, at least as long as you have only one level of nesting:

> res <- do.call(rbind, lapply(data_list, as.data.frame))
> str(res)
'data.frame':   10 obs. of  5 variables:
 $ local_date_time                   : Factor w/ 10 levels "2010-01-05T13:30:00",..: 1 2 3 4 5 6 7 8 9 10
 $ value                             : num  -9999 -9999 -9999 -9999 -9999 ...
 $ data_quality.qualifierid          : num  19 19 19 19 19 19 19 19 19 19
 $ data_quality.qualifier_description: Factor w/ 1 level "Passed sanity check; see incident report IR_8": 1 1 1 1 1 1 1 1 1 1
 $ data_quality.valid                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
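The `str()` output above shows the text columns converted to factors, which was the default before R 4.0.0. A hedged variant of the same call that keeps them as character vectors is sketched below; `sample_list` is a hypothetical two-record stand-in for `data_list`.

```r
# Two-record stand-in for data_list (assumption: same shape as the question's data)
make_record <- function(ts) {
  list(
    local_date_time = ts,
    value = -9999,
    data_quality = list(list(
      qualifierid = 19,
      qualifier_description = "Passed sanity check; see incident report IR_8",
      valid = FALSE
    ))
  )
}
sample_list <- list(make_record("2010-01-05T13:30:00"),
                    make_record("2010-01-05T14:00:00"))

# Same approach as the answer, but asking explicitly for character columns
res <- do.call(
  rbind,
  lapply(sample_list, as.data.frame, stringsAsFactors = FALSE)
)
str(res)
```

On R 4.0.0 and later the extra argument is redundant, since `stringsAsFactors` already defaults to `FALSE`, but passing it makes the behaviour the same on older installations.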