Split a nested list in a data frame column into separate columns

Date: 2017-06-13 11:00:31

Tags: r list dataframe split

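I've tried related solutions, but they don't fit my case. I have a data frame with a nested list in one column, and I want to split this list out into columns. The list contains further lists, one per month, each holding that month's timestamp (ts) and consumption (v). The data frame is: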

   id      monthly_consum
1 112          list1
2  34          list2
3  54          list3

where

list1<-list(list(ts = "2016-01-01T00:00:00+01:00", v = 466.6),list(ts = "2016-02-01T00:00:00+01:00", v = 565.6),
                         list(ts = "2016-03-01T00:00:00+01:00", v = 765.6),list(ts = "2016-04-01T00:00:00+01:00", v = 888.6),
                         list(ts = "2016-05-01T00:00:00+01:00", v = 465),list(ts = "2016-06-01T00:00:00+01:00", v = 465.6),
                         list(ts = "2016-07-01T00:00:00+01:00", v = 786),list(ts = "2016-08-01T00:00:00+01:00", v = 435),
                         list(ts = "2016-09-01T00:00:00+01:00", v = 568),list(ts = "2016-10-01T00:00:00+01:00", v = 678),
                         list(ts = "2016-11-01T00:00:00+01:00", v = 522),list(ts = "2016- 12-01T00:00:00+01:00", v = 555))


list2<-list(list(ts = "2016-01-01T00:00:00+01:00", v = 333.6),list(ts = "2016-02-01T00:00:00+01:00", v = 565.6),
              list(ts = "2016-03-01T00:00:00+01:00", v = 765.6),list(ts = "2016-04-01T00:00:00+01:00", v = 333.6),
              list(ts = "2016-05-01T00:00:00+01:00", v = 465),list(ts = "2016-06-01T00:00:00+01:00", v = 465.6),
              list(ts = "2016-07-01T00:00:00+01:00", v = 786),list(ts = "2016-08-01T00:00:00+01:00", v = 435),
              list(ts = "2016-09-01T00:00:00+01:00", v = 568),list(ts = "2016-10-01T00:00:00+01:00", v = 678),
              list(ts = "2016-11-01T00:00:00+01:00", v = 522),list(ts = "2016-12-01T00:00:00+01:00", v = 555))


list3<-list(list(ts = "2016-01-01T00:00:00+01:00", v = 323.6),list(ts = "2016-02-01T00:00:00+01:00", v = 565.6),
           list(ts = "2016-03-01T00:00:00+01:00", v = 333.6),list(ts = "2016-04-01T00:00:00+01:00", v = 888.6),
           list(ts = "2016-05-01T00:00:00+01:00", v = 465),list(ts = "2016-06-01T00:00:00+01:00", v = 465.6),
           list(ts = "2016-07-01T00:00:00+01:00", v = 786),list(ts = "2016-08-01T00:00:00+01:00", v = 435),
           list(ts = "2016-09-01T00:00:00+01:00", v = 568),list(ts = "2016-10-01T00:00:00+01:00", v = 678),
           list(ts = "2016-11-01T00:00:00+01:00", v = 522),list(ts = "2016-12-01T00:00:00+01:00", v = 555))

I want to split the lists and create a data frame in one of the following two formats:

   id          ts.1                     cons.1    ts.2    cons.2  ts.3 etc..
1 112      2016-01-01T00:00:00+01:00    466.6    2016-02..   ...   ...
2  34      2016-01-01T00:00:00+01:00    333.6    2016-02..   ...   ...
3  54      2016-01-01T00:00:00+01:00    323.6    2016-02..   ...   ...

OR

  id             ts                  consumption    
 112      2016-01-01T00:00:00+01:00    466.6    
 112      2016-02-01T00:00:00+01:00    565.6    
 112      2016-03-01T00:00:00+01:00    765.6 
 112      2016-04-01T00:00:00+01:00    888.6    
 112      2016-05-01T00:00:00+01:00    465    
 112      2016-06-01T00:00:00+01:00    465.6 
 112      2016-07-01T00:00:00+01:00    786    
 112      2016-08-01T00:00:00+01:00    435    
 112      2016-09-01T00:00:00+01:00    568 
 112      2016-10-01T00:00:00+01:00    678    
 112      2016-11-01T00:00:00+01:00    522   
 112      2016-12-01T00:00:00+01:00    555 
 34       2016-01-01T00:00:00+01:00    333.6
 34       2016-02-01T00:00:00+01:00    565.6
 34       2016-03-01T00:00:00+01:00    765.6
 etc.
Can you help me? I'm using data.frame(matrix(unlist(...))), but it doesn't give me the format I want. When I use rbindlist, I get:

"Error in rbindlist(...): Item 1 of list input is not a data.frame, data.table or list"

Thanks in advance!

Update: using dput on the real data, I get:

> dput(locs_total[9:12,1:5])
structure(list(X.dep_id. = c("34", "34", "34", "34"), X.loc_id. = c("17761",
"17406", "23591", "27838"), X.surface. = c("200", "1250", "54",
"150"), X.sector. = c("HOUSING", "SMALL-STORE-FOOD", "LIBRARY",
"OFFICE-BUILDING"), X.avg_cons_main. = list(list(structure(list(
    ts = "2016-01-01T00:00:00+01:00", v = 466.65), .Names = c("ts",
"v")), structure(list(ts = "2016-02-01T00:00:00+01:00", v = 406.45), .Names = c("ts",
"v")), structure(list(ts = "2016-03-01T00:00:00+01:00", v = 483.35), .Names = c("ts",
"v")), structure(list(ts = "2016-04-01T00:00:00+02:00", v = 79.45), .Names = c("ts",
"v"))), NULL, NULL, NULL)), .Names = c("X.dep_id.", "X.loc_id.",
"X.surface.", "X.sector.", "X.avg_cons_main."), row.names = c("9",
"10", "11", "12"), class = "data.frame")

2 answers:

Answer 0 (score: 0)

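We can loop over the lists: look each one up by name with mget, convert it to a data frame, bind the matching id to it, and stack the results.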

res <- do.call(rbind, Map(cbind, id = df1$id, lapply(mget(df1$monthly_consum), 
                   function(x) do.call(rbind.data.frame, x))))
names(res)[3] <- "consumption"
row.names(res) <- NULL
head(res, 14)
#    id                         ts consumption
#1  112  2016-01-01T00:00:00+01:00       466.6
#2  112  2016-02-01T00:00:00+01:00       565.6
#3  112  2016-03-01T00:00:00+01:00       765.6
#4  112  2016-04-01T00:00:00+01:00       888.6
#5  112  2016-05-01T00:00:00+01:00       465.0
#6  112  2016-06-01T00:00:00+01:00       465.6
#7  112  2016-07-01T00:00:00+01:00       786.0
#8  112  2016-08-01T00:00:00+01:00       435.0
#9  112  2016-09-01T00:00:00+01:00       568.0
#10 112  2016-10-01T00:00:00+01:00       678.0
#11 112  2016-11-01T00:00:00+01:00       522.0
#12 112 2016- 12-01T00:00:00+01:00       555.0
#13  34  2016-01-01T00:00:00+01:00       333.6
#14  34  2016-02-01T00:00:00+01:00       565.6

Data

df1 <- structure(list(id = c(112L, 34L, 54L), monthly_consum = c("list1", 
"list2", "list3")), .Names = c("id", "monthly_consum"), 
class = "data.frame", row.names = c("1", "2", "3"))

Answer 1 (score: 0)

If the IDs were also stored in the lists, you could use dplyr::bind_rows directly:

dplyr::bind_rows(list1, list2, list3)
# A tibble: 36 × 2
                          ts     v
                       <chr> <dbl>
1  2016-01-01T00:00:00+01:00 466.6
2  2016-02-01T00:00:00+01:00 565.6
3  2016-03-01T00:00:00+01:00 765.6
4  2016-04-01T00:00:00+01:00 888.6
5  2016-05-01T00:00:00+01:00 465.0
6  2016-06-01T00:00:00+01:00 465.6
7  2016-07-01T00:00:00+01:00 786.0
8  2016-08-01T00:00:00+01:00 435.0
9  2016-09-01T00:00:00+01:00 568.0
10 2016-10-01T00:00:00+01:00 678.0
# ... with 26 more rows

To add the IDs from the other data frame:

library(dplyr)

# tibble() replaces the deprecated data_frame()
ids <- tibble(list_id = c(112, 34, 54),
              monthly_consum = c("list1", "list2", "list3"))

Taking the nesting into account, you can use purrr::map as follows:

- combine the three lists into one list

k <- list(list1, list2, list3)

- use map to apply bind_rows to each list separately

k1 <- purrr::map(k, bind_rows)

- use the IDs as the names of the list

names(k1) <- ids$list_id

- bind the rows, using .id to turn the list names into an id column

bind_rows(k1, .id = "id")

# A tibble: 36 × 3
      id                        ts     v
   <chr>                     <chr> <dbl>
1    112 2016-01-01T00:00:00+01:00 466.6
2    112 2016-02-01T00:00:00+01:00 565.6
3    112 2016-03-01T00:00:00+01:00 765.6
4    112 2016-04-01T00:00:00+01:00 888.6
5    112 2016-05-01T00:00:00+01:00 465.0
6    112 2016-06-01T00:00:00+01:00 465.6
7    112 2016-07-01T00:00:00+01:00 786.0
8    112 2016-08-01T00:00:00+01:00 435.0
9    112 2016-09-01T00:00:00+01:00 568.0
10   112 2016-10-01T00:00:00+01:00 678.0
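Neither answer produces the first (wide) layout from the question. A minimal base-R sketch, assuming a long data frame with columns id, ts and v like the one printed above: number the months within each id with ave(), then let stats::reshape() spread them into ts.1, v.1, ts.2, v.2, and so on. The `long` data frame here is a hypothetical excerpt (first two months of ids 112 and 34), not the full result, and the rename to cons.1, cons.2 only mirrors the column names in the question.

```r
# Hypothetical excerpt of the long result: two months for each of two ids
long <- data.frame(
  id = c("112", "112", "34", "34"),
  ts = c("2016-01-01T00:00:00+01:00", "2016-02-01T00:00:00+01:00",
         "2016-01-01T00:00:00+01:00", "2016-02-01T00:00:00+01:00"),
  v  = c(466.6, 565.6, 333.6, 565.6),
  stringsAsFactors = FALSE
)

# Month index within each id (1, 2, ...), used as the "time" variable
long$month <- ave(seq_along(long$id), long$id, FUN = seq_along)

# One row per id; reshape() names the new columns ts.1, v.1, ts.2, v.2
wide <- reshape(long, idvar = "id", timevar = "month",
                v.names = c("ts", "v"), direction = "wide")
row.names(wide) <- NULL

# Rename v.N to cons.N to match the layout asked for in the question
names(wide) <- sub("^v\\.", "cons.", names(wide))
wide
```

With tidyr 1.0 or later, pivot_wider(long, names_from = month, values_from = c(ts, v)) should give an equivalent layout, with names such as ts_1 and v_1.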