Reshape from wide to long with multiple variables and unbalanced time

Time: 2019-05-17 18:17:42

Tags: r dataframe reshape

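I have two groups of variables: annual percentage share and year. The annual percentage share columns start in 1999 and end in 2012, but the year columns run from 1990 to 2013.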

countrylabel annualpercentageshare.1999 year1990 year1991 year1992
1      Austria                         NA       NA       NA       NA
2      Belgium                         NA       NA       NA       NA
3     Bulgaria                   48.20000       NA       NA       NA
4      Estonia                         NA       NA       NA       NA
5       France                   47.52853       NA       NA       NA
6      Germany                         NA       NA       NA       NA

Something like this.

I have tried the following code:

merge_data2 <- reshape(merge_data2, varying = list(2:ncol(merge_data2)), 
                       v.names = c("percentageshare", "Year"),
                       idvar = "countrylabel", direction = "long", times = 1990:2013)

But I get this error message:

Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, :
  'lengths(varying)' must all match 'length(times)'

Edit: I would like a data frame like this:

countrylabel    time      annualpercentageshare        year
Austria          1990            NA                      NA
Austria          1991            NA                      NA

2 answers:

Answer 0: (score: 0)

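library(tidyr); library(dplyr)
df %>%
  # stack every column except the id into variable/value pairs
  gather(variable, value, -countrylabel) %>%
  # split each name four characters from the end: stub vs. year
  separate("variable", into = c("stat", "time"), sep = -4) %>%
  # spread the stubs back out into one column each
  spread(stat, value)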

Output

   countrylabel time annualpercentageshare. year
1       Austria 1990                     NA   NA
2       Austria 1991                     NA   NA
3       Austria 1992                     NA   NA
4       Austria 1999                     NA   NA
5       Belgium 1990                     NA   NA
6       Belgium 1991                     NA   NA
7       Belgium 1992                     NA   NA
8       Belgium 1999                     NA   NA
9      Bulgaria 1990                     NA   NA
10     Bulgaria 1991                     NA   NA
11     Bulgaria 1992                     NA   NA
12     Bulgaria 1999               48.20000   NA
13      Estonia 1990                     NA   NA
14      Estonia 1991                     NA   NA
15      Estonia 1992                     NA   NA
16      Estonia 1999                     NA   NA
17       France 1990                     NA   NA
18       France 1991                     NA   NA
19       France 1992                     NA   NA
20       France 1999               47.52853   NA
21      Germany 1990                     NA   NA
22      Germany 1991                     NA   NA
23      Germany 1992                     NA   NA
24      Germany 1999                     NA   NA
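
The trailing dot in the annualpercentageshare. column comes from sep = -4, which splits each name four characters from the end. A minimal base R sketch of the same split, with an extra step that strips the dot; the vector nms is made up for illustration:

```r
# Hypothetical column names mirroring the question's two naming styles
nms <- c("annualpercentageshare.1999", "year1990", "year1991")

n <- nchar(nms)
time <- as.integer(substr(nms, n - 3, n))  # last four characters: the year
stat <- substr(nms, 1, n - 4)              # everything before the year
stat <- sub("\\.$", "", stat)              # drop the trailing "." if present

split_names <- data.frame(stat, time)
print(split_names)
```

In the pipeline above, the same cleanup could probably go in a mutate(stat = sub("\\.$", "", stat)) step before spread().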

Answer 1: (score: 0)

reshape likes ".", so we first insert one into the names of the year* variables.

names(d) <- gsub("year", "year.", names(d))
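
As a sketch of why the dot helps: when every varying column is named stub.time, reshape can work out v.names and times on its own by splitting the names at sep = "." (its default). The data frame toy below is invented for illustration and is balanced, that is, both stubs cover the same years:

```r
# Invented, balanced example: two stubs ("share", "year") over two years
toy <- data.frame(
  countrylabel = c("Austria", "Belgium"),
  share.1999 = c(NA, 48.2),
  share.2000 = c(1.5, NA),
  year.1999  = c(NA, 0.3),
  year.2000  = c(0.6, NA)
)

# varying is a plain vector of column positions, so reshape() splits the
# names at "." to find the stubs and the times by itself
long <- reshape(toy, varying = 2:5, direction = "long",
                idvar = "countrylabel", sep = ".")
print(long)
```

This shortcut only applies once the column names follow one pattern and no stub is missing a year, which is what the next step takes care of.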

Now we give reshape the columns it is missing and put the columns in order:

d$annualpercentage.2002 <- NA
d$year.1999 <- NA
d <- d[c(1, order(names(d)[-1]) + 1)]
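
With the full 1990 to 2013 range from the question, adding each missing column by hand gets tedious. One possible way to generate them, sketched on an invented data frame toy with assumed stub names share and year:

```r
# Invented, unbalanced example: share.1999 is absent
toy <- data.frame(
  countrylabel = c("Austria", "Belgium"),
  share.2000 = c(1.5, NA),
  year.1999  = c(NA, 0.3),
  year.2000  = c(0.6, NA)
)

stubs <- c("share", "year")   # assumed variable stubs
years <- 1999:2000            # full time range both stubs should cover

# Every stub.year name reshape() expects, grouped by stub, years ascending
wanted <- paste(rep(stubs, each = length(years)), years, sep = ".")

# Add the absent columns, filled with NA
for (nm in setdiff(wanted, names(toy))) toy[[nm]] <- NA_real_

# Id column first, then the varying columns in the expected order
toy <- toy[c("countrylabel", wanted)]
print(names(toy))
```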

Your idea then works if varying is given as a list with one group of columns per variable:

res <- reshape(d, varying=list(2:5, 6:9), direction="long", idvar="countrylabel", 
               times=1999:2002, v.names=c("annualpercentage", "year"))
res
#                  countrylabel time annualpercentage        year
# Austria.1999          Austria 1999               NA          NA
# Belgium.1999          Belgium 1999               NA          NA
# Bulgaria.1999        Bulgaria 1999       -0.6806495          NA
# Estonia.1999          Estonia 1999               NA          NA
# France.1999            France 1999               NA          NA
# Germany.1999          Germany 1999               NA          NA
# Switzerland.1999  Switzerland 1999       -1.8497570          NA
# Austria.2000          Austria 2000       -0.6033900  0.14970015
# Belgium.2000          Belgium 2000               NA -0.49201756
# Bulgaria.2000        Bulgaria 2000        0.8263925 -0.36320990
# Estonia.2000          Estonia 2000               NA -2.51032544
# France.2000            France 2000               NA  0.57800624
# Germany.2000          Germany 2000               NA -0.52295712
# Switzerland.2000  Switzerland 2000        0.2783076  0.25616728
# Austria.2001          Austria 2001       -2.6962484 -0.15375642
# Belgium.2001          Belgium 2001        1.3088577  0.72528621
# Bulgaria.2001        Bulgaria 2001               NA          NA
# Estonia.2001          Estonia 2001               NA -0.05563662
# France.2001            France 2001        0.2224629  0.74205086
# Germany.2001          Germany 2001               NA -0.01185349
# Switzerland.2001  Switzerland 2001        0.8354322 -1.40826638
# Austria.2002          Austria 2002               NA          NA
# Belgium.2002          Belgium 2002               NA  1.60874778
# Bulgaria.2002        Bulgaria 2002               NA          NA
# Estonia.2002          Estonia 2002               NA  0.55866704
# France.2002            France 2002               NA -1.59866472
# Germany.2002          Germany 2002               NA -0.11217415
# Switzerland.2002  Switzerland 2002               NA          NA
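
The error in the question follows from the shape of varying: list(2:ncol(merge_data2)) is a single group holding every column, whereas reshape expects one list element per name in v.names, each with exactly length(times) columns. A rough sketch of that check on made-up column positions (the counts are illustrative, not taken from the real data):

```r
times <- 1999:2002

# One group containing all eight varying columns: 8 != 4, so reshape() stops
varying_bad <- list(2:9)
print(lengths(varying_bad) == length(times))

# One group per v.names entry, four columns each: both match length(times)
varying_ok <- list(2:5, 6:9)
print(lengths(varying_ok) == length(times))
```

This is also why the unbalanced columns have to be padded with NA first: every group must end up with the same number of columns as there are time points.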

Data

d <- structure(list(countrylabel = c("Austria", "Belgium", "Bulgaria", 
"Estonia", "France", "Germany", "Switzerland"), annualpercentage.1999 = c(NA, 
-2.58060150400384, -0.0623757258909573, 0.267776001395166, NA, 
NA, 0.048219924249952), annualpercentage.2000 = c(NA, -0.249416955035044, 
1.3525450891501, 1.04446768824697, NA, -0.0582347596434839, -0.891400228849837
), annualpercentage.2001 = c(1.82469277697851, NA, NA, 1.04231605324821, 
NA, -0.900145118946308, -1.19320727433597), year2000 = c(0.633712375393134, 
NA, 1.24760861316098, -0.092964787061478, -0.59403260962332, 
NA, -0.650348234181285), year2001 = c(0.587318286831079, NA, 
NA, 0.348890470222513, NA, NA, NA), year2002 = c(0.0645316087966406, 
-0.279456557428068, NA, NA, -0.0627400036074545, 1.30419117694731, 
-0.484654596062051)), row.names = c(NA, -7L), class = "data.frame")