First difference of a data frame

Date: 2018-01-11 16:02:15

Tags: r

I have the following data frame:

> dados

COUNTRY   Year   CO2 emissions Pop. Growth(%)
Argentina  1994      1.23         0.3
Argentina  1995      1.26         0.2
Argentina  1996      1.28         0.4
Argentina  1997      1.24         0.2
Brazil     1994      1.54         0.7
Brazil     1995      1.59         0.6
Brazil     1996      1.60         0.9
Brazil     1997      1.58         1.3

I want to take the first difference of the variables CO2 emissions and Pop. Growth(%) within each country. I have already tried dados[,2:4] <- diff(dados[,2:4]), but it returned the error:

  

"Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : non-numeric argument to binary operator"

1 answer:

Answer 0 (score: 1)

Here is a solution with dplyr. It assumes the rows are already sorted by Year within each country:

library(dplyr)

df %>%
  group_by(COUNTRY) %>%
  mutate_at(vars(CO2_emissions:Pop_Growth), funs(.-lag(.)))

Result:

# A tibble: 8 x 4
# Groups:   COUNTRY [2]
    COUNTRY  Year CO2_emissions Pop_Growth
     <fctr> <int>         <dbl>      <dbl>
1 Argentina  1994            NA         NA
2 Argentina  1995          0.03       -0.1
3 Argentina  1996          0.02        0.2
4 Argentina  1997         -0.04       -0.2
5    Brazil  1994            NA         NA
6    Brazil  1995          0.05       -0.1
7    Brazil  1996          0.01        0.3
8    Brazil  1997         -0.02        0.4
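
The code above uses funs() and mutate_at(), which newer dplyr releases mark as deprecated and superseded respectively. A minimal sketch of the same grouped first difference with across(), assuming dplyr 1.0.0 or later, might look like the following. The data frame is rebuilt with data.frame() only to keep the example self-contained, and the arrange() step is an added safeguard so that lag() follows year order, not something the original answer does.

```r
library(dplyr)

# Same values as the example data, rebuilt here so the block runs on its own
df <- data.frame(
  COUNTRY = c(rep("Argentina", 4), rep("Brazil", 4)),
  Year = rep(1994:1997, 2),
  CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3)
)

res <- df %>%
  arrange(COUNTRY, Year) %>%   # make sure lag() sees the years in order
  group_by(COUNTRY) %>%
  # subtract the previous row's value within each country;
  # the first row of each group has no predecessor and becomes NA
  mutate(across(CO2_emissions:Pop_Growth, ~ .x - dplyr::lag(.x))) %>%
  ungroup()

print(res)
```

The values should match the result shown above; only the column-selection syntax differs.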

Data:

df = structure(list(COUNTRY = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L), .Label = c("Argentina", "Brazil"), class = "factor"), 
    Year = c(1994L, 1995L, 1996L, 1997L, 1994L, 1995L, 1996L, 
    1997L), CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 
    1.6, 1.58), Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 
    0.9, 1.3)), .Names = c("COUNTRY", "Year", "CO2_emissions", 
"Pop_Growth"), class = "data.frame", row.names = c(NA, -8L))
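
As for the error in the question: as far as I can tell, base R has no diff() method for data frames, so diff(dados[,2:4]) falls through to the default method, which ends up subtracting one list from another and fails with "non-numeric argument to binary operator". Differencing column by column avoids that. Below is a rough base R sketch without dplyr; first_diff and cols are names made up for this illustration, and the rows are assumed to be sorted by Year within each country.

```r
# Same values as the example data, rebuilt here so the block runs on its own
df <- data.frame(
  COUNTRY = c(rep("Argentina", 4), rep("Brazil", 4)),
  Year = rep(1994:1997, 2),
  CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3)
)

# Hypothetical helper: diff() drops one element, so pad with a leading NA
# to keep the result the same length as the input vector
first_diff <- function(x) c(NA, diff(x))

cols <- c("CO2_emissions", "Pop_Growth")  # columns to difference

res <- df
# ave() applies first_diff separately within each COUNTRY and
# returns the values in the original row order
res[cols] <- lapply(df[cols], function(col) ave(col, df$COUNTRY, FUN = first_diff))

print(res)
```

Padding with NA keeps the differenced columns aligned with the Year column, which is why the result can be assigned back into a data frame of the same shape.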