Aggregating a data frame over a column

Time: 2015-04-16 16:18:33

Tags: r dataframe aggregate

I have a data frame in which one column represents the year. Say:

region <- c("Spain", "Italy", "Norway")
year   <- c("2010","2011","2012","2010","2011","2012","2010","2011","2012")
m1     <- c("10","11","12","13","14","15","16","17","18")
m2     <- c("20","30","40","50","60","70","80","90","100")
data   <- data.frame(region,year,m1,m2)

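I would like to summarize m1 in this data set as the 3-year average for each country/region. I am confused about how to do this with a data frame. Any comments are much appreciated. Thanks in advance!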

1 answer:

Answer 0: (score: 1)

First, your m1 variable needs to be numeric. Convert it with as.numeric():

data$m1 <- as.numeric(as.character(data$m1))
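
In case it helps to see why the as.character() step is there: on R versions before 4.0.0, data.frame() turns character columns into factors by default, and calling as.numeric() directly on a factor returns the internal level codes rather than the printed values. A small sketch (the vector f is made up for illustration and only stands in for a column like data$m1):

```r
# Hypothetical factor, standing in for a factor column such as data$m1
f <- factor(c("16", "10", "13"))

# Direct conversion returns the integer level codes, not the values
codes <- as.numeric(f)

# Going through character first recovers the numbers as printed
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

On R 4.0.0 and later the column stays character by default, so as.numeric(data$m1) alone should be enough there, although the two-step form does no harm.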

Then you can use aggregate like this:

aggregate(m1 ~ region, FUN = mean, data = data)

#   region m1
# 1  Italy 14
# 2 Norway 15
# 3  Spain 13
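
If you want the means of both m1 and m2 from a single aggregate call, one option is to put cbind() on the left-hand side of the formula. A minimal sketch, assuming both columns are already numeric (the name res is just for illustration):

```r
# Same data as in the question, with m1 and m2 stored as numbers
region <- c("Spain", "Italy", "Norway")
year   <- c("2010","2011","2012","2010","2011","2012","2010","2011","2012")
m1     <- c(10,11,12,13,14,15,16,17,18)
m2     <- c(20,30,40,50,60,70,80,90,100)
data   <- data.frame(region, year, m1, m2)

# cbind() on the left-hand side aggregates several columns per group
res <- aggregate(cbind(m1, m2) ~ region, data = data, FUN = mean)
print(res)
```

The result keeps the original column names (m1 and m2), with one row per region.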

To avoid the awkward type conversion (as.numeric(as.character())), you should remove the quotes when defining m1 and m2:

m1     <- c(10,11,12,13,14,15,16,17,18)
m2     <- c(20,30,40,50,60,70,80,90,100)

An alternative using dplyr:

library(dplyr)

region <- c("Spain", "Italy", "Norway")
year   <- c("2010","2011","2012","2010","2011","2012","2010","2011","2012")
m1     <- c(10,11,12,13,14,15,16,17,18)
m2     <- c(20,30,40,50,60,70,80,90,100)
data   <- data.frame(region,year,m1,m2)

data %>%
  group_by(region) %>%
  summarise(mean_m1 = mean(m1),
            mean_m2 = mean(m2))

#   region mean_m1 mean_m2
# 1  Italy      14      60
# 2 Norway      15      70
# 3  Spain      13      50
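
With more measurement columns, listing every mean by hand gets tedious. A possible variation, assuming dplyr 1.0.0 or later (where across() is available), applies mean to several columns in one step; the name res is only illustrative:

```r
library(dplyr)

# Same data as above; stringsAsFactors = FALSE keeps region as character
region <- c("Spain", "Italy", "Norway")
year   <- c("2010","2011","2012","2010","2011","2012","2010","2011","2012")
m1     <- c(10,11,12,13,14,15,16,17,18)
m2     <- c(20,30,40,50,60,70,80,90,100)
data   <- data.frame(region, year, m1, m2, stringsAsFactors = FALSE)

# across() applies mean to each listed column; .names builds the output names
res <- data %>%
  group_by(region) %>%
  summarise(across(c(m1, m2), mean, .names = "mean_{.col}"))

print(res)
```

This should give the same mean_m1 and mean_m2 columns as the explicit version, returned as a tibble.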