I have a data.frame that looks like this:
Geotype <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3)
Strategy <- c("Demand", "Strategy 1", "Strategy 2", "Strategy 3", "Strategy 4", "Strategy 5", "Strategy 6")
Year.1 <- c(1:21)
Year.2 <- c(1:21)
Year.3 <- c(1:21)
Year.4 <- c(1:21)
mydata <- data.frame(Geotype,Strategy,Year.1, Year.2, Year.3, Year.4)
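I want to sum the strategies for each year, within each Geotype.
This means I need to sum 6 rows down each column of the data frame, skipping the Demand row. I then want to repeat this for all columns (40 years).
I would like the output data frame to look like this: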
Geotype.output <- c(1, 2, 3)
Year.1.output <- c(27, 69, 111)
Year.2.output <- c(27, 69, 111)
Year.3.output <- c(27, 69, 111)
Year.4.output <- c(27, 69, 111)
output <- data.frame(Geotype.output,Year.1.output, Year.2.output, Year.3.output, Year.4.output)
Any suggestions on how to do this elegantly? I tried to hack a solution together using this, this and this, but I was unsuccessful because I need to skip a row.
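Answer 0 (score: 6)
You can use the base R aggregate function to aggregate the data by Geotype, with sum as the aggregating function, applied to a reduced data.frame (without the "Demand" rows and without the Strategy column):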
aggregate(.~Geotype, data=mydata[mydata$Strategy !="Demand", -2], FUN=sum)
# Geotype Year.1 Year.2 Year.3 Year.4
#1 1 27 27 27 27
#2 2 69 69 69 69
#3 3 111 111 111 111
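To make the one-liner easier to follow, here is a minimal sketch that splits it into steps. It rebuilds the question's example data, and it drops the Strategy column by name rather than by position (the `-2` above), which is an assumption about what is more robust if the column order changes. The names `no_demand`, `reduced` and `result` are illustrative only.

```r
# Rebuild the example data from the question.
mydata <- data.frame(
  Geotype  = rep(c(1, 2, 3), each = 7),
  Strategy = rep(c("Demand", paste("Strategy", 1:6)), times = 3),
  Year.1 = 1:21, Year.2 = 1:21, Year.3 = 1:21, Year.4 = 1:21
)

# Step 1: drop the Demand rows.
no_demand <- mydata[mydata$Strategy != "Demand", ]

# Step 2: drop the Strategy column by name, so only Geotype and
# the numeric year columns remain.
reduced <- no_demand[, names(no_demand) != "Strategy"]

# Step 3: the dot in the formula means "every other column", so this
# sums all year columns within each Geotype, however many there are.
result <- aggregate(. ~ Geotype, data = reduced, FUN = sum)
print(result)
```

Because the formula uses `.`, the same call should work unchanged with 40 year columns.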
Answer 1 (score: 5)
Using data.table:
library(data.table)
setDT(mydata)
output = mydata[Strategy != "Demand",
.(Year.1.output = sum (Year.1),
Year.2.output = sum (Year.2),
Year.3.output = sum (Year.3),
Year.4.output = sum (Year.4)),
by = Geotype]
# Geotype Year.1.output Year.2.output Year.3.output Year.4.output
# 1: 1 27 27 27 27
# 2: 2 69 69 69 69
# 3: 3 111 111 111 111
We can simplify this so that it handles many year columns more easily:
setDT(mydata)[Strategy != "Demand",
lapply(.SD, sum),
by=Geotype,
.SDcols=grep("Year", names(mydata))]
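The `.SD` version keeps the original column names (Year.1, Year.2, ...), whereas the question asks for names ending in `.output`. The sketch below is one possible way to get those names, assuming the data.table package is installed. The variable `year_cols` is introduced here for illustration and is not part of the original answer.

```r
library(data.table)

# Rebuild the example data from the question as a data.table.
mydata <- data.table(
  Geotype  = rep(c(1, 2, 3), each = 7),
  Strategy = rep(c("Demand", paste("Strategy", 1:6)), times = 3),
  Year.1 = 1:21, Year.2 = 1:21, Year.3 = 1:21, Year.4 = 1:21
)

# Collect the year columns by name; this scales to any number of years.
year_cols <- grep("^Year", names(mydata), value = TRUE)

# Sum every year column within each Geotype, skipping the Demand rows.
output <- mydata[Strategy != "Demand",
                 lapply(.SD, sum),
                 by = Geotype,
                 .SDcols = year_cols]

# Rename by reference so the columns match the names in the question.
setnames(output, year_cols, paste0(year_cols, ".output"))
print(output)
```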
Answer 2 (score: 3)
I prefer to get the data into long format:
library(dplyr)
library(tidyr)
mydata %>% gather(key, value, - Geotype, - Strategy) %>%
filter(Strategy!="Demand") %>% group_by(Geotype, key) %>%
summarize(sum = sum(value))
Result:
Geotype key sum
<dbl> <chr> <int>
1 1 Year.1 27
2 1 Year.2 27
3 1 Year.3 27
4 1 Year.4 27
5 2 Year.1 69
6 2 Year.2 69
7 2 Year.3 69
8 2 Year.4 69
9 3 Year.1 111
10 3 Year.2 111
11 3 Year.3 111
12 3 Year.4 111
Using spread:
mydata %>% gather(key, value, - Geotype, - Strategy) %>%
filter(Strategy!="Demand") %>% group_by(Geotype, key) %>%
summarize(sum = sum(value)) %>% spread(key, sum)
which yields:
Geotype Year.1 Year.2 Year.3 Year.4
* <dbl> <int> <int> <int> <int>
1 1 27 27 27 27
2 2 69 69 69 69
3 3 111 111 111 111
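In current tidyr, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The following is a sketch of the same long-then-wide pipeline with the newer functions, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the `.groups` argument). It is a variation on the answer above, not the original author's code.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question.
mydata <- data.frame(
  Geotype  = rep(c(1, 2, 3), each = 7),
  Strategy = rep(c("Demand", paste("Strategy", 1:6)), times = 3),
  Year.1 = 1:21, Year.2 = 1:21, Year.3 = 1:21, Year.4 = 1:21
)

result <- mydata %>%
  filter(Strategy != "Demand") %>%
  # Long format: one row per Geotype, Strategy and year.
  pivot_longer(starts_with("Year"), names_to = "key", values_to = "value") %>%
  group_by(Geotype, key) %>%
  summarise(sum = sum(value), .groups = "drop") %>%
  # Back to wide format: one column per year.
  pivot_wider(names_from = key, values_from = sum)
print(result)
```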
Answer 3 (score: 0)
My reputation is too low to comment, but you can use dplyr and summarize_each.
mydata %>% dplyr::filter(Strategy!="Demand") %>% group_by(Geotype) %>% summarize_each(funs(sum), contains("Year"))
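Note that `summarize_each()` and `funs()` are deprecated in recent dplyr releases. A rough equivalent using `across()` might look like the sketch below, assuming dplyr 1.0.0 or later; `starts_with("Year")` is used here in place of `contains("Year")` as a slightly stricter match.

```r
library(dplyr)

# Rebuild the example data from the question.
mydata <- data.frame(
  Geotype  = rep(c(1, 2, 3), each = 7),
  Strategy = rep(c("Demand", paste("Strategy", 1:6)), times = 3),
  Year.1 = 1:21, Year.2 = 1:21, Year.3 = 1:21, Year.4 = 1:21
)

result <- mydata %>%
  filter(Strategy != "Demand") %>%
  group_by(Geotype) %>%
  # Apply sum() to every column whose name starts with "Year".
  summarise(across(starts_with("Year"), sum))
print(result)
```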