I have a data set "df":
> df
A B C
1 tanu abc 10
2 tanu def 20
3 tanu ghi 15
4 tanu jkl 28
5 tanu mno 33
6 tanu pqr 46
7 tanu stu 83
8 tanu vwx 15
9 edu yz1 60
10 edu abc2 85
> group
[1] 3 2 3 2
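I have to find the maximum value of column "C" for each group. The groups are formed within column "A", and each one contains the corresponding number of consecutive rows given by the vector "group":
Group1: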
tanu abc 10
tanu def 20
tanu ghi 15
Group2:
tanu jkl 28
tanu mno 33
Group3:
tanu pqr 46
tanu stu 83
tanu vwx 15
Group4:
edu yz1 60
edu abc2 85
I have not been able to achieve this with the aggregate or by functions. I would like my output to be:
> out
A B C
tanu def 20
tanu mno 33
tanu stu 83
edu abc2 85
Any help is appreciated. TIA.
Answer 0 (score: 3)
Another base R way, using by and which.max:
do.call(rbind,
by(df, list(rep(seq_along(group), group)), function(g) g[which.max(g$C),]))
# A B C
# 1 tanu def 20
# 2 tanu mno 33
# 3 tanu stu 83
# 4 edu abc2 85
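The key step in this answer is turning the run lengths in group into one group label per row, which by can then split on. A minimal sketch of that step on its own, assuming group is the vector from the question (the name idx is only illustrative and does not appear in the answer):

```r
# Run lengths from the question: 3 rows, then 2, then 3, then 2
group <- c(3, 2, 3, 2)

# Repeat each group number as many times as that group has rows
idx <- rep(seq_along(group), group)
idx
# [1] 1 1 1 2 2 3 3 3 4 4
```

Rows that share a label in idx are passed to the function together, and which.max then picks the row with the largest C inside each piece.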
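Answer 1 (score: 1)
At first, I thought this was based on the max value of C together with the value of the B column at position min(group) within each group. Here is a solution based on that assumption.
library(data.table)
res <- setDT(df)[, list(B=B[min(group)], C=max(C)),
by=list(gr=rep(seq_along(group), group),A)][,gr:=NULL]
After looking at @Matthew Plourde's solution, it is clear that I was wrong (in this example, both give the same result). In that case,
res <- setDT(df)[df[, max(C)==C,
by=list(rep(seq_along(group), group), A)]$V1]
res
# A B C
#1: tanu def 20
#2: tanu mno 33
#3: tanu stu 83
#4: edu abc2 85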
Or using dplyr:
library(dplyr)
df %>%
group_by(gr=rep(seq_along(group), group), A) %>%
filter(C==max(C))%>%
ungroup() %>%
select(-gr)
# A B C
#1 tanu def 20
#2 tanu mno 33
#3 tanu stu 83
#4 edu abc2 85
Data:
df <- structure(list(A = c("tanu", "tanu", "tanu", "tanu", "tanu",
"tanu", "tanu", "tanu", "edu", "edu"), B = c("abc", "def", "ghi",
"jkl", "mno", "pqr", "stu", "vwx", "yz1", "abc2"), C = c(10L,
20L, 15L, 28L, 33L, 46L, 83L, 15L, 60L, 85L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10"))
Answer 2 (score: 1)
I think this would do it as well.
s <- sapply(split(df$C, rep.int(seq_along(group), group)), which.max)
df[s + cumsum(c(0, group[-length(group)])), ]
# A B C
# 2 tanu def 20
# 5 tanu mno 33
# 7 tanu stu 83
# 10 edu abc2 85
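The index arithmetic here may be easier to follow when split into steps: which.max gives a position relative to each group, and the cumulative sum of the preceding group sizes converts it to an absolute row number. A minimal sketch using the question's values (the names pos, offset and rows are illustrative, not part of the answer):

```r
group <- c(3, 2, 3, 2)
C <- c(10, 20, 15, 28, 33, 46, 83, 15, 60, 85)

# Position of the maximum inside each group (1-based, relative to the group)
pos <- sapply(split(C, rep.int(seq_along(group), group)), which.max)

# Number of rows that come before each group starts: 0, 3, 5, 8
offset <- cumsum(c(0, group[-length(group)]))

# Absolute row numbers of the per-group maxima
rows <- unname(pos) + offset
rows
# [1]  2  5  7 10
```

These match the row names shown in the output above.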
Answer 3 (score: 0)
This may not be the clearest answer, but it works :)
A = c("tanu",
"tanu",
"tanu",
"tanu",
"tanu",
"tanu",
"tanu",
"tanu",
"edu",
"edu")
B = c("abc",
"def",
"ghi",
"jkl",
"mno",
"pqr",
"stu",
"vwx",
"yz1",
"abc2")
C = c(10,20,15,28,33,46,83,15,60,85)
df = data.frame(A=A, B=B, C=C)
group = c(3,2,3,2)
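out = NULL       # accumulates the selected rows
line.nb = 1      # first row of the current group
for(i in seq_along(group)){
  beg = line.nb
  end = line.nb + group[i]-1
  temp = df[beg:end,]   # rows belonging to group i
  # keep every row whose C equals the group maximum (ties included)
  res = temp[which(temp[,"C"] ==max(temp[,"C"])), ]
  out = rbind(out,res)
  line.nb = line.nb+group[i]   # move to the start of the next group
}
out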
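One difference between the answers may matter on other data: this loop, like the filter(C==max(C)) approach, keeps every row tied for the maximum, whereas which.max returns only the first one. A small sketch with made-up values (x is hypothetical and not taken from the question):

```r
x <- c(5, 9, 9, 1)   # hypothetical values containing a tie

which.max(x)         # first maximum only
# [1] 2
which(x == max(x))   # every position that reaches the maximum
# [1] 2 3
```

With the question's data there are no ties within a group, so all the approaches return the same four rows.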