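I have a data frame in which several columns are factors with the same levels, and I want to sort/subset the data based on how many times a given value occurs in each row.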
df <- data.frame(a = factor(c("yes", "yes", "no", "maybe"),
levels = c("yes", "no", "maybe")), b = factor(c("maybe", "yes", "yes", "no"),
levels = c("yes", "no", "maybe")), c = factor(c("maybe", "yes", "yes", "no"),
levels = c("yes", "no", "maybe")), d = c(1,2,3,4))
df
a b c d
1 yes maybe maybe 1
2 yes yes yes 2
3 no yes yes 3
4 maybe no no 4
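I want to sort/subset the data by the number of times "yes" occurs across all columns in each row. That is, first subset all rows where "yes" occurs 2 or more times (df2), and then (less important) sort by that count, so that the rows with the most "yes" values are at the top. It does not matter whether the original row numbers are preserved.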
df2
a b c d
2 yes yes yes 2
3 no yes yes 3
df
a b c d
2 yes yes yes 2
3 no yes yes 3
1 yes maybe maybe 1
4 maybe no no 4
I considered using the order() function:
df[order(df$a,df$b,df$c), ]
But this does not return what I want. I think I need to use lapply(), but I am not sure which function to apply.
Answer 0 (score: 4)
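We can use rowSums: comparing the data frame with "yes" gives a logical matrix, and summing each row of that matrix counts the "yes" values in that row.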
df <- data.frame(a = factor(c("yes", "yes", "no", "maybe"),
levels = c("yes", "no", "maybe")), b = factor(c("maybe", "yes", "yes", "no"),
levels = c("yes", "no", "maybe")), c = factor(c("maybe", "yes", "yes", "no"),
levels = c("yes", "no", "maybe")), d = c(1,2,3,4))
df2 <- df[rowSums(df == "yes") >= 2, ]
df2
# a b c d
# 2 yes yes yes 2
# 3 no yes yes 3
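To make the intermediate steps behind the rowSums call visible, here is a minimal sketch that splits it into two parts. The names is_yes and yes_count are introduced only for illustration and do not appear in the original answer.

```r
# Same example data as in the question
df <- data.frame(
  a = factor(c("yes", "yes", "no", "maybe"), levels = c("yes", "no", "maybe")),
  b = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  c = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  d = c(1, 2, 3, 4)
)

# Element-wise comparison: a logical matrix with one cell per value
# (the numeric column d never equals "yes", so it contributes only FALSE)
is_yes <- df == "yes"

# TRUE counts as 1, so each row sum is the number of "yes" values in that row
yes_count <- rowSums(is_yes)
yes_count
# counts for rows 1 to 4: 1 3 2 0

# Keep only the rows with at least two "yes" values
df2 <- df[yes_count >= 2, ]
```

Because the count lives in a separate vector, df itself is left unchanged.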
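This takes care of the filtering. If we also want to sort so that the rows with the most "yes" values come first, we can store the count as a column in the data frame, filter and sort on it, and then drop the column: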
df$Count <- rowSums(df == "yes")
df <- df[df$Count >= 2, ]
df <- df[order(df$Count, decreasing = TRUE), ]
df <- subset(df, select = -c(Count))
df
# a b c d
# 2 yes yes yes 2
# 3 no yes yes 3
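The question also shows a second desired output in which all four rows are kept and only reordered. One possible way to get both results without adding a temporary column is sketched below, assuming the original, unfiltered df from the question. The names yes_count, sorted_all, keep and df2 are illustrative choices, not part of the original answer.

```r
# Same example data as in the question
df <- data.frame(
  a = factor(c("yes", "yes", "no", "maybe"), levels = c("yes", "no", "maybe")),
  b = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  c = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  d = c(1, 2, 3, 4)
)

# Count "yes" only in the factor columns, kept as a standalone vector
yes_count <- rowSums(df[c("a", "b", "c")] == "yes")

# Sort every row by its count, most "yes" values first
sorted_all <- df[order(yes_count, decreasing = TRUE), ]
rownames(sorted_all)
# "2" "3" "1" "4"

# Filter to rows with at least two "yes" values, then sort the survivors
keep <- yes_count >= 2
df2 <- df[keep, ][order(yes_count[keep], decreasing = TRUE), ]
rownames(df2)
# "2" "3"
```

Restricting the comparison to columns a, b and c is a design choice: it avoids counting a stray match in an unrelated column if the data frame later gains character columns.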