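Sorting a data frame based on the same factor values across columns

Time: 2016-01-22 18:38:46

Tags: r

I have a data frame with the same factor values across several columns, and I would like to sort/subset the data based on how many times a given value occurs in each row.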

df <- data.frame(a = factor(c("yes", "yes", "no", "maybe"), 
levels = c("yes", "no", "maybe")), b = factor(c("maybe", "yes", "yes", "no"), 
levels = c("yes", "no", "maybe")), c = factor(c("maybe", "yes", "yes", "no"), 
levels = c("yes", "no", "maybe")), d = c(1,2,3,4))

df
      a     b     c d
1   yes maybe maybe 1
2   yes   yes   yes 2
3    no   yes   yes 3
4 maybe    no    no 4

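I want to sort/subset the data by the number of times "yes" occurs across all columns in each row. That is, subset all rows in which "yes" occurs 2 or more times (df2), and then (less important) sort so that the rows with the most "yes" values are at the top. It does not matter whether the original row numbers are preserved.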

df2
      a     b     c d
2   yes   yes   yes 2
3    no   yes   yes 3

df
      a     b     c d
2   yes   yes   yes 2
3    no   yes   yes 3
1   yes maybe maybe 1
4 maybe    no    no 4

I considered using the order() function:

df[order(df$a,df$b,df$c), ]

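But this does not return what I want, because it sorts by the factor levels of each column in turn rather than by how often "yes" occurs in a row. I think I need to use lapply(), but I am not sure which function to apply.

1 Answer:

Answer 0 (score: 4)

We can use rowSums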

df <- data.frame(a = factor(c("yes", "yes", "no", "maybe"), 
levels = c("yes", "no", "maybe")), b = factor(c("maybe", "yes", "yes", "no"), 
levels = c("yes", "no", "maybe")), c = factor(c("maybe", "yes", "yes", "no"), 
levels = c("yes", "no", "maybe")), d = c(1,2,3,4))

df2 <- df[rowSums(df == "yes") >= 2, ]

df2
#     a   b   c d
# 2 yes yes yes 2
# 3  no yes yes 3
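
To see why the comparison works, it can help to look at the intermediate values. The following is a small sketch, not part of the original answer: it restricts the comparison to the factor columns a, b and c, and the names is_yes and yes_count are introduced here for illustration only.

```r
df <- data.frame(
  a = factor(c("yes", "yes", "no", "maybe"), levels = c("yes", "no", "maybe")),
  b = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  c = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  d = c(1, 2, 3, 4)
)

# Comparing a data frame with a string gives a logical matrix,
# one TRUE/FALSE per cell (only the factor columns are compared here)
is_yes <- df[c("a", "b", "c")] == "yes"

# TRUE counts as 1, so rowSums() yields the number of "yes" values per row:
# 1, 3, 2 and 0 for rows 1 to 4
yes_count <- rowSums(is_yes)

# Keep the rows with at least two "yes" values
df2 <- df[yes_count >= 2, ]
```

Rows 2 and 3 are the only ones with a count of at least 2, which is what the filter above selects.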

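This takes care of the filtering. If we also want to order the rows so that those with the most "yes" values come first, we can first store the count as a column in the data, then filter and sort, and finally drop the column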

df$Count <- rowSums(df == "yes")
df <- df[df$Count >= 2, ]
df <- df[order(df$Count, decreasing = TRUE), ]
df <- subset(df, select = -c(Count))
df
#     a   b   c d
# 2 yes yes yes 2
# 3  no yes yes 3
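
The second desired output in the question keeps all four rows and only reorders them. A minimal sketch of that variant, assuming the count can be kept in a separate vector instead of a temporary column (the names yes_count and sorted are hypothetical and not from the original answer):

```r
df <- data.frame(
  a = factor(c("yes", "yes", "no", "maybe"), levels = c("yes", "no", "maybe")),
  b = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  c = factor(c("maybe", "yes", "yes", "no"), levels = c("yes", "no", "maybe")),
  d = c(1, 2, 3, 4)
)

# Number of "yes" values per row, computed on the factor columns only
yes_count <- rowSums(df[c("a", "b", "c")] == "yes")

# Reorder all rows so that those with the most "yes" values come first;
# no filtering is applied and df itself is left unchanged
sorted <- df[order(yes_count, decreasing = TRUE), ]
sorted
```

With the example data the rows come out in the order 2, 3, 1, 4. Keeping the count outside the data frame avoids having to drop a helper column afterwards.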