Question

For example, I have a data frame with many columns and rows:

id  column1 column2 column3
1   2   3   5
2   3   2   6
3   4   1   3
4   1   1   2
5   3   3   2
6   5   2   1

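How can I select the columns (other than id) whose maximum value is greater than or equal to a given value (for example, 5 in the sample data)?

The selected data should therefore be: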

id  column1 column3
1   2   5
2   3   6
3   4   3
4   1   2
5   3   2
6   5   1

Any help would be appreciated. Thank you very much!

Answer 1

This requires first finding the column maxima and then subsetting the data frame accordingly, as in

df[c(TRUE, apply(df[-1], 2, max) >= 5)]
#   id column1 column3
# 1  1       2       5
# 2  2       3       6
# 3  3       4       3
# 4  4       1       2
# 5  5       3       2
# 6  6       5       1

where

apply(df[-1], 2, max)
# column1 column2 column3 
#       5       3       6

and prepending TRUE keeps the id column as well.
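For a self-contained version of the approach above, here is a minimal sketch that rebuilds the sample data and wraps the subsetting in a small helper. The function name `select_by_max` and its `threshold` argument are hypothetical names chosen for illustration. It uses `vapply` rather than `apply`, which avoids converting the data frame to a matrix first; this is a swapped-in variation, not the code from the answer.

```r
# Sample data from the question
df <- data.frame(
  id      = 1:6,
  column1 = c(2, 3, 4, 1, 3, 5),
  column2 = c(3, 2, 1, 1, 3, 2),
  column3 = c(5, 6, 3, 2, 2, 1)
)

# Hypothetical helper: keep the first column (assumed to be the id) plus
# every other column whose maximum is at least `threshold`
select_by_max <- function(data, threshold) {
  # One maximum per non-id column, returned as a named numeric vector
  col_max <- vapply(data[-1], max, numeric(1))
  # TRUE keeps the first column; the rest are kept only if they pass the test
  data[c(TRUE, col_max >= threshold)]
}

result <- select_by_max(df, 5)
names(result)
```

If the columns may contain missing values, `max(x, na.rm = TRUE)` would be one way to ignore them, depending on how NA should be treated.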

Answer 2

There are several ways to do this.

Using base R

cbind(df[1], df[-1][sapply(df[-1], function(x) any(x >= 5))])

#  id column1 column3
#1  1       2       5
#2  2       3       6
#3  3       4       3
#4  4       1       2
#5  5       3       2
#6  6       5       1

We can also compare with >= 5 to get a logical matrix and then use colSums on it

cbind(df[1], df[-1][colSums(df[-1] >= 5)  > 0])
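To make the colSums variant easier to follow, the sketch below splits it into its intermediate steps on the sample data from the question. The variable names `hits`, `counts`, and `keep` are illustrative only.

```r
# Sample data from the question
df <- data.frame(
  id      = 1:6,
  column1 = c(2, 3, 4, 1, 3, 5),
  column2 = c(3, 2, 1, 1, 3, 2),
  column3 = c(5, 6, 3, 2, 2, 1)
)

# Step 1: compare every non-id value with the threshold (logical matrix)
hits <- df[-1] >= 5

# Step 2: count the TRUE values in each column
counts <- colSums(hits)
counts
# column1 column2 column3
#       1       0       2

# Step 3: a column qualifies if it has at least one value >= 5
keep <- counts > 0

# Step 4: reattach the id column to the qualifying columns
result <- cbind(df[1], df[-1][keep])
```

A column has at least one value >= 5 exactly when its maximum is >= 5, so this selects the same columns as the max-based approach.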

Or with Filter

cbind(df[1], Filter(function(x) any(x >= 5), df[-1]))

Or using dplyr

library(dplyr)

bind_cols(df[1], df %>%
                 select(-1) %>%
                 select_if(~any(. >= 5)))
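In dplyr 1.0.0 and later, select_if is superseded in favour of select combined with where. A minimal sketch of the same selection in that style, assuming a recent dplyr is installed and that the key column is named id as in the question:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  id      = 1:6,
  column1 = c(2, 3, 4, 1, 3, 5),
  column2 = c(3, 2, 1, 1, 3, 2),
  column3 = c(5, 6, 3, 2, 2, 1)
)

# Keep id explicitly, plus every other column with at least one value >= 5.
# `& !id` drops id from the predicate's matches, so id is kept only
# because it is named first, not because of its own values.
result <- df %>%
  select(id, where(~ any(.x >= 5)) & !id)

names(result)
```

This avoids the separate bind_cols step, since the id column and the filtered columns are chosen in a single select call.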
