Finding the top N highest values, with their column names, in R

Time: 2016-06-13 12:31:13

Tags: r

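Below is the sample data I am using in my analysis. What I need to do is extract the top 3 values for each row, together with their column names. For example, this would be the output for the first 3 rows: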

id, group1, weight1, group2, weight2, group3, weight3
1, V4, 0.277991043, V10, 0.050863724, V2, 0.033589251
2, V5, 0.164107486, V4, 0.119961612, V3, 0.098208573
3, V3, 0.124760077, V5, 0.089891235, V2, 0.071337172

What is the simplest way to do this?

2 Answers:

Answer 0: (score: 2)

Here is another idea that keeps the data in a tidy format:

library(dplyr)
library(tidyr)

sample %>%
  gather(key, value, -node) %>%
  group_by(node) %>%
  # keep the 3 largest entries per node, ranking explicitly by value
  top_n(3, value) %>%
  # here we use arrange() to sort by node and value
  arrange(node, desc(value))

Which gives:

#Source: local data frame [75 x 3]
#Groups: node [25]
#
#    node   key      value
#   <int> <chr>      <dbl>
#1      1    V4 0.27799104
#2      1   V10 0.05086372
#3      1    V2 0.03358925
#4      2    V5 0.16410749
#5      2    V4 0.11996161
#6      2    V3 0.09820857
#7      3    V3 0.12476008
#8      3    V5 0.08989123
#9      3    V2 0.07133717
#10     4    V6 0.20665387
#..   ...   ...        ...
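
In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), and tidyr 1.0.0 introduced pivot_longer() as the successor to gather(). A minimal sketch of the same idea with those functions follows. It assumes a small made-up data frame named toy (hypothetical, not the asker's data) with a node column and four value columns:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the asker's data frame
toy <- data.frame(
  node = 1:2,
  V1 = c(0.10, 0.40),
  V2 = c(0.30, 0.05),
  V3 = c(0.20, 0.25),
  V4 = c(0.05, 0.30)
)

top3 <- toy %>%
  # reshape to long format: one row per (node, column name, value)
  pivot_longer(-node, names_to = "key", values_to = "value") %>%
  group_by(node) %>%
  # keep the 3 largest values per node; with_ties = FALSE caps it at 3 rows
  slice_max(value, n = 3, with_ties = FALSE) %>%
  arrange(node, desc(value)) %>%
  ungroup()

top3
```

Setting with_ties = FALSE is a design choice here: top_n() keeps all tied rows, so a node with ties at the cutoff could return more than 3 rows.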

If you really want to arrive at your desired output, you can do this:

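sample %>%
  gather(key, value, -node) %>%
  group_by(node) %>%
  top_n(3, value) %>%
  arrange(node, desc(value)) %>%
  # label each of the top 3 rows per node with its rank (1, 2, 3)
  mutate(group  = paste0("group", row_number()),
         weight = paste0("weight", row_number())) %>%
  # spread values into group1..group3 and column names into weight1..weight3
  spread(group, value) %>%
  spread(weight, key) %>%
  # collapse the NA-padded rows into a single row per node
  summarise_each(funs(max(., na.rm = TRUE)))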

Which gives:

#Source: local data frame [25 x 7]
#
#    node    group1     group2      group3 weight1 weight2 weight3
#   <int>     <dbl>      <dbl>       <dbl>   <chr>   <chr>   <chr>
#1      1 0.2779910 0.05086372 0.033589251      V4     V10      V2
#2      2 0.1641075 0.11996161 0.098208573      V5      V4      V3
#3      3 0.1247601 0.08989123 0.071337172      V3      V5      V2
#4      4 0.2066539 0.14747281 0.121561100      V6      V2     V10
#5      5 0.2773512 0.21849008 0.158989123      V1      V8      V3
#6      6 0.1509917 0.11964171 0.117722329      V9      V3     V10
#7      7 0.2415227 0.13595649 0.130838132      V9      V7      V8
#8      8 0.1090851 0.10588612 0.088611644      V9      V7      V5
#9      9 0.1868202 0.11548305 0.089571337     V10      V1      V6
#10    10 0.3429303 0.12955854 0.003838772      V5      V6     V11
#..   ...       ...        ...         ...     ...     ...     ...
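
In the output above, the group* columns hold the values and the weight* columns hold the column names, which is the reverse of the naming in the question. Later dplyr releases also deprecate summarise_each() and funs(). A possible sketch that produces the question's naming with pivot_wider() is shown below. The toy data frame and the rank column are assumptions introduced for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the asker's data frame
toy <- data.frame(
  node = 1:2,
  V1 = c(0.10, 0.40),
  V2 = c(0.30, 0.05),
  V3 = c(0.20, 0.25),
  V4 = c(0.05, 0.30)
)

wide <- toy %>%
  # group = column name, weight = value, matching the question's naming
  pivot_longer(-node, names_to = "group", values_to = "weight") %>%
  group_by(node) %>%
  slice_max(weight, n = 3, with_ties = FALSE) %>%
  arrange(node, desc(weight)) %>%
  # rank within each node: 1 = largest
  mutate(rank = row_number()) %>%
  ungroup() %>%
  # one row per node, columns group1..group3 and weight1..weight3
  pivot_wider(names_from = rank,
              values_from = c(group, weight),
              names_sep = "")

wide
```

The columns come out as group1, group2, group3, weight1, weight2, weight3. If the interleaved order from the question matters, they can be reordered afterwards with select().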

Answer 1: (score: 0)

We can use apply:

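# df1 is the input data frame; its first column is the id
res <- cbind(df1[1], t(apply(df1[-1], 1, function(x) {
  # column positions sorted by decreasing value
  i1 <- order(-x)
  # interleave the top 3 column names with their values
  c(rbind(names(df1)[-1][i1][1:3], x[i1][1:3]))
})))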

Then we can convert the column types:

# as.is = TRUE keeps non-numeric columns as character instead of factor
res[] <- lapply(res, function(x) type.convert(as.character(x), as.is = TRUE))
names(res)[-1] <- make.unique(rep(c("group", "weight"), (ncol(res)-1)/2))
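
The apply call above combines names and numbers in one vector, so every value is coerced to character and has to be converted back afterwards. Also, make.unique() names the columns group, weight, group.1, weight.1, and so on, rather than group1, weight1. A base R sketch of a variant that keeps the names and values in separate typed columns from the start is shown below. The helper top_n_by_row() and the toy data are hypothetical, introduced only for illustration:

```r
# Hypothetical toy data; the first column is the id
toy <- data.frame(
  id = 1:2,
  V1 = c(0.10, 0.40),
  V2 = c(0.30, 0.05),
  V3 = c(0.20, 0.25),
  V4 = c(0.05, 0.30)
)

# Hypothetical helper: top n values per row with their column names
top_n_by_row <- function(df, n = 3) {
  vals <- as.matrix(df[-1])
  # n x nrow matrix: entry [k, i] is the column index of the k-th largest value in row i
  idx <- matrix(
    apply(vals, 1, function(x) order(x, decreasing = TRUE)[seq_len(n)]),
    nrow = n
  )
  out <- df[1]
  rows <- seq_len(nrow(vals))
  for (k in seq_len(n)) {
    out[[paste0("group", k)]]  <- colnames(vals)[idx[k, ]]
    # matrix indexing with (row, column) pairs keeps the values numeric
    out[[paste0("weight", k)]] <- vals[cbind(rows, idx[k, ])]
  }
  out
}

res_toy <- top_n_by_row(toy, 3)
res_toy
```

Because the values never pass through a character vector, no type.convert() step is needed in this variant.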