Sort a data frame by the most frequent values in a specific column

Time: 2019-07-15 15:12:11

Tags: r

I have a data frame in R; this is part of it:

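What I would like to do is sort the data frame by the most frequent elements in the second column (those that occur more often). This is the ideal result: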

Kif21a PTHR24115 ENSMUSG00000022629
Acss3 PTHR24115 ENSMUSG00000035948
Nr1h4 PTHR24082 ENSMUSG00000047638
Rarg PTHR24082 ENSMUSG00000001288
Vdr PTHR24082 ENSMUSG00000022479
Pamr1 PTHR24254 ENSMUSG00000027188

Thanks a lot!

4 answers:

Answer 0: (score: 1)

One option is

library(dplyr)
df1 %>%
   group_by(col2) %>%
   mutate(n = n()) %>%
   ungroup %>%
   arrange(desc(n))

Another option is add_count

df1 %>%
  add_count(col2) %>%
  arrange(desc(n))
# A tibble: 6 x 4
#  col1   col2      col3                   n
#  <chr>  <chr>     <chr>              <int>
#1 Nr1h4  PTHR24082 ENSMUSG00000047638     3
#2 Rarg   PTHR24082 ENSMUSG00000001288     3
#3 Vdr    PTHR24082 ENSMUSG00000022479     3
#4 Kif21a PTHR24115 ENSMUSG00000022629     2
#5 Acss3  PTHR24115 ENSMUSG00000035948     2
#6 Pamr1  PTHR24254 ENSMUSG00000027188     1
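
Both dplyr versions leave the helper column n in the result, whereas the ideal result in the question has only the three original columns. A minimal sketch of dropping it after sorting, assuming dplyr is installed and using the df1 from the Data section (rebuilt here so the block runs on its own); the name sorted is only for illustration:

```r
library(dplyr)

# same data as df1 in the Data section
df1 <- data.frame(
  col1 = c("Kif21a", "Acss3", "Nr1h4", "Rarg", "Vdr", "Pamr1"),
  col2 = c("PTHR24115", "PTHR24115", "PTHR24082", "PTHR24082",
           "PTHR24082", "PTHR24254"),
  col3 = c("ENSMUSG00000022629", "ENSMUSG00000035948", "ENSMUSG00000047638",
           "ENSMUSG00000001288", "ENSMUSG00000022479", "ENSMUSG00000027188"),
  stringsAsFactors = FALSE
)

sorted <- df1 %>%
  add_count(col2) %>%    # adds n, the number of rows sharing each col2 value
  arrange(desc(n)) %>%   # most frequent col2 values first
  select(-n)             # drop the helper column again
```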

Or use base R with ave

df1[with(df1, order(-ave(seq_along(col2), col2, FUN = length))),]
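
To see what the ave one-liner does, it may help to split it into steps. The sketch below uses the df1 from the Data section (rebuilt here so the block runs on its own); the names counts and sorted are introduced only for illustration.

```r
# same data as df1 in the Data section
df1 <- data.frame(
  col1 = c("Kif21a", "Acss3", "Nr1h4", "Rarg", "Vdr", "Pamr1"),
  col2 = c("PTHR24115", "PTHR24115", "PTHR24082", "PTHR24082",
           "PTHR24082", "PTHR24254"),
  col3 = c("ENSMUSG00000022629", "ENSMUSG00000035948", "ENSMUSG00000047638",
           "ENSMUSG00000001288", "ENSMUSG00000022479", "ENSMUSG00000027188"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each group of col2 and returns a vector as long as
# its input, so every row receives the size of its own group
counts <- ave(seq_along(df1$col2), df1$col2, FUN = length)
counts
# 2 2 3 3 3 1

# negate the counts so that order() puts the largest groups first
sorted <- df1[order(-counts), ]
```

Rows with equal counts keep their original relative order, because order() leaves ties in place.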

Data

df1 <- structure(list(col1 = c("Kif21a", "Acss3", "Nr1h4", "Rarg", "Vdr", 
"Pamr1"), col2 = c("PTHR24115", "PTHR24115", "PTHR24082", "PTHR24082", 
"PTHR24082", "PTHR24254"), col3 = c("ENSMUSG00000022629", "ENSMUSG00000035948", 
"ENSMUSG00000047638", "ENSMUSG00000001288", "ENSMUSG00000022479", 
"ENSMUSG00000027188")), class = "data.frame", row.names = c(NA, 
-6L))

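Answer 1: (score: 1)

If your columns are named A, B, C, you can use the following code. This adds a column N to df, so if you don't want that, you can either add df <- at the start so that the output overwrites df, or replace df with copy(df).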
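
The code for this answer is not included here. The mention of copy(df) and of a column N suggests the data.table package, so the block below is only a guess at the kind of call being described, not the answerer's code. It assumes data.table is installed and uses the question's data with the column names A, B, C; the name sorted is hypothetical.

```r
library(data.table)

# the question's data, with the column names A, B, C assumed by this answer
df <- data.table(
  A = c("Kif21a", "Acss3", "Nr1h4", "Rarg", "Vdr", "Pamr1"),
  B = c("PTHR24115", "PTHR24115", "PTHR24082", "PTHR24082",
        "PTHR24082", "PTHR24254"),
  C = c("ENSMUSG00000022629", "ENSMUSG00000035948", "ENSMUSG00000047638",
        "ENSMUSG00000001288", "ENSMUSG00000022479", "ENSMUSG00000027188")
)

# := adds N by reference, which would modify df itself; working on copy(df)
# leaves df untouched. .N is the number of rows in each group of B.
sorted <- copy(df)[, N := .N, by = B][order(-N)]
```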

Answer 2: (score: 0)

Using base:

df <- as.data.frame(matrix(c("Kif21a", "PTHR24115", "ENSMUSG00000022629",
"Acss3", "PTHR24115", "ENSMUSG00000035948",
"Nr1h4", "PTHR24082", "ENSMUSG00000047638",
"Rarg", "PTHR24082", "ENSMUSG00000001288",
"Vdr", "PTHR24082", "ENSMUSG00000022479",
"Pamr1", "PTHR24254", "ENSMUSG00000027188"), ncol = 3, byrow = TRUE))
      V1        V2                 V3
1 Kif21a PTHR24115 ENSMUSG00000022629
2  Acss3 PTHR24115 ENSMUSG00000035948
3  Nr1h4 PTHR24082 ENSMUSG00000047638
4   Rarg PTHR24082 ENSMUSG00000001288
5    Vdr PTHR24082 ENSMUSG00000022479
6  Pamr1 PTHR24254 ENSMUSG00000027188

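tmp <- table(df$V2)
# look up each row's count by value; as.character() works whether V2 is a
# factor or a character column (the default changed in R 4.0)
df[order(tmp[as.character(df$V2)], decreasing = TRUE), ]
      V1        V2                 V3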
3  Nr1h4 PTHR24082 ENSMUSG00000047638
4   Rarg PTHR24082 ENSMUSG00000001288
5    Vdr PTHR24082 ENSMUSG00000022479
1 Kif21a PTHR24115 ENSMUSG00000022629
2  Acss3 PTHR24115 ENSMUSG00000035948
6  Pamr1 PTHR24254 ENSMUSG00000027188

Answer 3: (score: 0)

A base R approach is to count the occurrences of V2 using table, sort them in decreasing order, convert the result to a data frame using stack, and merge it with the original data frame

merge(df, stack(sort(table(df$V2), decreasing = TRUE)), by.x = "V2", by.y = "ind")
#         V2     V1                 V3 values
#1 PTHR24082  Nr1h4 ENSMUSG00000047638      3
#2 PTHR24082   Rarg ENSMUSG00000001288      3
#3 PTHR24082    Vdr ENSMUSG00000022479      3
#4 PTHR24115 Kif21a ENSMUSG00000022629      2
#5 PTHR24115  Acss3 ENSMUSG00000035948      2
#6 PTHR24254  Pamr1 ENSMUSG00000027188      1

You can remove the values column if it is not needed; it is the frequency count of each V2.
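
One caveat when relying on merge for the ordering: by default it sorts its result on the key column (sort = TRUE), and in this data the alphabetical order of V2 happens to match the frequency order. A minimal sketch that orders explicitly by the count, so the result does not depend on how the keys sort. It uses as.data.frame on the table in place of stack (which yields columns V2 and Freq), and the names counts, merged and result are only for illustration:

```r
# the question's data with the default V1..V3 column names
df <- data.frame(
  V1 = c("Kif21a", "Acss3", "Nr1h4", "Rarg", "Vdr", "Pamr1"),
  V2 = c("PTHR24115", "PTHR24115", "PTHR24082", "PTHR24082",
         "PTHR24082", "PTHR24254"),
  V3 = c("ENSMUSG00000022629", "ENSMUSG00000035948", "ENSMUSG00000047638",
         "ENSMUSG00000001288", "ENSMUSG00000022479", "ENSMUSG00000027188"),
  stringsAsFactors = FALSE
)

# frequency table as a data frame with columns V2 and Freq
counts <- as.data.frame(table(V2 = df$V2), stringsAsFactors = FALSE)

# attach each row's count
merged <- merge(df, counts, by = "V2")

# order explicitly by count, largest first, and keep only the original columns
result <- merged[order(-merged$Freq), c("V1", "V2", "V3")]
```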

In dplyr, we can use inner_join
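
The dplyr code for this answer is not included here. One way such a join could look, offered as a sketch under assumptions rather than the answerer's code: count V2 with the counts sorted in decreasing order, then join the original rows back on. It assumes dplyr is installed and the df with columns V1, V2, V3 from the previous answer; counts and result are hypothetical names.

```r
library(dplyr)

# the question's data with the default V1..V3 column names
df <- data.frame(
  V1 = c("Kif21a", "Acss3", "Nr1h4", "Rarg", "Vdr", "Pamr1"),
  V2 = c("PTHR24115", "PTHR24115", "PTHR24082", "PTHR24082",
         "PTHR24082", "PTHR24254"),
  V3 = c("ENSMUSG00000022629", "ENSMUSG00000035948", "ENSMUSG00000047638",
         "ENSMUSG00000001288", "ENSMUSG00000022479", "ENSMUSG00000027188"),
  stringsAsFactors = FALSE
)

# one row per V2 value with its frequency n, most frequent first
counts <- df %>% count(V2, sort = TRUE)

# inner_join keeps the row order of its first argument, so the joined rows
# come out grouped by descending frequency
result <- counts %>% inner_join(df, by = "V2")
```

Because the counts table is on the left of the join, the column order of result is V2, n, V1, V3; select can be used to rearrange or drop columns if needed.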