How to get the common values across all columns

Time: 2017-11-20 10:10:44

Tags: r

I want to know which values are common to N columns, N-1 columns, N-2 columns, and so on.

Input

structure(c("a", "b", "c", "d", "e", "f", "a", "z", "d", "b", 
   "e", "s", "a", "b", "c", "d", "e", "s", "a", "b", "c", "d", "e", 
  "f"), .Dim = c(6L, 4L), .Dimnames = list(NULL, c("x", "y", "z", 
  "a")))

Output

common in all 4 columns: a, b, e, d

common in maximum 3 columns: c

common in maximum 2 columns: f, s

1 Answer:

Answer 0: (score: 0)

Reshaping the given matrix from wide to long format (melt() has a matrix method) and counting by value may be one way:

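library(data.table)
options(datatable.print.class = TRUE)  # show column classes in printed output
# reshape to long format, count occurrences per value, sort by descending count
setDT(melt(dat))[, .N, by = "value"][order(-N)]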
    value     N
   <fctr> <int>
1:      a     4
2:      b     4
3:      d     4
4:      e     4
5:      c     3
6:      f     2
7:      s     2
8:      z     1
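
If melt() is not available for matrices in your setup, a similar count can be sketched in base R with table(). This is a minimal sketch and not part of the original answer; it assumes the same dat as in the Data section, and the name counts is only illustrative.

```r
# Same matrix as `dat` in the Data section
dat <- structure(
  c("a", "b", "c", "d", "e", "f", "a", "z", "d", "b", "e", "s",
    "a", "b", "c", "d", "e", "s", "a", "b", "c", "d", "e", "f"),
  .Dim = c(6L, 4L),
  .Dimnames = list(NULL, c("x", "y", "z", "a")))

# table() on a matrix counts every cell, regardless of the column it sits in
counts <- sort(table(dat), decreasing = TRUE)
print(counts)
```

Like the melt() version, this counts cells rather than columns, so a value repeated within one column is counted more than once.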

However, the code needs to be enhanced to handle duplicates within a column (in dat2, row 1 is duplicated):

# keep each value once per column (Var2), then count the columns per value
setDT(melt(dat2))[, unique(value), by = Var2][, .N, by = "V1"][order(-N)]
       V1     N
   <fctr> <int>
1:      a     4
2:      b     4
3:      d     4
4:      e     4
5:      c     3
6:      f     2
7:      s     2
8:      z     1

Or, more to the point:

# as before, then collapse all values that share the same N into one string
setDT(melt(dat2))[, unique(value), by = Var2][, .N, by = "V1"][
  , toString(sort(V1)), by = N][order(-N)]
       N         V1
   <int>     <char>
1:     4 a, b, d, e
2:     3          c
3:     2       f, s
4:     1          z

N denotes the number of columns in which a value appears.

Data
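
The same two steps (deduplicate within each column, then group values by their column count) can also be sketched in base R. This is an illustration under assumptions, not the answer's method: dat and dat2 are rebuilt as in the Data section, and n_cols and groups are hypothetical names.

```r
dat <- structure(
  c("a", "b", "c", "d", "e", "f", "a", "z", "d", "b", "e", "s",
    "a", "b", "c", "d", "e", "s", "a", "b", "c", "d", "e", "f"),
  .Dim = c(6L, 4L),
  .Dimnames = list(NULL, c("x", "y", "z", "a")))
dat2 <- dat[c(1, seq_len(nrow(dat))), ]  # row 1 duplicated

# Keep each value once per column, then count in how many columns it appears
per_col <- lapply(seq_len(ncol(dat2)), function(j) unique(dat2[, j]))
n_cols <- table(unlist(per_col))

# Group the values by their column count and collapse each group to a string
groups <- vapply(split(names(n_cols), as.integer(n_cols)), toString, character(1))
print(rev(groups))  # highest column count first
```

Using lapply() over the column indices instead of apply() keeps the result a list even when every column happens to have the same number of distinct values.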

dat <- structure(
  c("a", "b", "c", "d", "e", "f", "a", "z", "d", "b", "e", "s", 
    "a", "b", "c", "d", "e", "s", "a", "b", "c", "d", "e", "f"), 
  .Dim = c(6L, 4L), 
  .Dimnames = list(NULL, c("x", "y", "z", "a")))

# second data set with duplicated row 1
dat2 <- dat[c(1, seq_len(nrow(dat))), ]

dat2
     x   y   z   a  
[1,] "a" "a" "a" "a"
[2,] "a" "a" "a" "a"
[3,] "b" "z" "b" "b"
[4,] "c" "d" "c" "c"
[5,] "d" "b" "d" "d"
[6,] "e" "e" "e" "e"
[7,] "f" "s" "s" "f"
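
To see why the per-column unique() step matters for dat2 but not for dat, one can check each column for repeated values. A minimal sketch, assuming the two matrices defined above; dup_dat and dup_dat2 are illustrative names only.

```r
dat <- structure(
  c("a", "b", "c", "d", "e", "f", "a", "z", "d", "b", "e", "s",
    "a", "b", "c", "d", "e", "s", "a", "b", "c", "d", "e", "f"),
  .Dim = c(6L, 4L),
  .Dimnames = list(NULL, c("x", "y", "z", "a")))
dat2 <- dat[c(1, seq_len(nrow(dat))), ]

# anyDuplicated() returns 0 if a column has no repeated value,
# otherwise the index of the first repeated entry
dup_dat  <- apply(dat,  2, anyDuplicated)
dup_dat2 <- apply(dat2, 2, anyDuplicated)
print(dup_dat)
print(dup_dat2)
```

A non-zero entry means a plain count of cells would overstate the number of columns for at least one value in that column.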