Conditional matrix: computing values from pairs of columns

Time: 2018-09-15 15:09:49

Tags: r lapply

I have a data frame df that looks like

  A1  A2  A3
1: 1   0   1
2: 1   1   0
3: 1   0   1

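and I would like to create a matrix containing all conditional means, where the condition is that another column equals "1".
For example:

  1. For A3 conditional on A1, the result should be 2/3 = 0.67 (=> colsum(A3, in rows where A1 = 1) / colsum(A1))
  2. For A1 conditional on A3, the result should be 2/2 = 1.0 (=> colsum(A1, in rows where A3 = 1) / colsum(A3))
  3. For A2 conditional on A1, the result should be 1/3 = 0.33 (=> colsum(A2, in rows where A1 = 1) / colsum(A1))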
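The three examples above can be checked directly. This is a minimal sketch, assuming the example data is rebuilt as a plain data.frame; `df_example` and `cond_mean` are names introduced here for illustration and do not appear in the question.

```r
# Hypothetical reconstruction of the example data from the question
df_example <- data.frame(A1 = c(1, 1, 1),
                         A2 = c(0, 1, 0),
                         A3 = c(1, 0, 1))

# Conditional mean of `target` over the rows where `cond` equals 1
cond_mean <- function(data, target, cond) {
  mean(data[[target]][data[[cond]] == 1], na.rm = TRUE)
}

a3_given_a1 <- cond_mean(df_example, "A3", "A1")  # 2/3
a1_given_a3 <- cond_mean(df_example, "A1", "A3")  # 2/2
a2_given_a1 <- cond_mean(df_example, "A2", "A1")  # 1/3
```

Because the columns hold only 0 and 1, the mean over the selected rows is the same as the ratio of column sums written in the list.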

I solved the problem with the following code:

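    col_names <- names(df)
    n <- length(col_names)
    mat <- matrix(nrow = n, ncol = n)
    # get() inside df[...] assumes df is a data.table
    for (i in seq_len(n)) {      # i: column being averaged (row of mat)
      for (j in seq_len(n)) {    # j: condition column (column of mat)
        mat[i, j] <- mean(df[get(col_names[j]) == 1, ][[col_names[i]]], na.rm = TRUE)
      }
    }
    colnames(mat) <- col_names
    rownames(mat) <- col_names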

mat should look like this (shown here as fractions):

    A1   A2   A3
A1 3/3  1/1  2/2
A2 1/3  1/1  0/2
A3 2/3  0/1  2/2

Unfortunately, this code is very slow. Is there a way (perhaps with lapply ...) to speed it up? I have not managed to get rid of the condition (== 1)...

2 Answers:

Answer 0: (score: 1)

#DATA
df1 = structure(list(A1 = c(1L, 1L, 1L),
                     A2 = c(0L, 1L, 0L),
                     A3 = c(1L, 0L, 1L)),
                class = "data.frame", row.names = c(NA, -3L))

df1
#  A1 A2 A3
#1  1  0  1
#2  1  1  0
#3  1  0  1

sapply(1:NCOL(df1), function(i) sapply(1:NCOL(df1), function(j) {
    sum((df1[,j])[df1[,i] == 1])/sum(df1[,i])
    #paste0(sum((df1[,j])[df1[,i] == 1]),"/",sum(df1[,i]))
}))
#          [,1] [,2] [,3]
#[1,] 1.0000000    1    1
#[2,] 0.3333333    1    0
#[3,] 0.6666667    0    1
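
The result above carries no row or column names. A possible variant, sketched under the assumption that every column contains only 0/1 values, uses `vapply` and labels both dimensions so that rows are the averaged column and columns are the condition. The helper name `cond_matrix` is made up here and is not part of the answer.

```r
df1 <- data.frame(A1 = c(1L, 1L, 1L),
                  A2 = c(0L, 1L, 0L),
                  A3 = c(1L, 0L, 1L))

# Hypothetical helper: entry [target, cond] is the mean of `target`
# over the rows where `cond` equals 1.
cond_matrix <- function(data) {
  nms <- names(data)
  res <- vapply(nms, function(cond) {
    keep <- data[[cond]] == 1
    vapply(nms, function(target) mean(data[[target]][keep]), numeric(1))
  }, numeric(length(nms)))
  dimnames(res) <- list(target = nms, cond = nms)
  res
}

res_named <- cond_matrix(df1)
```

Indexing by name, for example `res_named["A2", "A1"]`, then reads as "A2 given A1 = 1".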

Answer 1: (score: 0)

Another option:

# convert to matrix
mat <- as.matrix(df)

# calculate numerator
numer <- matrix(NA, ncol(mat), ncol(mat))
for(j in 1:ncol(mat)) { numer[,j] <- colSums(mat*mat[,j]) }

# calculate denominator
denom <- matrix(colSums(mat, na.rm = TRUE), ncol(mat), ncol(mat), byrow = TRUE)

# divide numer/denom and fix NAs and Infs
result <- numer / denom
result[is.na(result)|is.infinite(result)] <- 0

result
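
For 0/1 data without missing values, the numerator loop above computes the same thing as a matrix cross-product, so the whole calculation can be written without an explicit loop. This is a minimal sketch, assuming the example data from the question; `m` and `cond_means` are names introduced here. It is one possible vectorized form, and whether it is faster on a given data set would need to be measured.

```r
m <- as.matrix(data.frame(A1 = c(1, 1, 1),
                          A2 = c(0, 1, 0),
                          A3 = c(1, 0, 1)))

# crossprod(m)[k, j] = sum(m[, k] * m[, j]): co-occurrence counts
numer <- crossprod(m)

# Divide each column j by the number of rows where column j is 1
cond_means <- sweep(numer, 2, colSums(m), "/")

# A column that never equals 1 gives 0/0 = NaN; set those entries to 0
cond_means[!is.finite(cond_means)] <- 0
```

Here rows are the averaged column and columns are the condition, matching the layout of the expected matrix in the question.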