Question

I have two data.frames that look like this:

df1
  Gene name   sample1    sample2    sample3     sample4     sample5  
   A             0          1         0           0           1 
   B             1          0         0           1           0
   C             0          0         1           1           1
   D             1          0         0           1           0



df2
  Gene name   sample1    sample2    sample3     sample4     sample5  
   A             1          1         1           0           0 
   B             0          1         0           0           0
   C             1          1         0           0           0
   D             1          1         0           0           0

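Only the values "0" and "1" occur. I want a data.frame in which an entry that is == 1 in both df1 and df2 stays "1" (and likewise an entry that is 0 in both stays "0"). Otherwise, when an entry is == 1 in one data.frame (e.g. df1) and 0 in the other (e.g. df2), it should become 1. The two data.frames have the same number of rows and the same number of columns.

The desired output would be: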

df1
  Gene name   sample1    sample2    sample3     sample4     sample5  
   A             1          1         1           0           1 
   B             1          1         0           1           0
   C             1          1         1           1           1
   D             1          1         0           1           0

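Since I am new to R, I would like to learn how to loop over multiple data.frames, using a for loop over the first and the second data.frame. So far I have not been able to get this to work. Can anyone help me?

Best,

E.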

Answer 1

The "R" way to do this sort of thing is to take advantage of vectorization:

> df3 <- df1
> df3[,-1] <- ((df1[,-1] + df2[,-1]) > 0) + 0
> df3
  Genename sample1 sample2 sample3 sample4 sample5
1        A       1       1       1       0       1
2        B       1       1       0       1       0
3        C       1       1       1       1       1
4        D       1       1       0       1       0

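The looping still happens, but in much faster compiled code.

Brief explanation:

We can add the numeric parts of the two data frames in a vectorized way: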
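To try the snippet above on the question's data, the two data frames can be rebuilt first. This is only a sketch: the helper `to_df` and the vectors `v1` and `v2` are hypothetical names, and storing the values as numeric 0/1 columns is an assumption about how the original data was held.

```r
genes   <- c("A", "B", "C", "D")
samples <- paste0("sample", 1:5)

# Values typed row by row, matching the tables in the question.
v1 <- c(0,1,0,0,1,  1,0,0,1,0,  0,0,1,1,1,  1,0,0,1,0)
v2 <- c(1,1,1,0,0,  0,1,0,0,0,  1,1,0,0,0,  1,1,0,0,0)

# Hypothetical helper: gene names in column 1, samples in columns 2 to 6.
to_df <- function(v) {
  data.frame(Genename = genes,
             matrix(v, nrow = 4, byrow = TRUE, dimnames = list(NULL, samples)))
}
df1 <- to_df(v1)
df2 <- to_df(v2)

# The vectorized combination from the answer: sum, compare, coerce back to 0/1.
df3 <- df1
df3[, -1] <- ((df1[, -1] + df2[, -1]) > 0) + 0
print(df3)
```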

> (df1[,-1] + df2[,-1])
  sample1 sample2 sample3 sample4 sample5
1       1       2       1       0       1
2       1       1       0       1       0
3       1       1       1       1       1
4       2       1       0       1       0

Then, if we ask which values are greater than zero, we get the "right" answer, but as booleans rather than 0s and 1s:

> (df1[,-1] + df2[,-1]) > 0
     sample1 sample2 sample3 sample4 sample5
[1,]    TRUE    TRUE    TRUE   FALSE    TRUE
[2,]    TRUE    TRUE   FALSE    TRUE   FALSE
[3,]    TRUE    TRUE    TRUE    TRUE    TRUE
[4,]    TRUE    TRUE   FALSE    TRUE   FALSE

Fortunately, R coerces the booleans to numbers (0 and 1) if we simply add 0:

> ((df1[,-1] + df2[,-1]) > 0) + 0
     sample1 sample2 sample3 sample4 sample5
[1,]       1       1       1       0       1
[2,]       1       1       0       1       0
[3,]       1       1       1       1       1
[4,]       1       1       0       1       0
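The coercion step can be checked in isolation. A small sketch on made-up vectors `x` and `y` (hypothetical, not from the question): adding `0` gives doubles, adding `0L` gives true integers, and the logical OR operator `|` yields the same truth values as the sum-and-compare trick.

```r
x <- c(0, 1, 0, 1)
y <- c(0, 0, 1, 1)

is_one    <- (x + y) > 0   # logical vector
as_double <- is_one + 0    # logical + double  -> double 0/1
as_int    <- is_one + 0L   # logical + integer -> integer 0/1
via_or    <- (x | y) + 0L  # `|` treats any non-zero number as TRUE

print(typeof(as_double))
print(typeof(as_int))
print(identical(as_int, via_or))
```

Whether the result is stored as double or integer rarely matters for 0/1 data, but it does matter when comparing with `identical()`.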

Answer 2

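What you want is a bitwise OR operation: https://en.wikipedia.org/wiki/Bitwise_operation#OR

R 3.0 has functions for bitwise operations: bitwAnd, bitwNot, bitwOr, bitwShiftL, bitwShiftR and bitwXor (bitwOr is the one you are looking for).

The answer joran gave works fine, but if you are running R 3.0, I would recommend the bitwise operations because they run faster: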

 > system.time(for (i in 1:10000) {df3[,-1] <- ((df1[,-1] + df2[,-1]) > 0) + 0})
   user  system elapsed 
  13.58    0.00   13.59

 > system.time(for (i in 1:10000) {df3[,-1] = bitwOr(unlist(df1[,-1]), unlist(df2[,-1]))})
   user  system elapsed 
   5.44    0.00    5.45
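For a version that can be run end to end, here is a sketch that applies `bitwOr` column by column instead of via `unlist()` as in the timing above (a swapped-in variant, not the exact code that was timed). The helper `to_df` and the vectors `v1` and `v2` are hypothetical names used to rebuild the question's data; timings are not reproduced here because they depend on the machine and R version.

```r
genes   <- c("A", "B", "C", "D")
samples <- paste0("sample", 1:5)
v1 <- c(0,1,0,0,1,  1,0,0,1,0,  0,0,1,1,1,  1,0,0,1,0)
v2 <- c(1,1,1,0,0,  0,1,0,0,0,  1,1,0,0,0,  1,1,0,0,0)

# Hypothetical helper that rebuilds a data frame from row-wise values.
to_df <- function(v) {
  data.frame(Genename = genes,
             matrix(v, nrow = 4, byrow = TRUE, dimnames = list(NULL, samples)))
}
df1 <- to_df(v1)
df2 <- to_df(v2)

# bitwOr works on integer vectors, so each pair of columns is converted first.
df3 <- df1
df3[-1] <- Map(function(a, b) bitwOr(as.integer(a), as.integer(b)),
               df1[-1], df2[-1])
print(df3)
```

For values restricted to 0 and 1, bitwise OR and logical OR agree, so this gives the same table as the sum-and-compare approach.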

Answer 3

Short way: #df3 <- as.integer(df1+df2>0)  # this is wrong

Edited short way: df3 <- apply(df1[,-1]+df2[,-1]>0, c(1,2), as.integer)  # could probably be shorter

Using loops, etc.:
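One likely reason the first attempt is wrong is that `as.integer()` drops the `dim` attribute, so the table collapses into a plain vector. A sketch on a made-up 2x2 matrix `m` (hypothetical, not the question's data) shows the difference between that and adding `0L`, which keeps the shape:

```r
m <- matrix(c(0, 1, 2, 0), nrow = 2)

flat <- as.integer(m > 0)  # plain integer vector, dimensions lost
kept <- (m > 0) + 0L       # integer matrix, dimensions kept

print(dim(flat))
print(dim(kept))
```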

df3 <- as.data.frame(matrix(rep(NA, nrow(df1)*ncol(df1)),ncol=ncol(df1)))
names(df3) <- names(df1)

for(i in 1:ncol(df1)){
  for(j in 1:nrow(df1)){
    if(i==1){#edited
       df3[j,i] <- df1[j,i]#edited; note, this is dangerous b/c it is assuming the data frames are organized in the same way
    }else{#edited
       df3[j,i] <- as.integer((df1[j,i] + df2[j,i])>0)
    }#edited
  }
}

Does that work?
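For readers who want the explicit loop the question asked about, here is a slightly tidier sketch of the same idea. It starts from a copy of `df1` so the gene-name column carries over, and it adds a check that the gene names line up before combining, which is one possible way to guard against the danger noted in the code comment above. The helper `to_df` and the vectors `v1` and `v2` are hypothetical names used to rebuild the question's data.

```r
genes   <- c("A", "B", "C", "D")
samples <- paste0("sample", 1:5)
v1 <- c(0,1,0,0,1,  1,0,0,1,0,  0,0,1,1,1,  1,0,0,1,0)
v2 <- c(1,1,1,0,0,  0,1,0,0,0,  1,1,0,0,0,  1,1,0,0,0)

# Hypothetical helper that rebuilds a data frame from row-wise values.
to_df <- function(v) {
  data.frame(Genename = genes,
             matrix(v, nrow = 4, byrow = TRUE, dimnames = list(NULL, samples)))
}
df1 <- to_df(v1)
df2 <- to_df(v2)

# Guard: both data frames must have the same shape and the same gene order.
stopifnot(identical(dim(df1), dim(df2)),
          identical(df1$Genename, df2$Genename))

df3 <- df1                               # keeps the gene names in column 1
for (i in seq_len(ncol(df1))[-1]) {      # every column except the first
  for (j in seq_len(nrow(df1))) {        # every row
    df3[j, i] <- as.integer((df1[j, i] + df2[j, i]) > 0)
  }
}
print(df3)
```

Using `seq_len()` instead of `1:n` avoids a surprising `1:0` iteration when a data frame happens to be empty.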

Merge two binary data.frames based on value
