Loop over the columns of a data.frame and build a new data.frame from the results

Posted: 2015-02-05 14:03:30

Tags: r loops dataframe

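I have a large data.frame, and I want to concatenate pairs of values within each column and then use the output to create a new data.frame. Since my data.frame has nearly 1700 columns, I figured the easiest approach would be to loop over the columns. Here is an example of what I want to do.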

Starting data:

variable1 = c("var1", "var2", "var3")
variable2 = c("var4", "var5", "var6")
variable3 = c("var7", "var8", "var9")
df = data.frame(variable1, variable2, variable3, stringsAsFactors = FALSE)

Expected output:

  variable1 variable2 variable3
1 var1_var2 var4_var5 var7_var8
2 var1_var3 var4_var6 var7_var9
3 var2_var3 var5_var6 var8_var9

The code I'm currently using is:

index = 1
column = 1

Complexes <- dim(df)[2]
proteins <- dim(df)[1]


complex <-list()
interactions <- list()
complexcol <- list()

for(i in 1:Complexes){
  complex[[column]]=(for(j in 1:proteins){
    for(k in j+1:proteins){
      interactions[index] = c(paste0(corum[i,j],"_",corum[i,k]))
      index = index +1
    }
  })
  column = column + 1
  print(column)
  index = 1
}

When I run it, it loops over the columns, but it doesn't produce any output in a new list or data.frame.

Thanks!
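For reference, the loop in the question has a few separate problems. A for loop evaluates to NULL, so complex[[column]] = (for ...) stores nothing (assigning NULL with [[<- never adds an element); the pairs collected in interactions are overwritten for each column and never copied anywhere; j+1:proteins parses as j + (1:proteins) rather than (j+1):proteins, because : binds tighter than +; and corum[i,j] uses the column counter i as a row index. A minimal corrected sketch of the same nested-loop idea, assuming the data is the example df (the question's code indexes a data frame called corum, presumably the full data set):

```r
df <- data.frame(variable1 = c("var1", "var2", "var3"),
                 variable2 = c("var4", "var5", "var6"),
                 variable3 = c("var7", "var8", "var9"),
                 stringsAsFactors = FALSE)

n_cols <- ncol(df)
n_rows <- nrow(df)

result <- vector("list", n_cols)   # one element per input column
names(result) <- names(df)

for (i in seq_len(n_cols)) {
  pairs <- character(choose(n_rows, 2))  # preallocate: one slot per pair of rows
  index <- 1
  # j < k, so j stops one short of the last row; (j + 1):n_rows needs the parentheses
  for (j in seq_len(n_rows - 1)) {
    for (k in (j + 1):n_rows) {
      pairs[index] <- paste0(df[j, i], "_", df[k, i])  # row index first, then column
      index <- index + 1
    }
  }
  result[[i]] <- pairs  # store this column's pairs before moving on
}

out <- as.data.frame(result, stringsAsFactors = FALSE)
out
```

The vectorized answers below are usually shorter and faster, but this version keeps the structure of the original loop so the fixes are easy to compare.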

2 answers:

Answer 0 (score: 4)

You can use the combn function to get all combinations, which turns this into a one-liner:

# Build example data
(dat = data.frame(1:3, 4:6, 7:9))
#   X1.3 X4.6 X7.9
# 1    1    4    7
# 2    2    5    8
# 3    3    6    9

# Get all combinations of rows
t(apply(combn(nrow(dat), 2), 2, function(x) paste0(dat[x[1],], "_", dat[x[2],])))
#      [,1]  [,2]  [,3] 
# [1,] "1_2" "4_5" "7_8"
# [2,] "1_3" "4_6" "7_9"
# [3,] "2_3" "5_6" "8_9"
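To see how the one-liner works: combn(nrow(dat), 2) returns a matrix with two rows and one column per pair of row indices; apply(..., 2, ...) pastes the two corresponding rows of dat together, and t() puts one pair per row. Since the question asks for a data.frame rather than a character matrix, the result can be converted and given the original column names. A small sketch (the names idx, m and res are only for illustration):

```r
dat <- data.frame(1:3, 4:6, 7:9)

idx <- combn(nrow(dat), 2)   # 2 x choose(n, 2) matrix of row-index pairs
idx
#      [,1] [,2] [,3]
# [1,]    1    1    2
# [2,]    2    3    3

m <- t(apply(idx, 2, function(x) paste0(dat[x[1], ], "_", dat[x[2], ])))

# Turn the character matrix into a data.frame that keeps the input's column names
res <- as.data.frame(m, stringsAsFactors = FALSE)
names(res) <- names(dat)
res
```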

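If your data frame stores factors and you want to combine their levels, you can first convert it to a data frame that actually stores strings and then use the same code: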

# Make data frame with factors
(dat = data.frame(X=c("a", "b", "c"), Y=c("d", "e", "f"), Z=c("g", "h", "i"), stringsAsFactors=TRUE))
#   X Y Z
# 1 a d g
# 2 b e h
# 3 c f i
str(dat)
# 'data.frame': 3 obs. of  3 variables:
#  $ X: Factor w/ 3 levels "a","b","c": 1 2 3
#  $ Y: Factor w/ 3 levels "d","e","f": 1 2 3
#  $ Z: Factor w/ 3 levels "g","h","i": 1 2 3

# Convert to data frame with strings and then use same code
dat2 <- data.frame(lapply(dat, as.character), stringsAsFactors=FALSE)
t(apply(combn(nrow(dat2), 2), 2, function(x) paste0(dat2[x[1],], "_", dat2[x[2],])))
#      [,1]  [,2]  [,3] 
# [1,] "a_b" "d_e" "g_h"
# [2,] "a_c" "d_f" "g_i"
# [3,] "b_c" "e_f" "h_i"

Answer 1 (score: 1)

I'd like to contribute a bit more here using dplyr and data.table. Inspired by @David Arenburg, I came up with the following.

df <- data.frame(variable1 = c("var1", "var2", "var3"),
                 variable2 = c("var4", "var5", "var6"),
                 variable3 = c("var7", "var8", "var9"),
                 stringsAsFactors = FALSE) 

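library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; since dplyr 1.0.0 the equivalent is
# mutate(df, across(everything(), ~ combn(.x, 2, paste, collapse = "_")))
mutate_each(df, funs(combn(., 2, paste, collapse = "_")))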

#  variable1 variable2 variable3
#1 var1_var2 var4_var5 var7_var8
#2 var1_var3 var4_var6 var7_var9
#3 var2_var3 var5_var6 var8_var9

library(data.table)
setDT(df)[, lapply(.SD, function(x) {combn(x, 2, paste, collapse = "_")})]

#   variable1 variable2 variable3
#1: var1_var2 var4_var5 var7_var8
#2: var1_var3 var4_var6 var7_var9
#3: var2_var3 var5_var6 var8_var9
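Both versions call combn() once per column. Two caveats seem worth keeping in mind for the question's real data. First, a column with n values has choose(n, 2) = n(n - 1)/2 pairs, which equals n only when n = 3, so the mutate-based version fits back into df here only because the example has three rows (mutate() expects one value per existing row); the data.table call builds a new table and should not be limited that way. Second, as noted in the other answer, factor columns should be converted to character before pasting. The sketch below is an alternative base-R approach, not taken from either answer: it computes the index pairs once and reuses them for every column (df4, idx and out are made-up names, and at least two rows are assumed):

```r
# Hypothetical 4-row example; variable2 is deliberately a factor
df4 <- data.frame(variable1 = c("var1", "var2", "var3", "var10"),
                  variable2 = factor(c("a", "b", "c", "d")),
                  stringsAsFactors = FALSE)

# All row-index pairs, computed once: a 2 x choose(4, 2) = 2 x 6 matrix
idx <- combn(nrow(df4), 2)

out <- data.frame(lapply(df4, function(x) {
  x <- as.character(x)                       # factors are pasted by label, not by code
  paste(x[idx[1, ]], x[idx[2, ]], sep = "_") # one vectorized paste per column
}), stringsAsFactors = FALSE)

out
```

Because paste() is called once per column instead of once per pair, this may scale better to the roughly 1700 columns mentioned in the question, though that has not been benchmarked here.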