Question

我正在尝试计算矩阵元素的组合，但是每个元素只能出现一次。

（真实）矩阵是对称的，并且可以包含5个以上的元素（最多约2000个）：

o <- matrix(runif(25), ncol = 5, nrow = 5)
dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])

#           A         B            C         D         E
# A 0.4400317 0.1715681 0.7319108946 0.3994685 0.4466997
# B 0.5190471 0.1666164 0.3430245044 0.3837903 0.9322599
# C 0.3249180 0.6122229 0.6312876740 0.8017402 0.0141673
# D 0.1641411 0.1581701 0.0001703419 0.7379847 0.8347536
# E 0.4853255 0.5865909 0.6096330935 0.8749807 0.7230507

我希望计算所有对组合的乘积（如果可能的话，它应该出现所有元素：AB, CD, EF，如果矩阵包含6个元素），其中每个对一个字母是一列，另一个字母是列一是排。以下是一些组合：

AB, CD, E  
AC, BD, E  
AD, BC, E  
AE, BC, D  
AE, BD, C

单个元素的值只有1。

不需要组合：

AB, BC: Element B appears twice  
AB, AC: Element A appears twice

我尝试过的事情：

我考虑过删除矩阵中不需要的部分：

out <- which(upper.tri(o), arr.ind = TRUE)
out <- cbind.data.frame(out, value = o[upper.tri(o)])

out[, 1] <- colnames(o)[out[, 1]]
out[, 2] <- colnames(o)[out[, 2]]
#    row col     value
# 1    A   B 0.1715681
# 2    A   C 0.7319109
# 3    B   C 0.3430245
# 4    A   D 0.3994685
# 5    B   D 0.3837903
# 6    C   D 0.8017402
# 7    A   E 0.4466997
# 8    B   E 0.9322599
# 9    C   E 0.0141673
# 10   D   E 0.8347536

我的尝试涉及以下过程：

复制矩阵（出）
存储第一行的第一个值。
删除涉及任何一对的所有对。
选择下一对结果矩阵
重复进行，直到矩阵中的所有行都被删除
从另一行开始重复2：5

但是，此方法有一个大问题，它不能保证存储所有组合，并且可以将同一组合存储多次。

我的预期输出是一个向量，其中每个元素都是组合所选择的单元格中值的乘积：

AB, CD: 0.137553

如何有效地提取所有这些组合？

Answer 1

这可能有效。我在N个元素= 5和6上对此进行了测试。

请注意，这不是最佳的，希望可以为您提供一个可以使用的框架。通过更大的数组，我可以看到涉及apply和combn的步骤是一个瓶颈。

这里的想法是先计算唯一集合的集合，然后再从另一个存储集合值的data.frame计算集合的乘积。

通过计算所有组合对中唯一元素的数量来标识唯一集。例如，如果N个元素= 6，我们期望length(unlist(combination)) == 6。如果N个元素= 7（只有3对加上一个剩余元素），则情况也是如此。在N个元素为奇数的情况下，我们可以忽略剩余的未配对元素，因为它受到其他元素的约束。

library(dplyr)
library(reshape2)

## some functions

unique_by_n <- function(inlist, N){
  ## select unique combinations by count 
  ## if unique, expect n = 6 if n elements = 6)
  if(N %% 2) N <- N - 1 ## for odd numbers
  return(length(unique(unlist(inlist))) == N)
}

get_combs <- function(x,xall){
  ## format and catches remainder if matrix of odd elements
  xu <- unlist(x)
  remainder <- setdiff(xall,xu) ## catch remainder if any
  xset <- unlist(lapply(x, paste0, collapse=''))
  finalset <- c(xset, remainder)
  return(finalset)
}

## make dataset
set.seed(0) ## set reproducible example
#o <- matrix(runif(25), ncol = 5, nrow = 5) ## uncomment to test 5
#dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])
o <- matrix(runif(36), ncol = 6, nrow = 6)
dimnames(o) <- list(LETTERS[1:6], LETTERS[1:6])
o[lower.tri(o)] <- t(o)[lower.tri(o)] ## make matrix symmetric
n_elements = nrow(o)

#### get matrix
dat <- melt(o, varnames = c('Rw', 'Cl'), as.is = TRUE)
dat$Set <- apply(dat, 1, function(x) paste0(sort(unique(x[1:2])), collapse = ''))
## get unique sets (since your matrix is symmetric)
dat <- subset(dat, !duplicated(Set))

#### get sets
elements <- rownames(o)
allpairs <- expand.grid(Rw = elements, Cl = elements) %>% 
  filter(Rw != Cl) ## get all pairs
uniqpairsgrid <- unique(t(apply(allpairs,1,sort)))
uniqpairs <- split(uniqpairsgrid, seq(nrow(uniqpairsgrid))) ## get unique pairs
allpaircombs <- combn(uniqpairs,floor(n_elements/2)) ## get combinations of pairs
uniqcombs <- allpaircombs[,apply(allpaircombs, 2, unique_by_n, N = n_elements)] ## remove pairs with repeats
finalcombs <- apply(uniqcombs, 2, get_combs, xall=elements)

#### calculate results
res <- apply(finalcombs, 2, function(x) prod(subset(dat, Set %in% x)$value)) ## calculate product
names(res) <- apply(finalcombs, 2, paste0, collapse=',') ## add names
resdf <- data.frame(Sets = names(res), Products = res, stringsAsFactors = FALSE, row.names = NULL)
print(resdf)
#>        Sets    Products
#> 1  AB,CD,EF 0.130063454
#> 2  AB,CE,DF 0.171200062
#> 3  AB,CF,DE 0.007212619
#> 4  AC,BD,EF 0.012494787
#> 5  AC,BE,DF 0.023285088
#> 6  AC,BF,DE 0.001139712
#> 7  AD,BC,EF 0.126900247
#> 8  AD,BE,CF 0.158919605
#> 9  AD,BF,CE 0.184631344
#> 10 AE,BC,DF 0.042572488
#> 11 AE,BD,CF 0.028608495
#> 12 AE,BF,CD 0.047056905
#> 13 AF,BC,DE 0.003131029
#> 14 AF,BD,CE 0.049941770
#> 15 AF,BE,CD 0.070707311

Created on 2018-07-23 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0.9000).

Answer 2

也许以下是您想要的。
请注意，我对正确而不是对表现更感兴趣。

此外，我设置了RNG种子，以产生可重复的结果。

set.seed(9840)    # Make reproducible results

o <- matrix(runif(25), ncol = 5, nrow = 5)
dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])

cmb <- combn(LETTERS[1:5], 2)
n <- ncol(cmb)
res <- NULL
nms <- NULL
for(i in seq_len(n)){
  for(j in seq_len(n)[-seq_len(i)]){
    x <- unique(c(cmb[, i], cmb[, j]))
    if(length(x) == 4){
      res <- c(res, o[cmb[1, i], cmb[2, i]] * o[cmb[1, j], cmb[2, j]])
      nms <- c(nms, paste0(cmb[1, i], cmb[2, i], '*', cmb[1, j], cmb[2, j]))
    }
  }
}

names(res) <- nms

res

提取单元格的组合而不重复索引

2 个答案: