All possible permutations of a factor variable when ties exist in R

Asked: 2017-11-30 02:44:21

Tags: r permutation

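I have a data frame whose rows are sorted by the value of the variable x. Where x has ties (such as the values 50 and 60 in the example below), I need every possible permutation of the values of the variable group within each tie. How can I do this in R? Is there a dedicated function for it?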

Initial data:

x  group
45     A
50     A
50     A
50     B
52     A
60     A
60     B
70     B
88     B

Desired result:

x  group group2 group3 group4 group5 group6
45     A      A      A      A      A      A
50     A      A      B      A      A      B
50     A      B      A      A      B      A
50     B      A      A      B      A      A
52     A      A      A      A      A      A
60     A      A      A      B      B      B
60     B      B      B      A      A      A
70     B      B      B      B      B      B
88     B      B      B      B      B      B
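For reference, the six columns in the desired result follow from simple counting. The sketch below (base R only; the helper name `n_orderings` is made up for illustration) finds the tied values of x and counts the distinct orderings of group within each tie using the multiset formula n! / (n_A! * n_B! * ...).

```r
df <- data.frame(
  x = c(45, 50, 50, 50, 52, 60, 60, 70, 88),
  group = c("A", "A", "A", "B", "A", "A", "B", "B", "B")
)

# x values that occur more than once, i.e. the ties
counts <- table(df$x)
tied_x <- names(counts)[as.vector(counts) > 1]

# hypothetical helper: number of distinct orderings of a vector with repeats
n_orderings <- function(g) {
  factorial(length(g)) / prod(factorial(as.vector(table(g))))
}

# one count per value of x (1 where there is no tie)
per_x <- tapply(df$group, df$x, n_orderings)

# total number of columns needed: the product over all x values
n_columns <- prod(per_x)
```

Here x = 50 contributes 3 orderings (A, A, B) and x = 60 contributes 2 (A, B), so 3 * 2 = 6 columns are needed.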

3 answers:

Answer 0 (score: 2)

This may be a somewhat complicated answer. Try this code:

 df <- read.table(text = 'x  group
             45     A
             50     A
             50     A
             50     B
             52     A
             60     A
             60     B
             70     B
             88     B', header = TRUE)

library(data.table)
library(gtools)
ss <- list()
setDT(df)[, {n = .N; ss <<- append(ss, list(data.frame(apply(gtools::permutations(n = n, r = n), 1, function(x) group[x])))); NULL}, by = 'x']
max_col <- max(sapply(ss, length))
ss[] <- lapply(ss, function(x) {
   y <- x 
   while(length(y) < max_col)
     y <- data.frame(y, x[, 1:min(length(x), max_col - length(y))]) 
   names(y) <- paste0('group', 1:max_col)
   y
})
tt <- do.call('rbind', ss)
tt$x <- df$x
tt

Final output:

  group1 group2 group3 group4 group5 group6  x
1      A      A      A      A      A      A 45
2      A      A      A      A      B      B 50
3      A      B      A      B      A      A 50
4      B      A      B      A      A      A 50
5      A      A      A      A      A      A 52
6      A      B      A      B      A      B 60
7      B      A      B      A      B      A 60
8      B      B      B      B      B      B 70
9      B      B      B      B      B      B 88
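One way to sanity-check a result like this is to count how many of its columns are actually distinct; the desired result in the question has six. The sketch below re-types the output table above by hand (so it is an assumption that it was copied correctly) and counts distinct columns with base R's `duplicated`.

```r
# each string is one row of the output above, columns group1..group6
rows <- c("AAAAAA", "AAAABB", "ABABAA", "BABAAA", "AAAAAA",
          "ABABAB", "BABABA", "BBBBBB", "BBBBBB")
out <- do.call(rbind, strsplit(rows, ""))

# duplicated() on a matrix compares rows, so transpose to compare columns
n_distinct <- sum(!duplicated(t(out)))
n_distinct
```

For this table the count comes out as 4 rather than 6 (group3 repeats group1 and group4 repeats group2), which suggests that recycling the permutation columns up to max_col does not by itself enumerate every combination across ties.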

Answer 1 (score: 2)

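Just another (cleaner) solution. The idea is to compute all permutations within each tie, and then work out how many copies of each are needed to form every combination.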

df <- structure(list(x = c(45L, 50L, 50L, 50L, 52L, 60L, 60L, 70L,
88L), group = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor")), .Names = c("x", "group"), class = "data.frame", row.names = c(NA,
-9L))

library(tidyverse)
library(iterpc)

ux <- unique(df$x)
m <- length(ux)
members <- ux %>% map(~ filter(df, x == .)) %>% 
    map(~ getall(iterpc(table(as.character(.$group)), ordered = TRUE)))
nrs <- members %>% map_int(nrow)
members <- members %>% 
    imap(~.x[rep(seq_len(nrow(.x)), prod(tail(nrs, m-.y)) , each = prod(head(nrs, .y-1))), , drop=FALSE])
data.frame(x = df$x, t(do.call(cbind, members)))
#>    x X1 X2 X3 X4 X5 X6
#> 1 45  A  A  A  A  A  A
#> 2 50  A  A  B  A  A  B
#> 3 50  A  B  A  A  B  A
#> 4 50  B  A  A  B  A  A
#> 5 52  A  A  A  A  A  A
#> 6 60  A  A  A  B  B  B
#> 7 60  B  B  B  A  A  A
#> 8 70  B  B  B  B  B  B
#> 9 88  B  B  B  B  B  B
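If installing iterpc is not an option, the same idea can be sketched in base R. This is only an illustration under a few assumptions: group is a character vector, the rows are already sorted by x, and the helper name `unique_perms` is invented here. It builds the distinct orderings within each tie recursively and then uses `expand.grid` to enumerate every combination of one ordering per x value.

```r
df <- data.frame(
  x = c(45, 50, 50, 50, 52, 60, 60, 70, 88),
  group = c("A", "A", "A", "B", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# hypothetical helper: distinct orderings of v, one per row
unique_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  out <- NULL
  for (u in unique(v)) {
    rest <- v[-match(u, v)]                 # remove one occurrence of u
    out <- rbind(out, cbind(u, unique_perms(rest)))
  }
  unname(out)
}

# one matrix of orderings per value of x
perms <- lapply(split(df$group, df$x), unique_perms)

# every combination of row indices, one index per value of x
idx <- expand.grid(unname(lapply(perms, function(p) seq_len(nrow(p)))))

# glue the chosen orderings together, one result column per combination
res <- sapply(seq_len(nrow(idx)), function(k) {
  unlist(lapply(seq_along(perms), function(i) perms[[i]][idx[k, i], ]))
})
result <- data.frame(x = df$x, res, stringsAsFactors = FALSE)
```

Because `expand.grid` varies its first argument fastest, the orderings for x = 50 cycle within each ordering for x = 60. The recursive helper is still factorial in the worst case, so it is only practical for small ties.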

Answer 2 (score: 1)

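A surprisingly tricky problem! At its core, you need some version of Heap's algorithm. With that in hand, you can use base R to find the values of x that occur more than once, permute the group values within each of them, and then combine those permutations. I had actually written a version of this algorithm for a different project, so applying it to your data was relatively easy.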

First, the algorithm:

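permute.items <- function(x) {
  l <- length(x)
  # base case: a single item has exactly one arrangement
  if (l == 1) return(matrix(x, 1, 1))

  # permute everything except the last item, then insert the last item
  # at each of the l possible positions of every sub-permutation
  sub.permute <- permute.items(x[-l])
  arrangements <- rep(sub.permute, each = l)
  arrangements <- matrix(arrangements, nrow(sub.permute) * l, ncol(sub.permute) + 1)
  i <- rep(1:nrow(sub.permute), each = l)   # sub-permutation used by each row
  j <- rep(1:l, nrow(sub.permute))          # position counter within each sub-permutation
  # sweep the insertion point right-to-left for odd i, left-to-right for even i
  insert <- ifelse(i %% 2 == 1, l - j + 1, j)

  for (xx in 1:nrow(arrangements)) {
    arrangements[xx, insert[xx]] <- x[l]
    counter <- 1
    # fill the remaining positions from the sub-permutation, in order
    for (yy in 1:l) {
      if (yy != insert[xx]) {
        arrangements[xx, yy] <- sub.permute[i[xx], counter]
        counter <- counter + 1
      }
    }
  }
  return(arrangements)
}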

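This function takes a vector such as c(1, 2, 3) or c('a', 'b', 'c') and returns a matrix in which each row is one possible permutation of the original values. Note that the algorithm becomes very slow beyond 10-11 elements. It was also originally designed for a project in which the input vector would never contain repeated elements, so we have to remove the duplicate permutations afterwards.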
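Two small illustrations of the points above, as a base R sketch with hand-typed matrices (they are not produced by permute.items here): the factorial growth behind the slowdown, and how `duplicated` removes repeated rows. The `drop = FALSE` argument is worth noting because, without it, a result with a single remaining row collapses to a plain vector.

```r
# number of rows a full permutation matrix would have
factorial(10)   # 3628800
factorial(11)   # 39916800

# six arrangements of c("A", "A", "B"), three of which are repeats
perms <- rbind(c("A", "A", "B"), c("A", "B", "A"), c("B", "A", "A"),
               c("B", "A", "A"), c("A", "B", "A"), c("A", "A", "B"))
unique_rows <- perms[!duplicated(perms), , drop = FALSE]

# a tie whose values are all identical leaves only one distinct row
same <- matrix("A", nrow = 2, ncol = 2)
as_vector <- same[!duplicated(same), ]                 # dimensions are lost
as_matrix <- same[!duplicated(same), , drop = FALSE]   # stays a 1 x 2 matrix
```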

# read in example data
df <- read.table(text = 'x  group
45     A
50     A
50     A
50     B
52     A
60     A
60     B
70     B
88     B', header = TRUE, stringsAsFactors = FALSE)

# split the data into a list.
# each element in the list corresponds to one value of 'x', and contains its values of 'group'
x.split <- split(df$group, df$x)

# for each value of 'x', compute unique permutations and store as a matrix
# (one permutation per row; drop = FALSE keeps a single row as a 1-row matrix)
x.split <- lapply(x.split, function(x) {
  y <- permute.items(x)
  y[!duplicated(y), , drop = FALSE]
})

# compute total number of groups we'll need
groups <- prod(sapply(x.split, nrow))

# pre-allocate final storage
final <- matrix(NA, nrow = nrow(df), ncol = groups)

# loop through the lists' contents and glue together group permutations
# note: cycling with %% reaches every combination only when the per-tie
# permutation counts are pairwise coprime (3 and 2 in this example)
for (g in 1:groups) {
  final[, g] <- unlist(lapply(x.split, function(x) x[(g %% nrow(x)) + 1, ]))
}

# final formatting
final <- as.data.frame(final)
final$x <- df$x

Final output:

  V1 V2 V3 V4 V5 V6  x
1  A  A  A  A  A  A 45
2  A  B  A  A  B  A 50
3  B  A  A  B  A  A 50
4  A  A  B  A  A  B 50
5  A  A  A  A  A  A 52
6  B  A  B  A  B  A 60
7  A  B  A  B  A  B 60
8  B  B  B  B  B  B 70
9  B  B  B  B  B  B 88
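One caveat about combining permutations by cycling a single counter with %%: it only visits every combination when the permutation counts of the ties share no common factor, as 3 and 2 do here. The sketch below (base R; the counts c(2, 2) are an invented example, not from the question) compares that indexing with `expand.grid`, which enumerates all index combinations regardless of the counts.

```r
# hypothetical case: two ties with two distinct orderings each
counts <- c(2, 2)
g <- seq_len(prod(counts))

# modulo indexing: one column of indices per tie
mod_idx <- sapply(counts, function(n) (g %% n) + 1)
n_mod <- nrow(unique(mod_idx))      # distinct index pairs actually visited

# expand.grid: every combination of indices
grid_idx <- expand.grid(lapply(counts, seq_len))
n_grid <- nrow(unique(grid_idx))

c(modulo = n_mod, grid = n_grid)
```

With counts of 2 and 2, the modulo scheme visits only 2 of the 4 index pairs, while `expand.grid` yields all 4. With counts of 3 and 2, both approaches cover all 6.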