Question

I have a dataset like the following:

d=data.frame(ID = rep(1:7,1), 
             Group1=c('A','C','B','C','C','A','B'),
             Group2=c('B','A','C','B','B','B','D'))

 ID Group1 Group2
  1      A      B
  2      C      A
  3      B      C
  4      C      B
  5      C      B
  6      A      B
  7      B      D

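I need to randomly sample one case per Group1 level. Group1 has three levels (A, B, C), and I need to draw one row from each.

At the same time, no Group2 value may appear more than once in the sample.

For example, if I sample by Group1 only: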

library(dplyr)
dsample = d %>% group_by(Group1) %>% sample_n(size = 1)

then the sample may look like this:

 ID Group1 Group2
  1      A      B
  7      B      D
  4      C      B

Here B is repeated in the sample's Group2 column. To avoid repeated Group2 values, the draw for Group1 = C should pick ID = 2 instead, so that the sample looks like this:

 ID Group1 Group2
  1      A      B
  7      B      D
  2      C      A
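The two requirements above (one row per Group1 level, no repeated Group2 value) can be written down as a small check. This is only a sketch in base R; `is_valid_sample` is a hypothetical helper name, not something from the question.

```r
# Example data from the question
d <- data.frame(ID = 1:7,
                Group1 = c('A','C','B','C','C','A','B'),
                Group2 = c('B','A','C','B','B','B','D'))

# Hypothetical helper: TRUE when `s` has exactly one row per Group1 level
# of `data` and no Group2 value appears twice in `s`.
is_valid_sample <- function(s, data) {
  one_per_group <- setequal(s$Group1, data$Group1) && !anyDuplicated(s$Group1)
  one_per_group && !anyDuplicated(s$Group2)
}

is_valid_sample(d[d$ID %in% c(1, 7, 4), ], d)  # FALSE: B appears twice in Group2
is_valid_sample(d[d$ID %in% c(1, 7, 2), ], d)  # TRUE
```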

Answer 1

One possible approach: keep resampling until you get the desired result (or until you have failed enough times to conclude that it cannot be reached):

library(dplyr)

# data
d=data.frame(ID = rep(1:7,1), 
             Group1=c('A','C','B','C','C','A','B'),
             Group2=c('B','A','C','B','B','B','D'))

# first attempt
dsample = d %>% group_by(Group1) %>% sample_n(size=1)

# if first attempt doesn't work, try again & again (I put an upper limit at 100 runs)
# (&& is the scalar "and", which is what a while() condition expects)
i = 1
while(length(unique(dsample$Group2)) < nrow(dsample) && i < 100){
  dsample = d %>% group_by(Group1) %>% sample_n(size=1)
  i = i + 1
}

> dsample
# A tibble: 3 x 3
# Groups:   Group1 [3]
     ID Group1 Group2
  <int> <fctr> <fctr>
1     1      A      B
2     3      B      C
3     2      C      A
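The same resample-until-valid idea can be wrapped in a function so that a failed search is reported explicitly instead of returning the last invalid draw. The following is a sketch in base R, not the answer's own code; `sample_distinct` and `max_tries` are names made up for illustration.

```r
d <- data.frame(ID = 1:7,
                Group1 = c('A','C','B','C','C','A','B'),
                Group2 = c('B','A','C','B','B','B','D'))

# Hypothetical wrapper: draw one row per Group1 level, retry until the
# Group2 values are all distinct, and return NULL after `max_tries` failures.
sample_distinct <- function(data, max_tries = 100) {
  groups <- split(seq_len(nrow(data)), data$Group1)  # row indices per level
  for (i in seq_len(max_tries)) {
    # pick by position so a group with a single row is handled correctly
    idx <- vapply(groups, function(g) g[sample.int(length(g), 1)], integer(1))
    s <- data[idx, ]
    if (!anyDuplicated(s$Group2)) return(s)
  }
  NULL
}

sample_distinct(d)
```

Returning `NULL` makes the failure case easy to test for with `is.null()`, at the cost of discarding the last attempt.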

If the required unique combination cannot be obtained:

# example where "A" & "B" in Group 1 both have only "A" as Group2 values
d2=data.frame(ID = rep(1:7,1), 
             Group1=c('A','C','B','C','C','A','B'),
             Group2=c('A','A','A','C','B','A','A'))

# same code as before
d2sample = d2 %>% group_by(Group1) %>% sample_n(size=1)

i = 1
while(length(unique(d2sample$Group2)) < nrow(d2sample) && i < 100){
  d2sample = d2 %>% group_by(Group1) %>% sample_n(size=1)
  i = i + 1
}

# fail after 100 rounds of resampling
> d2sample
# A tibble: 3 x 3
# Groups:   Group1 [3]
     ID Group1 Group2
  <int> <fctr> <fctr>
1     6      A      A
2     7      B      A
3     5      C      B
> i
[1] 100
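Hitting the retry limit does not by itself prove that no valid sample exists. For a dataset this small, one way to settle the question is to enumerate every combination of one ID per Group1 level and keep those with distinct Group2 values. The sketch below uses base R only; `valid_combinations` is a hypothetical name, and the number of combinations grows multiplicatively with group sizes, so this is only reasonable for small data.

```r
d <- data.frame(ID = 1:7,
                Group1 = c('A','C','B','C','C','A','B'),
                Group2 = c('B','A','C','B','B','B','D'))
d2 <- data.frame(ID = 1:7,
                 Group1 = c('A','C','B','C','C','A','B'),
                 Group2 = c('A','A','A','C','B','A','A'))

# Hypothetical helper: all ways to pick one ID per Group1 level such that
# the picked rows have pairwise distinct Group2 values.
valid_combinations <- function(data) {
  ids_by_group <- split(data$ID, data$Group1)
  combos <- expand.grid(ids_by_group)  # one column per Group1 level
  ok <- apply(combos, 1, function(ids) {
    !anyDuplicated(data$Group2[match(ids, data$ID)])
  })
  combos[ok, , drop = FALSE]
}

nrow(valid_combinations(d))   # 4 valid combinations, all using ID 2 for C
nrow(valid_combinations(d2))  # 0, so resampling can never succeed
```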

Answer 2

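My first idea was a loop, but then I realised we can approach the sampling from a different angle. A better solution is to sample one row at a time, and then draw the next row from a pool that contains only rows whose Group1 and Group2 values differ from those already sampled. This should be much faster: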

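library(dplyr)

f <- function(){
  # first row: any row of d
  x <- sample_n(d,1)
  # later rows: only rows whose Group1 and Group2 are both still unused
  # (the pool can be empty, e.g. after drawing a C/B row and then ID 3;
  # in that case the draw fails)
  x <- rbind(x,sample_n(d[!d$Group1 %in% x$Group1 & !d$Group2 %in% x$Group2,],1))
  x <- rbind(x,sample_n(d[!d$Group1 %in% x$Group1 & !d$Group2 %in% x$Group2,],1))
  x
}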

f()

  ID Group1 Group2
6  6      A      B
2  2      C      A
3  3      B      C

Provided you know that at least 2 unique valid samples are possible, this gives a random output with no repeated values each time.

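If anyone has a suggestion for repeating the sampling step more concisely, please let me know. Overall, though, this approach seems likely to be the most efficient.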
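One way to avoid writing the `rbind` line once per group is to loop until the pool is empty, and to start over if the draw runs into a dead end before every Group1 level is covered. This is a sketch in base R under those assumptions; `sample_sequential` and `max_restarts` are hypothetical names, and the result is not guaranteed to be uniformly distributed over all valid samples.

```r
d <- data.frame(ID = 1:7,
                Group1 = c('A','C','B','C','C','A','B'),
                Group2 = c('B','A','C','B','B','B','D'))

# Hypothetical generalisation of the one-row-at-a-time idea.
sample_sequential <- function(data, max_restarts = 100) {
  n_groups <- length(unique(data$Group1))
  for (attempt in seq_len(max_restarts)) {
    picked <- data[0, ]  # empty data frame with the same columns
    repeat {
      # rows whose Group1 and Group2 values are both still unused
      pool <- data[!data$Group1 %in% picked$Group1 &
                   !data$Group2 %in% picked$Group2, ]
      if (nrow(pool) == 0) break
      picked <- rbind(picked, pool[sample.int(nrow(pool), 1), ])
    }
    if (nrow(picked) == n_groups) return(picked)  # every level covered
  }
  NULL  # no complete sample found within max_restarts attempts
}

sample_sequential(d)
```

The restart is needed because an early pick can leave a later group with no admissible row, for example drawing ID 4 (C, B) first removes both rows of group A from the pool.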

Answer 3

Try this recursive function.

Your data

d=data.frame(ID = rep(1:7,1), 
             Group1=c('A','C','B','C','C','A','B'),
             Group2=c('B','A','C','B','B','B','D'))

library(dplyr)
dsample = d %>% group_by(Group1) %>% sample_n(size = 1)

Function

myfun <- function(ans, allowed, restricted, counter, end) {
  allowed <- setdiff(allowed, ans)
  allowed1 <- setdiff(allowed, restricted[counter])
  if (length(allowed) == 0 || counter > end) {
    if (length(ans) < end) {
      ans <- c(ans, rep(NA, end - length(ans)))
    }
    return(ans)
  } else {
    counter <- counter + 1
    ans <- c(ans, sample(allowed1, 1))
    myfun(ans, allowed, restricted, counter, end)
  }
}

Output of 10 trials

replicate(10,myfun(ans=NULL, unique(d$Group2), dsample$Group1, 1, nrow(dsample)))

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "D"  "C"  "B"  "C"  "B"  "B"  "B"  "B"  "D"  "D"  
[2,] "C"  "D"  "D"  "A"  "D"  "C"  "C"  "A"  "A"  "C"  
[3,] "A"  "B"  "A"  "D"  "A"  "A"  "D"  "D"  "B"  "B"

Note that the output of each replicate is organized by column.
