Create a new variable based on factor levels and random selection

Time: 2017-03-06 22:14:16

Tags: r variables random

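I am trying to use the sample function for my task, which is to sample n random rows from each level of a factor and create a new variable based on that selection and on the value of another variable.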

Simplified example:

Subject = c("100","100","100","100", "100", "200", "200", "200", "200", "200")
Condition = c("Blue","Blue","Blue","Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue")
Response = rnorm(10)
df = data.frame(Subject,Condition, Response) 

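The goal here is to sample 3 random rows for each level of Subject and create a new variable, say Condition.Rand, in which the randomly selected rows are labeled "Red" and the remaining rows keep whatever value they have in Condition, in this case "Blue". So for each Subject, 60% of Condition.Rand is labeled "Red" and 40% is labeled "Blue".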

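To be clear, I want exactly 3 random rows (that is, 60% of the 5 observations) labeled "Red" for Subject 100, and exactly 3 random rows labeled "Red" for Subject 200.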

Thanks!

2 Answers:

Answer 0: (score: 2)

Use split to divide df into subgroups by Subject, then sample "Red" and "Blue" within each subgroup with the desired probabilities.

set.seed(42)
do.call(rbind, lapply(split(df, df$Subject), function(a)
 cbind(a,
  cond.rand = sample(c("Red","Blue"), size = nrow(a), replace = TRUE, prob = c(0.6,0.4)))))
#       Subject Condition   Response cond.rand
#100.1      100      Blue -1.7813084      Blue
#100.2      100      Blue -0.1719174      Blue
#100.3      100      Blue  1.2146747       Red
#100.4      100      Blue  1.8951935      Blue
#100.5      100      Blue -0.4304691      Blue
#200.6      200      Blue -0.2572694       Red
#200.7      200      Blue -1.7631631      Blue
#200.8      200      Blue  0.4600974       Red
#200.9      200      Blue -0.6399949      Blue
#200.10     200      Blue  0.4554501      Blue
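
Sampling with prob yields the 60/40 split only in expectation: in the output above, Subject 100 has one "Red" row and Subject 200 has two. If exactly 3 rows per Subject are required, as the question states, one possible sketch is to sample row positions without replacement inside each subgroup. The helper name mark_exact and its arguments n and label are illustrative assumptions, not part of the original answer; the data frame is rebuilt here only so the block runs on its own.

```r
# Rebuild the example data from the question so the block is self-contained
set.seed(42)
df <- data.frame(
  Subject   = rep(c("100", "200"), each = 5),
  Condition = "Blue",
  Response  = rnorm(10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: relabel exactly n randomly chosen rows of one subgroup
mark_exact <- function(a, n = 3, label = "Red") {
  # Start from the existing Condition values
  a$Condition.Rand <- as.character(a$Condition)
  # Draw n distinct row positions (capped at the subgroup size)
  picked <- sample.int(nrow(a), size = min(n, nrow(a)))
  a$Condition.Rand[picked] <- label
  a
}

# Apply the helper to each Subject and stack the pieces back together
res <- do.call(rbind, lapply(split(df, df$Subject), mark_exact))

# Count labels per Subject to check the result
table(res$Subject, res$Condition.Rand)
```

Because the positions are drawn without replacement, every Subject with at least 3 rows ends up with exactly 3 "Red" rows; which rows are picked still depends on the seed.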

Answer 1: (score: 2)

We can also do this with ave from base R:

set.seed(42)
# Assign to df, the data frame defined in the question
df$cond.rand <- with(df, ave(seq_along(Subject), Subject, FUN = function(x)
    sample(c("Red", "Blue"), size = length(x), replace = TRUE, prob = c(0.6, 0.4))))
df$cond.rand
#[1] "Blue" "Blue" "Red"  "Blue" "Blue" "Red"  "Blue" "Red"  "Blue" "Blue"
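
This draw is also probabilistic, so the number of "Red" rows per Subject can vary (one and two in the output above). A minimal sketch that stays with ave but fixes the count is to rank random numbers within each Subject and flag the smallest ones. The names n_red, u, and is_red are assumptions introduced for illustration, and the data frame is rebuilt so the block runs on its own.

```r
# Rebuild the example data from the question so the block is self-contained
set.seed(42)
df <- data.frame(
  Subject   = rep(c("100", "200"), each = 5),
  Condition = "Blue",
  Response  = rnorm(10),
  stringsAsFactors = FALSE
)

n_red <- 3  # assumed: number of rows to relabel per Subject

# One uniform random number per row; ranking them within each Subject
# gives a random ordering of that Subject's rows
u <- runif(nrow(df))
is_red <- ave(u, df$Subject, FUN = rank) <= n_red

# Flagged rows become "Red"; the rest keep their Condition value
df$Condition.Rand <- ifelse(is_red, "Red", as.character(df$Condition))

# Count labels per Subject to check the result
table(df$Subject, df$Condition.Rand)
```

Unlike splitting and re-binding, this keeps the original row order and row names, since ave returns a vector aligned with the input rows.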