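I am trying to use the sample function for my task, which is to sample n random rows from each level of a factor and create a new variable based on that selection and on the values of another variable.
Simplified example: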
Subject = c("100","100","100","100", "100", "200", "200", "200", "200", "200")
Condition = c("Blue","Blue","Blue","Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue")
Response = rnorm(10)
df = data.frame(Subject,Condition, Response)
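The goal here is to sample 3 random rows for each level of Subject and create a new variable, say Condition.Rand, in which the randomly selected rows are labeled "Red" and the remaining rows keep whatever value they have in Condition (in this case, "Blue"). So, for each Subject, 60% of Condition.Rand is labeled "Red" and 40% is labeled "Blue".
To be clear, I want exactly 3 random rows (60% of the 5 observations) labeled "Red" for Subject 100, and exactly 3 random rows labeled "Red" for Subject 200.
Thanks!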
Answer 0: (score: 2)
Use split to divide df into subgroups by Subject, then sample "Red" and "Blue" within each subgroup with the desired probabilities.
set.seed(42)
do.call(rbind, lapply(split(df, df$Subject), function(a)
    cbind(a,
          cond.rand = sample(c("Red", "Blue"), size = nrow(a),
                             replace = TRUE, prob = c(0.6, 0.4)))))
#       Subject Condition   Response cond.rand
#100.1      100      Blue -1.7813084      Blue
#100.2      100      Blue -0.1719174      Blue
#100.3      100      Blue  1.2146747       Red
#100.4      100      Blue  1.8951935      Blue
#100.5      100      Blue -0.4304691      Blue
#200.6      200      Blue -0.2572694       Red
#200.7      200      Blue -1.7631631      Blue
#200.8      200      Blue  0.4600974       Red
#200.9      200      Blue -0.6399949      Blue
#200.10     200      Blue  0.4554501      Blue
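Note that sample() with replace = TRUE and prob draws each label independently, so the number of "Red" rows per subject varies from run to run (Subject 100 above ended up with only one). If exactly 3 rows per subject are required, as the question asks, one possible variation is to sample row positions without replacement. The following is only a sketch under stated assumptions: it rebuilds the example data so it runs on its own, assumes every subject has at least n_red rows, and the helper name label_exact and its argument n_red are made up for illustration.

```r
# Rebuild the question's example data so this block is self-contained
set.seed(42)
df <- data.frame(
  Subject   = rep(c("100", "200"), each = 5),
  Condition = "Blue",
  Response  = rnorm(10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mark exactly n_red randomly chosen rows of one subgroup as "Red"
label_exact <- function(a, n_red = 3) {
  picked <- sample(nrow(a), size = n_red)        # row positions, drawn without replacement
  a$Condition.Rand <- as.character(a$Condition)  # default: keep the original Condition value
  a$Condition.Rand[picked] <- "Red"              # overwrite only the picked rows
  a
}

# Apply the helper to each Subject and stack the pieces back together
res <- do.call(rbind, lapply(split(df, df$Subject), label_exact))

# Cross-tabulate labels by subject to check the counts
table(res$Subject, res$Condition.Rand)
```

Because the positions are drawn without replacement, the cross-tabulation should show 3 "Red" and 2 "Blue" rows for each subject regardless of the seed.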
Answer 1: (score: 2)
We can also do this in base R with ave:
set.seed(42)
# Assign into df (the data frame from the question), one group of labels per Subject
df$cond.rand <- with(df, ave(seq_along(Subject), Subject, FUN = function(x)
    sample(c("Red", "Blue"), size = length(x), replace = TRUE, prob = c(0.6, 0.4))))
df$cond.rand
#[1] "Blue" "Blue" "Red" "Blue" "Blue" "Red" "Blue" "Red" "Blue" "Blue"
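The ave pattern can also be adapted to give an exact count per subject by shuffling a fixed set of flags within each group instead of drawing labels with probabilities. This is a sketch rather than part of the original answer: it rebuilds the example data, assumes each subject has several rows and at least n_red of them, and n_red and is_red are illustrative names.

```r
# Rebuild the question's example data so this block is self-contained
set.seed(42)
df <- data.frame(
  Subject   = rep(c("100", "200"), each = 5),
  Condition = "Blue",
  Response  = rnorm(10),
  stringsAsFactors = FALSE
)

n_red <- 3  # illustrative: number of rows per subject to relabel

# Within each Subject, shuffle a vector holding n_red ones and the rest zeros,
# so exactly n_red positions per group are flagged
is_red <- ave(seq_len(nrow(df)), df$Subject, FUN = function(i)
  sample(rep(c(1L, 0L), times = c(n_red, length(i) - n_red))))

# Flagged rows become "Red"; all other rows keep their Condition value
df$Condition.Rand <- ifelse(is_red == 1, "Red", as.character(df$Condition))

# Number of "Red" rows per subject
tapply(df$Condition.Rand == "Red", df$Subject, sum)
```

Keeping the flags numeric inside ave avoids relying on type coercion, and the final ifelse step is where the fallback to the existing Condition value happens.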