Question

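I am writing a function that must randomly split a whole data set into two smaller subsets. The size of the subset is chosen by the user. This is what I tried: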

number <- function(z,y,p){
indeks <-split(z$y,sample(rep(1:2), c(p, z$y-p)))
train <- z[indeks,]
test <- z [-indeks, ]
result <- list(test, train)
list(result)
}
number(z=lipiec , y=VII,  p=200)

However, the following error pops up:

 Error in sample.int(length(x), size, replace, prob) : 
 cannot take a sample larger than the population when 'replace = FALSE'

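The data I am trying to split is of type int and has 574 rows, so the value 200 is not larger than the whole sample. I want to get two random groups: one (test) with 200 elements, and the other (train) with the rest of the base set. Does anyone know what I am doing wrong?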

EDIT: After making changes, I now have the following:

number <- function(z,y,p){
df <- as.data.frame(z$y)
indeks <-split(df, sample(nrow(df))<=p)
train <- indeks$
test <- indeks$
str(test)}
number(z=lipiec , y=VII,  p=200)

Now I don't know what to assign to test and train so that each of them gets one part of the split. Does anyone have an idea?

Answer 1

You can try:

split(df, sample(c(rep(1, 200), rep(2, 574 - 200))))
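
For context, the error in the question most likely comes from `sample(rep(1:2), c(p, ...))`: `rep(1:2)` is just `c(1, 2)`, and the second argument is read as the sample size, so R is asked to draw 200 values from a population of 2 without replacement. Below is a minimal sketch of the label-vector approach above wrapped in a function. The name `split_by_labels`, the labels "test" and "train", and the toy data frame are assumptions made for illustration, not part of the original code.

```r
# Hypothetical helper: split a data frame into a "test" part with p rows
# and a "train" part with the remaining rows.
split_by_labels <- function(df, p) {
  n <- nrow(df)
  stopifnot(p >= 0, p <= n)
  # Build p "test" labels and n - p "train" labels, then shuffle them
  # so that each row receives a random label.
  labels <- sample(rep(c("test", "train"), times = c(p, n - p)))
  split(df, labels)
}

set.seed(1)
toy <- data.frame(VII = 1:574)  # stand-in for the asker's data (assumed shape)
parts <- split_by_labels(toy, 200)
nrow(parts$test)   # 200
nrow(parts$train)  # 374
```

Using named labels rather than 1 and 2 means the two parts can be picked out of the result as `parts$test` and `parts$train`.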

Answer 2

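myfun <- function(df, N) {
    # sample(nrow(df)) is a random permutation of 1..nrow(df); exactly N of
    # its values are <= N, so the TRUE group holds N randomly chosen rows.
    split(df, sample(nrow(df)) <= N)
}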

set.seed(1)
myfun(mtcars,10)
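
Regarding the question of what to assign to test and train: splitting on a logical vector returns a list whose elements are named "FALSE" and "TRUE", so the two parts can be extracted by name. The following is a sketch under that assumption; `split_train_test` and `n_test` are hypothetical names, not from the original post.

```r
# Hypothetical wrapper around the split shown above.
split_train_test <- function(df, n_test) {
  parts <- split(df, sample(nrow(df)) <= n_test)
  # TRUE marks the n_test randomly chosen rows; FALSE marks the rest.
  list(test = parts[["TRUE"]], train = parts[["FALSE"]])
}

set.seed(1)
res <- split_train_test(mtcars, 10)
nrow(res$test)   # 10
nrow(res$train)  # 22
```

Note that if `n_test` is 0 or equal to `nrow(df)`, one of the two groups does not exist in the split result, and the corresponding element is `NULL`.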

Splitting into two unequal random parts
