Subset data frame in batches of 100 rows

Time: 2016-03-02 20:01:09

Tags: r recursion subset lapply

I want to subset a large data frame by groups of 100 rows, to feed into a function.

A simplified example: Here's my "large" data frame of 1000 rows.

df<-data.frame(c(sample(2:100,1000,replace=TRUE)),c(sample(2:100,1000,replace=TRUE)))

I need to feed each group of 100 rows from df[,1] into this dummy function:

dummy<-function(x){
return(c("There are ",x," dummies in this room"))
}

I need to do this in sets of 100 because the dummy function can only handle 100 values at once.

This will feed the entirety of df[,1] into the function:

lapply(df[,1],dummy)

But instead, I need something like this:

lapply(df[1:100,1],dummy)
lapply(df[101:200,1],dummy)
. . . etc

How do I do this in a succinct way, preferably with base R?

2 answers:

Answer 0 (score: 3)

If your dataset has no factor variable to use with split, or you don't want to go the vector route with cut, a short routine like this may be enough:

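df<-data.frame(c(sample(2:100,1000,replace=TRUE)),c(sample(2:100,1000,replace=TRUE)))
batches<-list()  # list that collects one data frame per batch
div<-seq(100,nrow(df),100)  # last row index of each batch: 100, 200, ...
for(i in seq_along(div))
{
    # rows 1-100, 101-200, ...; the +1 keeps consecutive batches from overlapping
    batches[[i]]<-df[(100*(i-1)+1):div[i],]
}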
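The same batching can be sketched without an explicit loop by building a group index and handing it to split. This is only a sketch: the names batch_size, batch_id and batches, the column names a and b, and the fixed seed are assumptions made for illustration, not part of the original answer. Integer division also copes with a row count that is not a multiple of 100 (the last batch is simply shorter).

```r
set.seed(1)  # assumption: fixed seed only so the example is reproducible
df <- data.frame(a = sample(2:100, 1000, replace = TRUE),
                 b = sample(2:100, 1000, replace = TRUE))

batch_size <- 100  # hypothetical name for the batch length

# Rows 1-100 get id 0, rows 101-200 get id 1, and so on
batch_id <- (seq_len(nrow(df)) - 1) %/% batch_size

# split() returns a list with one data frame per distinct id
batches <- split(df, batch_id)

length(batches)        # number of batches
sapply(batches, nrow)  # number of rows in each batch
```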

Answer 1 (score: 0)

As @A Webb suggested, split helps here.

df<-data.frame(c(sample(2:100,1000,replace=TRUE)),
               c(sample(2:100,1000,replace=TRUE)))

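# Sequential grouping: 10 consecutive groups of 100 rows each
groups<-10
groups_split<-split(df, factor(sort(rank(row.names(df))%%groups)))

# Random grouping: each row is assigned to one of 10 groups at random,
# so group sizes are only approximately 100
random_split<-split(df, sample(1:groups, nrow(df), replace=TRUE))

# yourfunc is a placeholder for the function to apply to each group
sapply(groups_split, yourfunc)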
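To tie this back to the question, here is a minimal sketch that feeds only the first column to dummy, 100 values at a time. It assumes dummy should be called once per batch (receiving up to 100 values per call); the names col_batches and results and the fixed seed are hypothetical and used only for illustration.

```r
# dummy as defined in the question
dummy <- function(x) {
  c("There are ", x, " dummies in this room")
}

set.seed(1)  # assumption: fixed seed only so the example is reproducible
df <- data.frame(c(sample(2:100, 1000, replace = TRUE)),
                 c(sample(2:100, 1000, replace = TRUE)))

# Split the first column (a plain vector) into consecutive batches of 100
col_batches <- split(df[, 1], ceiling(seq_len(nrow(df)) / 100))

# Call dummy once per batch; each call receives at most 100 values
results <- lapply(col_batches, dummy)
```

Splitting the vector df[, 1] rather than the whole data frame means each list element is already the 100-value vector that dummy expects.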

There may be more efficient ways; I would welcome new answers.