I have multiple datasets named Dataset.1, Dataset.2, ... which I created in a loop.
Now I want to create a new column in each dataset, like this:
require(plyr)
Dataset.1 <- ddply(Dataset.1, "Col_x", transform, Col_y = mean(Col_y, na.rm=TRUE))
Dataset.2 <- ddply(Dataset.2, "Col_x", transform, Col_y = mean(Col_y, na.rm=TRUE))
Dataset.3 <- ddply(Dataset.3, "Col_x", transform, Col_y = mean(Col_y, na.rm=TRUE))
Dataset.4 <- ddply(Dataset.4, "Col_x", transform, Col_y = mean(Col_y, na.rm=TRUE))
.....
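Since the number of datasets is not always the same, I think a loop is the right approach. I just don't know how to write it.
As I see it, the loop should start like this: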
dataset_names <- ls(pattern = "Dataset.")
for(i in 1:length(dataset_names)) {
.....
}
Thank you very much for your help!
燕姿
Answer 0 (score: 0)
Use the assign and get functions to refer to objects by name, where the name is given as a character vector of length one.
# Simulate some data
set.seed(2014)
Dataset.1 <- data.frame(Col_x=rbinom(n=100,size=3,prob=0.6), Col_y=rnorm(100))
Dataset.2 <- data.frame(Col_x=rbinom(n=100,size=3,prob=0.6), Col_y=rnorm(100))
# Before transformation
head(Dataset.1)
# Col_x Col_y
# 1 2 0.4614496
# 2 3 0.3350788
# 3 2 -0.8645477
# 4 2 1.1806771
# 5 2 -0.1938235
# 6 3 0.8250026
head(Dataset.2)
# Col_x Col_y
# 1 2 0.2342058
# 2 2 -2.3599130
# 3 1 -0.7225682
# 4 1 0.2513051
# 5 1 1.0576962
# 6 0 0.3083427
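library(plyr)
# Loop over the datasets. The pattern is anchored and the dot escaped so that
# only names starting with "Dataset." match (not the summarised.* results on a rerun).
for (a_dataset in ls(pattern = "^Dataset\\.")) {
  original_df <- get(a_dataset)
  transformed_df <- ddply(.data = original_df, .variables = "Col_x", .fun = summarize, mean = mean(Col_y, na.rm = TRUE))
  assign(x = paste0("summarised.", a_dataset), value = transformed_df)
}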
summarised.Dataset.1
# Col_x mean
# 1 0 0.235238326
# 2 1 -0.171010231
# 3 2 0.261661820
# 4 3 0.009241608
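Note that the loop above writes a per-group summary to new objects, while the question asks to overwrite Col_y inside each dataset. A minimal sketch of that in-place variant follows, under some assumptions: it swaps in base R's ave() for ddply(..., transform, ...) so it runs without plyr, and the two small data frames are made up for illustration only.

```r
# Toy data, invented for this sketch (not from the question)
Dataset.1 <- data.frame(Col_x = c(1, 1, 2, 2), Col_y = c(1, 3, 10, NA))
Dataset.2 <- data.frame(Col_x = c(0, 0, 1), Col_y = c(2, 4, 6))

# Match only names of the form Dataset.<digits>
dataset_names <- ls(pattern = "^Dataset\\.[0-9]+$")

for (a_dataset in dataset_names) {
  df <- get(a_dataset)
  # Replace Col_y by its mean within each Col_x group, ignoring NAs
  df$Col_y <- ave(df$Col_y, df$Col_x, FUN = function(v) mean(v, na.rm = TRUE))
  # Write the modified data frame back under its original name
  assign(a_dataset, df)
}
```

If you prefer to stay with plyr, the same get/assign loop should work with the ddply(..., transform, ...) call from the question in place of the ave() line.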
Answer 1 (score: -2)
If your dataset names follow the pattern you mentioned, try this:
library(plyr)
# NumberOfDataset is assumed to already hold the number of datasets you have
for (i in seq_len(NumberOfDataset)) {
  datasetName <- paste0("Dataset.", i)
  # Overwrite each dataset with its transformed version
  assign(datasetName, ddply(get(datasetName), "Col_x", transform, Col_y = mean(Col_y, na.rm = TRUE)))
}