Using a loop to create a new column in multiple datasets

Time: 2014-06-04 08:55:11

Tags: r loops

I have multiple datasets named Dataset.1, Dataset.2, ..., which I created in a loop.

Now I want to create a new column in each dataset, like this:

require(plyr)
Dataset.1 <- ddply(Dataset.1, "Col_x", transform, Col_y = mean(Col_y, na.rm=TRUE))
Dataset.2 <- ddply(Dataset.2, "Col_x", transform, Col_y = mean(Col_y, na.rm=TRUE))
Dataset.3 <- ddply(Dataset.3, "Col_x", transform, Col_y = mean(Col_y, na.rm=TRUE))
Dataset.4 <- ddply(Dataset.4, "Col_x", transform, Col_y = mean(Col_y, na.rm=TRUE))
.....

Since the number of datasets is not always the same, I think a loop is the right approach. I just don't know how to write it.

I think the loop should start like this:

dataset_names <- ls(pattern = "Dataset.")
for(i in 1:length(dataset_names)) {
.....
}

Thank you very much for your help!
燕姿

2 answers:

Answer 0: (score: 0)

Use the assign and get functions to access objects by name, where the name is given as a character vector of length 1.

# Simulate some data
set.seed(2014)
Dataset.1 <- data.frame(Col_x=rbinom(n=100,size=3,prob=0.6), Col_y=rnorm(100))
Dataset.2 <- data.frame(Col_x=rbinom(n=100,size=3,prob=0.6), Col_y=rnorm(100))

# Before transformation
head(Dataset.1)
#   Col_x      Col_y
# 1     2  0.4614496
# 2     3  0.3350788
# 3     2 -0.8645477
# 4     2  1.1806771
# 5     2 -0.1938235
# 6     3  0.8250026
head(Dataset.2)
#   Col_x      Col_y
# 1     2  0.2342058
# 2     2 -2.3599130
# 3     1 -0.7225682
# 4     1  0.2513051
# 5     1  1.0576962
# 6     0  0.3083427


require(plyr)

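# Loop over the datasets. The anchored pattern matches only Dataset.<number>,
# so re-running the loop does not pick up the summarised.* objects it creates.
for(a_dataset in ls(pattern = "^Dataset\\.[0-9]+$")){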

  original_df <- get(a_dataset)

  transformed_df <- ddply(.data=original_df, .variables="Col_x", .fun=summarize, mean = mean(Col_y, na.rm=TRUE))

  assign(x=paste0("summarised.",a_dataset), value=transformed_df)
}

summarised.Dataset.1
#   Col_x         mean
# 1     0  0.235238326
# 2     1 -0.171010231
# 3     2  0.261661820
# 4     3  0.009241608
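
Note that summarize returns one row per Col_x group, whereas the question asks for a column added to every row. The following is a minimal sketch of that variant, not the answer's own code: it swaps ddply for base R's ave() so that it runs without plyr, and the column name Col_y_mean is a hypothetical choice made for illustration.

```r
# Simulated data with the same shape as in the answer above
set.seed(2014)
Dataset.1 <- data.frame(Col_x = rbinom(n = 100, size = 3, prob = 0.6), Col_y = rnorm(100))
Dataset.2 <- data.frame(Col_x = rbinom(n = 100, size = 3, prob = 0.6), Col_y = rnorm(100))

# Anchored pattern so that only objects named Dataset.<number> are matched
dataset_names <- ls(pattern = "^Dataset\\.[0-9]+$")

for (a_dataset in dataset_names) {
  # Fetch the data frame by its name
  df <- get(a_dataset)

  # ave() returns one value per row: the mean of Col_y within that row's Col_x group.
  # Col_y_mean is a hypothetical column name, not one used in the question.
  df$Col_y_mean <- ave(df$Col_y, df$Col_x, FUN = function(v) mean(v, na.rm = TRUE))

  # Store the modified data frame back under the same name
  assign(a_dataset, df)
}

head(Dataset.1)
```

Each dataset keeps its 100 rows and gains one extra column, so the original Col_y values are not overwritten.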

Answer 1: (score: -2)

If your dataset names follow the pattern you mentioned, try:

require(plyr)
# NumberOfDataset is assumed to hold the number of datasets you have
for(i in seq_len(NumberOfDataset)){
  datasetName <- paste0('Dataset.', i)
  # Fetch the data frame by name, transform it, and store it back under the same name
  assign(datasetName, ddply(get(datasetName), "Col_x", transform, Col_y = mean(Col_y, na.rm=TRUE)))
}
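
When the number of datasets varies, an alternative to get and assign is to collect the data frames into a named list and apply a single function to each element. The sketch below is one possible way to do this and is not taken from either answer: add_group_mean and Col_y_mean are hypothetical names, the two small data frames are made up for illustration, and base R's ave() stands in for ddply so that no package is required.

```r
# Two tiny made-up datasets (illustrative values only)
Dataset.1 <- data.frame(Col_x = c(1, 1, 2), Col_y = c(1, NA, 5))
Dataset.2 <- data.frame(Col_x = c(0, 0, 0), Col_y = c(2, 4, 6))

# Hypothetical helper: add the per-group mean of Col_y as a new column
add_group_mean <- function(df) {
  df$Col_y_mean <- ave(df$Col_y, df$Col_x, FUN = function(v) mean(v, na.rm = TRUE))
  df
}

# mget() fetches all matching objects at once and returns them as a named list
datasets <- mget(ls(pattern = "^Dataset\\.[0-9]+$"))

# Apply the helper to every data frame in the list
datasets <- lapply(datasets, add_group_mean)

datasets$Dataset.1$Col_y_mean
# 1 1 5
datasets$Dataset.2$Col_y_mean
# 4 4 4
```

Keeping the data frames in a list means no variable names have to be built with paste0, and the same lapply call works however many datasets there are.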