Object not found error when using step() in a user-defined function

Asked: 2013-05-17 03:51:38

Tags: r regression glm

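Still unanswered after 5 days

  • As Simon's comment shows, this is a reproducible and very strange problem. It seems to occur only when a stepwise regression with high predictive power is wrapped inside a function.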

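I have been struggling with this problem for a while, and any help would be much appreciated. I am trying to write a function that runs several stepwise regressions and returns all of them in a list. However, R has trouble reading the dataset that I specify in the function arguments. I found several similar errors on various boards (here, here, and here), but none of them seem to have been resolved. It all comes down to some odd behavior when calling step() inside a user-defined function. I use the following script to test my code. Run the whole thing several times until the error appears (trust me, it will):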

test.df <- data.frame(a = sample(0:1, 100, rep = T),
                      b = as.factor(sample(0:5, 100, rep = T)),
                      c = runif(100, 0, 100),
                      d = rnorm(100, 50, 50))
test.df$b[10:100] <- test.df$a[10:100] #making sure that at least one of the variables has some predictive power

stepModel <- function(modeling.formula, dataset, outfile = NULL) {
  if (is.null(outfile) == FALSE){
    sink(file = outfile,
         append = TRUE, type = "output")
    print("")
    print("Models run at:")
    print(Sys.time())
  }
  model.initial <- glm(modeling.formula,
                       family = binomial,
                       data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward")
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
  if (!is.null(outfile)) sink()  # only close the sink if one was opened above
  return(output)
}

blah <- stepModel(a~., dataset = test.df)

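This returns the following error message (if the error does not appear right away, keep re-running the test.df script together with the call to stepModel(); it will show up eventually):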

Error in is.data.frame(data) : object 'dataset' not found
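Rather than re-running until the error shows up, one option is to fix the random seed before building test.df, so that every run draws the same data. The following is only a sketch: the helper name make_test_df is hypothetical (it is not part of the question), and which seeds actually trigger the error may depend on the R version, since the random sampling defaults have changed over time.

```r
# Hypothetical helper (not in the original post): rebuilds the question's
# test data with a fixed seed so that repeated runs produce identical data.
make_test_df <- function(seed) {
  set.seed(seed)
  df <- data.frame(a = sample(0:1, 100, replace = TRUE),
                   b = as.factor(sample(0:5, 100, replace = TRUE)),
                   c = runif(100, 0, 100),
                   d = rnorm(100, 50, 50))
  # Same trick as in the question: give b some predictive power for a.
  df$b[10:100] <- df$a[10:100]
  df
}

# The seed value is an assumption; try several until the error reproduces.
test.df <- make_test_df(6)
```

With the seed fixed, a failing case can be shared and re-run by others instead of relying on chance.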

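I have determined that everything runs fine until model.stepwise2 starts being built. Somehow the temporary object 'dataset' works fine in the first stepwise regression but is not recognized in the second. I found this by commenting out parts of the function, as shown below. This code runs fine, which shows that the object 'dataset' is recognized initially: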

stepModel1 <- function(modeling.formula, dataset, outfile = NULL) {
  if (is.null(outfile) == FALSE){
    sink(file = outfile,
         append = TRUE, type = "output")
    print("")
    print("Models run at:")
    print(Sys.time())
  }
  model.initial <- glm(modeling.formula,
                       family = binomial,
                       data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward")
#   model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
#   sink()
#   output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
  return(model.stepwise1)
}

blah1 <- stepModel1(a~., dataset = test.df) 

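Edit - Before anyone asks, all the summary() calls were there because the full function (I trimmed it so you can focus on the error) has another piece that defines a file to which the stepwise trace can be written. I have removed them.

Edit 2 - Session info

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)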

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] tcltk     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sqldf_0.4-6.4         RSQLite.extfuns_0.0.1 RSQLite_0.11.3        chron_2.3-43         
 [5] gsubfn_0.6-5          proto_0.3-10          DBI_0.2-6             ggplot2_0.9.3.1      
 [9] caret_5.15-61         reshape2_1.2.2        lattice_0.20-6        foreach_1.4.0        
[13] cluster_1.14.2        plyr_1.8             

loaded via a namespace (and not attached):
 [1] codetools_0.2-8    colorspace_1.2-1   dichromat_2.0-0    digest_0.6.2       grid_2.15.1       
 [6] gtable_0.1.2       iterators_1.0.6    labeling_0.1       MASS_7.3-18        munsell_0.4       
[11] RColorBrewer_1.0-5 scales_0.2.3       stringr_0.6.2      tools_2.15

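Edit 3 - This performs the same operations as the function, just without wrapping them in a function. It runs fine every time, even when the algorithm does not converge: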

modeling.formula <- a~.
dataset <- test.df
outfile <- NULL
if (is.null(outfile) == FALSE){
  sink(file = outfile,
       append = TRUE, type = "output")
  print("")
  print("Models run at:")
  print(Sys.time())
}
  model.initial <- glm(modeling.formula,
                       family = binomial,
                       data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward")
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)

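1 answer:

Answer 0: (score: 5)

Using do.call to refer to the dataset in the calling environment worked for me. See https://stackoverflow.com/a/7668846/210673 for the original suggestion. Here is a working version (with the sink code removed).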

stepModel2 <- function(modeling.formula, dataset) {
  model.initial <- do.call("glm", list(modeling.formula,
                       family = "binomial",
                       data = as.name(dataset)))
  model.stepwise1 <- step(model.initial, direction = "backward")
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
}

blah <- stepModel2(a~., dataset = "test.df")

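With set.seed(6), the original code failed for me quite consistently. It fails because the dataset variable does not exist inside the step function. It is not needed when building model.stepwise1, but it is needed for model.stepwise2 when model.stepwise1 retains a linear term, so that is the case in which your version fails. Referring to the dataset in the global environment, as I do here, fixes the problem.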
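To see the difference between the two approaches, it can help to inspect the call that each fitted model stores. This is a minimal sketch: the wrapper names fit_plain and fit_docall are hypothetical, and the data is a reduced version of test.df. My understanding is that step() re-evaluates this stored call when it considers adding terms, so the name recorded in the data argument has to be resolvable from wherever that evaluation happens.

```r
set.seed(6)
test.df <- data.frame(a = sample(0:1, 100, replace = TRUE),
                      c = runif(100, 0, 100))

# Hypothetical wrapper mirroring the question: the stored call refers to
# the function argument `dataset`, which only exists inside the function.
fit_plain <- function(modeling.formula, dataset) {
  glm(modeling.formula, family = binomial, data = dataset)
}

# Hypothetical wrapper mirroring the answer: as.name() turns the string
# into a symbol, so the stored call refers to the global object by name.
fit_docall <- function(modeling.formula, dataset) {
  do.call("glm", list(modeling.formula,
                      family = "binomial",
                      data = as.name(dataset)))
}

m1 <- fit_plain(a ~ ., test.df)
m2 <- fit_docall(a ~ ., "test.df")

print(m1$call$data)  # the symbol `dataset`
print(m2$call$data)  # the symbol `test.df`
```

The trade-off of the do.call approach is that the data frame must exist under that name where the call is later evaluated, which is why the function takes the name as a string rather than the object itself.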