How to compare different models and tune different parameters using caret?

时间:2019-02-11 18:38:42

标签: r machine-learning compare r-caret nnet

I am trying to implement some functionality to compare five different machine learning models for predicting values in a regression problem.

My intention is to develop a set of functions that trains the different models and organizes the results into one structure. The models I have chosen as examples are: lasso, random forest, support vector machine, linear model and neural network. To tune some of them I intend to use Max Kuhn's caret reference: https://topepo.github.io/caret/available-models.html. However, since each model requires different tuning parameters, I am not sure how to set them:

First, I set up the grid for tuning the 'nnet' model. Here I chose different numbers of nodes in the hidden layer and different decay values:

my.grid <- expand.grid(size = seq(from = 1, to = 10, by = 1),
                       decay = seq(from = 0.1, to = 0.5, by = 0.1))
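As a sanity check, the grid is just a data frame with one row per parameter combination, and its column names must match the model's tunable parameter names exactly:

    nrow(my.grid)   # 50 combinations (10 sizes x 5 decay values)
    names(my.grid)  # "size" "decay" -- the two tuning parameters of 'nnet'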

Then I build the function that will run the five models with 6-fold cross-validation repeated 5 times:

my_list_model <- function(model) {
  set.seed(1)
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                repeats = 5,
                                returnResamp = "all",
                                savePredictions = "all")

  # The tuning configurations of the machine learning models:
  set.seed(1)
  fit_m <- train(ST1 ~ .,
                 data = train,       # my original dataframe, not shown in this code
                 method = model,
                 metric = "RMSE",
                 preProcess = "scale",
                 trControl = train.control,
                 linout = 1,         # linear activation function output
                 trace = FALSE,
                 maxit = 1000,
                 tuneGrid = my.grid) # Here is how I call the tune of 'nnet' parameters

  return(fit_m)
}

Finally, I run the five models:

lapply(list(Lass = "lasso",
            RF   = "rf",
            SVM  = "svmLinear",
            OLS  = "lm",
            NN   = "nnet"),
       my_list_model) -> model_list

However, when I run it, it shows:

Error: The tuning parameter grid should have columns fraction

As far as I understand it, I don't know how to specify the tuning parameters properly. If I drop the 'nnet' model and replace it, in the second-to-last line, with an XGBoost model for example, it seems to work fine and computes the results. So the problem appears to be with the 'nnet' tuning parameters.

So I think my real question is: how can I configure these different model parameters, especially for the 'nnet' model? Also, since I don't need to set parameters for the lasso, random forest, svmLinear and linear models, how can I have the caret package tune them?

1 answer:

Answer 0 (score: 1)

my_list_model <- function(model, grd = NULL) {
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                returnResamp = "all",
                                savePredictions = "all")

  # The tuning configurations of the machine learning models:
  set.seed(1)
  fit_m <- train(Y ~ .,
                 data = df,      # my original dataframe, not shown in this code
                 method = model,
                 metric = "RMSE",
                 preProcess = "scale",
                 trControl = train.control,
                 linout = 1,     # linear activation function output
                 trace = FALSE,
                 maxit = 1000,
                 tuneGrid = grd) # Here is how the tuning grid is passed per model
  return(fit_m)
}
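Because grd defaults to NULL, models that need no explicit grid can go through the same function; caret then builds its own default grid (its size controlled by the tuneLength argument of train), and models like 'lm' with no tuning parameters just run once. A usage sketch, assuming df and the function above are in scope:

    # No grid supplied: caret picks a default tuning grid for 'rf',
    # and 'lm' has no tuning parameters at all.
    fit_rf <- my_list_model("rf")
    fit_lm <- my_list_model("lm")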

First run the following code to see all the tunable parameters of a given model:

modelLookup('rf')
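For the 'nnet' model from the question, the same lookup shows that only size and decay are tunable, which is why a grid containing any other column is rejected (output sketched from a recent caret version; exact formatting may differ):

    modelLookup('nnet')
    #   model parameter         label forReg forClass probModel
    # 1  nnet      size #Hidden Units   TRUE     TRUE      TRUE
    # 2  nnet     decay  Weight Decay   TRUE     TRUE      TRUE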

Now, based on the lookup above, make a grid for each model you want to tune:

svmGrid <-  expand.grid(C=c(3,2,1))
rfGrid <-  expand.grid(mtry=c(5,10,15))

Create a list of all the model grids and make sure the model names are identical to the names in the list:

grd_all <- list(svmLinear = svmGrid,
                rf        = rfGrid)
model_list <- lapply(c("rf", "svmLinear"),
                     function(x) my_list_model(x, grd_all[[x]]))
model_list

[[1]]
Random Forest 

17 samples
3 predictor

Pre-processing: scaled (3) 
Resampling: Cross-Validated (6 fold, repeated 1 times) 
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ... 
Resampling results across tuning parameters:

mtry  RMSE      Rsquared   MAE     
 5    63.54864  0.5247415  55.72074
10    63.70247  0.5255311  55.35263
15    62.13805  0.5765130  54.53411

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 15.

[[2]]
Support Vector Machines with Linear Kernel 

17 samples
3 predictor

Pre-processing: scaled (3) 
Resampling: Cross-Validated (6 fold, repeated 1 times) 
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ... 
Resampling results across tuning parameters:

C  RMSE      Rsquared   MAE     
1  59.83309  0.5879396  52.26890
2  66.45247  0.5621379  58.74603
3  67.28742  0.5576000  59.55334

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was C = 1.