Random forest models in R

Time: 2019-10-30 07:23:38

Tags: r machine-learning random-forest

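Is it possible to create multiple random forest models by tuning hyperparameters on the training data, check test-data performance for every model, and store the results in a CSV file?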

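For example, I have one model with mtry = 6 and nodesize = 3, and another with mtry = 10 and nodesize = 4. What I need to do is test the performance of both models on the test data and store key model metrics such as the confusion matrix, sensitivity, and specificity.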

I tried the following code:

train_performance <- data.frame('TN'=0,'FP'=0,'FN'=0,'TP'=0,'accuracy'=0,'kappa'=0,'sensitivity'=0,'specificity'=0)
modellist <- list()

for (mtry in c(6,11)){
  for (nodesize in c(2,3)){
    fit_model <- randomForest(dv~., train_final, mtry = mtry, importance=TRUE, nodesize=nodesize,
                              sampsize = ceiling(.8*nrow(train_final)), proximity=TRUE, na.action = na.omit,
                              ntree=500)
    Key_col <- paste0(mtry,"-",nodesize)
    modellist[[Key_col]] <- fit_model

    pred_train <- predict(fit_model, train_final)
    cf <- confusionMatrix(pred_train, train_final$DV, mode = 'everything', positive = '1')
    train_performance$TN <- cf$table[1]
    train_performance$FP <- cf$table[2]
    train_performance$FN <- cf$table[3]
    train_performance$TP <- cf$table[4]
    train_performance$accuracy=cf$overall[1]
    train_performance$kappa=cf$overall[2]
    train_performance$sensitivity=cf$byClass[1]
    train_performance$specificity=cf$byClass[2]
    train_performance$key=Key_col
  }
}

1 answer:

Answer 0 (score: 1)

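Below is an example approach using the caret package that shows how to tune and train a random forest model. It prints the accuracy metrics for every model tried: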

library(randomForest)
library(mlbench)
library(caret)

# Load Dataset
data(Sonar)
dataset <- Sonar
x <- dataset[,1:60]
y <- dataset[,61]
# Create model with default parameters
control <- trainControl(method="repeatedcv", number=10, repeats=3)
seed <- 7
metric <- "Accuracy"
set.seed(seed)
mtry <- sqrt(ncol(x))
tunegrid <- expand.grid(.mtry=mtry)
rf_default <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_default)

Output:

Resampling results

  Accuracy   Kappa      Accuracy SD  Kappa SD 
  0.8138384  0.6209924  0.0747572    0.1569159

Tuning with caret:

Random search: one search strategy we can use is to try random values within a range.

# Random Search
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="random")
set.seed(seed)
mtry <- sqrt(ncol(x))
rf_random <- train(Class~., data=dataset, method="rf", metric=metric, tuneLength=15, trControl=control)
print(rf_random)
plot(rf_random)

Output:

Resampling results across tuning parameters:

  mtry  Accuracy   Kappa      Accuracy SD  Kappa SD 
  11    0.8218470  0.6365181  0.09124610   0.1906693
  14    0.8140620  0.6215867  0.08475785   0.1750848
  17    0.8030231  0.5990734  0.09595988   0.1986971
  24    0.8042929  0.6002362  0.09847815   0.2053314
  30    0.7933333  0.5798250  0.09110171   0.1879681
  34    0.8015873  0.5970248  0.07931664   0.1621170
  45    0.7932612  0.5796828  0.09195386   0.1887363
  47    0.7903896  0.5738230  0.10325010   0.2123314
  49    0.7867532  0.5673879  0.09256912   0.1899197
  50    0.7775397  0.5483207  0.10118502   0.2063198
  60    0.7790476  0.5513705  0.09810647   0.2005012

Grid search: another search strategy is to define a grid of algorithm parameters to try.

control <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")
set.seed(seed)
tunegrid <- expand.grid(.mtry=c(1:15))
rf_gridsearch <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_gridsearch)
plot(rf_gridsearch)

Output:

Resampling results across tuning parameters:

  mtry  Accuracy   Kappa      Accuracy SD  Kappa SD 
   1    0.8377273  0.6688712  0.07154794   0.1507990
   2    0.8378932  0.6693593  0.07185686   0.1513988
   3    0.8314502  0.6564856  0.08191277   0.1700197
   4    0.8249567  0.6435956  0.07653933   0.1590840
   5    0.8268470  0.6472114  0.06787878   0.1418983
   6    0.8298701  0.6537667  0.07968069   0.1654484
   7    0.8282035  0.6493708  0.07492042   0.1584772
   8    0.8232828  0.6396484  0.07468091   0.1571185
   9    0.8268398  0.6476575  0.07355522   0.1529670
  10    0.8204906  0.6346991  0.08499469   0.1756645
  11    0.8073304  0.6071477  0.09882638   0.2055589
  12    0.8184488  0.6299098  0.09038264   0.1884499
  13    0.8093795  0.6119327  0.08788302   0.1821910
  14    0.8186797  0.6304113  0.08178957   0.1715189
  15    0.8168615  0.6265481  0.10074984   0.2091663
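
To store a table like the one above rather than only printing it, the object returned by train keeps the per-parameter resampling summary in its results element, which is an ordinary data frame. The following is a minimal sketch, assuming a smaller grid, 5-fold cross-validation and ntree = 100 so that it runs quickly; the object name rf_grid and the output file name are placeholders, not part of the original answer.

```r
library(caret)
library(mlbench)

data(Sonar)

# Small grid and light resampling so the sketch runs quickly (assumed values)
control <- trainControl(method = "cv", number = 5)
tunegrid <- expand.grid(mtry = c(2, 6, 10))

set.seed(7)
rf_grid <- train(Class ~ ., data = Sonar, method = "rf", metric = "Accuracy",
                 tuneGrid = tunegrid, trControl = control, ntree = 100)

# `results` holds one row per tuning-parameter value
resample_summary <- rf_grid$results
print(resample_summary)

# Write the summary to disk; the file name is a placeholder
write.csv(resample_summary, "rf_grid_results.csv", row.names = FALSE)

# The mtry value that scored best on the chosen metric
print(rf_grid$bestTune)
```

Note that these figures are cross-validated estimates on the training data, not performance on a separate test set.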

There are many other ways to tune random forest models and store their results; the two shown above are the most widely used.
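
One such variation, relevant to the question: caret's built-in "rf" method exposes only mtry as a tuning parameter, so nodesize has to be varied some other way. A possible sketch, assuming extra arguments to train are passed through to randomForest, is to loop over nodesize values and collect each run's results. The grid values, fold count, ntree and the names all_results and cv_summary are illustrative assumptions.

```r
library(caret)
library(mlbench)

data(Sonar)

control <- trainControl(method = "cv", number = 5)
tunegrid <- expand.grid(mtry = c(6, 10))

all_results <- list()
for (ns in c(3, 4)) {
  # Reset the seed so each run starts from the same random state
  set.seed(7)
  fit <- train(Class ~ ., data = Sonar, method = "rf", metric = "Accuracy",
               tuneGrid = tunegrid, trControl = control,
               nodesize = ns, ntree = 100)  # passed on to randomForest
  res <- fit$results
  res$nodesize <- ns  # record which nodesize produced these rows
  all_results[[as.character(ns)]] <- res
}

# One row per mtry/nodesize combination
cv_summary <- do.call(rbind, all_results)
rownames(cv_summary) <- NULL
print(cv_summary)
```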

In addition, you can set these parameters manually and train and tune the models yourself.
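
A rough sketch of that manual route, closer to what the question asks for: fit one randomForest per mtry/nodesize pair, score each on a held-out test set, and append one row of metrics per model before writing everything to a CSV file. It uses the Sonar data in place of the asker's train_final; the 70/30 split, ntree = 200, the choice of "M" as the positive class, and the file name are all assumptions made for illustration.

```r
library(randomForest)
library(caret)
library(mlbench)

data(Sonar)

# Hold out 30% of the rows as a test set (the split ratio is an assumption)
set.seed(7)
train_idx <- createDataPartition(Sonar$Class, p = 0.7, list = FALSE)
train_set <- Sonar[train_idx, ]
test_set  <- Sonar[-train_idx, ]

# Every mtry/nodesize combination to try
param_grid <- expand.grid(mtry = c(6, 10), nodesize = c(3, 4))

model_list <- list()
rows <- vector("list", nrow(param_grid))

for (i in seq_len(nrow(param_grid))) {
  mtry_i <- param_grid$mtry[i]
  nodesize_i <- param_grid$nodesize[i]
  key <- paste0(mtry_i, "-", nodesize_i)

  fit <- randomForest(Class ~ ., data = train_set, mtry = mtry_i,
                      nodesize = nodesize_i, ntree = 200)
  model_list[[key]] <- fit

  # Evaluate on the held-out test set, treating "M" as the positive class
  pred <- predict(fit, test_set)
  cm <- confusionMatrix(pred, test_set$Class, positive = "M")

  # Rows of cm$table are predictions, columns are the reference labels
  rows[[i]] <- data.frame(
    key = key,
    mtry = mtry_i,
    nodesize = nodesize_i,
    TP = cm$table["M", "M"],
    FP = cm$table["M", "R"],
    FN = cm$table["R", "M"],
    TN = cm$table["R", "R"],
    accuracy = unname(cm$overall["Accuracy"]),
    kappa = unname(cm$overall["Kappa"]),
    sensitivity = unname(cm$byClass["Sensitivity"]),
    specificity = unname(cm$byClass["Specificity"])
  )
}

# One row per model, written out for later comparison
test_performance <- do.call(rbind, rows)
print(test_performance)
write.csv(test_performance, "rf_test_performance.csv", row.names = FALSE)
```

Building a list of one-row data frames and binding them at the end avoids the problem in the question's loop, where each iteration overwrites the single row of train_performance.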