R caret package: tuning parameters using a validation data set

Date: 2016-02-13 16:47:18

Tags: r r-caret

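I have rephrased my question:

I am using R with the caret package.

I split my data set into three parts: training, validation, and test sets. The validation set will be used to tune the training parameters.

caret's train function tunes the training parameters through resampling (bootstrap by default). Is there a way to tell the train function to use my validation data set for parameter tuning instead of resampling?

Right now I have to use a loop, as you can see in the example below.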

Example code:

    library(caret)
    set.seed(3)
    data("segmentationData")
    #
    # Partition data set in training, validation, testing.
    #
    inTraining <- createDataPartition(segmentationData$Class, p=.60, list=FALSE)
    training <- segmentationData[ inTraining,]
    notTraining <- segmentationData[-inTraining,]
    inValidation <- createDataPartition(notTraining$Class, p=.50, list=FALSE)
    validation <- notTraining[inValidation,]
    testing <- notTraining[-inValidation,]
    #
    # The model will be trained using method 'rpart',
    # which has cp (complexity parameter) as its only tuning parameter.
    #
    # The training will be tuned using different values for cp.
    # We'll choose the cp that maximizes accuracy.
    #
    cps = c(0, 0.001, 0.003, 0.01, 0.03)
    maxAccuracy = -1

    for(currentCp in cps) {

        #
        # Call train function using currentCp and train control set to 'none'.
        #
        f <- train(Class~., training, method = 'rpart', 
                       trControl = trainControl(method = "none"), 
                       tuneGrid = data.frame( cp = currentCp ))

        #
        # Predict on validation data set.
        #
        pr <- predict(f, validation)

        #
        # Select cp that maximizes accuracy.
        #
        cm <- confusionMatrix(pr, validation$Class)
        currentAccuracy <- cm$overall[["Accuracy"]]
        if(currentAccuracy > maxAccuracy) {
            cpMaxAccuracy = currentCp
            maxAccuracy = currentAccuracy
        }
    }

    #
    # Output.
    #
    cpMaxAccuracy
    maxAccuracy
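
One way the loop above might be avoided is sketched here, assuming the `index` and `indexOut` arguments of `trainControl` behave as described in the caret documentation: the training and validation rows are stacked into a single data frame, and one custom "resample" tells `train` to fit on the training rows and score on the validation rows. The names `combined`, `trainRows`, `validationRows`, `ctrl`, and `fit` are illustrative and do not come from the original post.

```r
library(caret)
set.seed(3)
data("segmentationData")

# Same 60/20/20 split as in the question.
inTraining <- createDataPartition(segmentationData$Class, p = .60, list = FALSE)
training <- segmentationData[inTraining, ]
notTraining <- segmentationData[-inTraining, ]
inValidation <- createDataPartition(notTraining$Class, p = .50, list = FALSE)
validation <- notTraining[inValidation, ]
testing <- notTraining[-inValidation, ]

# Stack training and validation rows so train() sees both;
# the index vectors below say which rows play which role.
combined <- rbind(training, validation)
trainRows <- seq_len(nrow(training))
validationRows <- nrow(training) + seq_len(nrow(validation))

# A single custom resample: fit on the training rows,
# evaluate on the validation rows.
ctrl <- trainControl(method = "cv",
                     index = list(Fold1 = trainRows),
                     indexOut = list(Fold1 = validationRows))

fit <- train(Class ~ ., data = combined, method = "rpart",
             trControl = ctrl,
             tuneGrid = data.frame(cp = c(0, 0.001, 0.003, 0.01, 0.03)))

fit$bestTune   # cp selected by accuracy on the validation rows
fit$results    # accuracy for each candidate cp
```

Note one difference from the loop: by default `train` refits the final model on every row passed to it, so here the final model would be fit on training plus validation data. Recent caret versions also document an `indexFinal` argument to `trainControl` that may restrict the final fit to chosen rows; whether it is available depends on the installed version.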

0 answers:

No answers