Question

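I'm running into a problem using LDA through caret with factor predictors. For some reason, enabling resampling raises an uninformative error. Has anyone seen this before?

Here is a reproducible toy example: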

library(caret)
library(MASS)
DF <- data.frame(
  y  = sample(as.factor(1:2), 200, replace = TRUE),
  x1 = sample(as.factor(1:2), 200, replace = TRUE),
  x2 = sample(as.factor(1:2), 200, replace = TRUE)
)

# These two lines produce the same results
lda(DF[, -1], DF[, 1])
train(DF[, -1], DF[, 1], method = 'lda', trControl = trainControl(method = 'none'))$finalModel

# This gives an error
train(DF[, -1], DF[, 1], method = 'lda', trControl = trainControl(method = 'cv'))$finalModel

Error in train.default(DF[, -1], DF[, 1], method = "lda", trControl = trainControl(method = "cv")) : 
  Stopping

Answer 1

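This seems to happen when factor variables are passed as independent variables without using the formula interface. Converting them to numeric dummy variables first works: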

# Convert independent variables to dummy variables
DF$x1 <- as.numeric(DF$x1 == "2")
DF$x2 <- as.numeric(DF$x2 == "2")
train(DF[, -1], DF[, 1], method = 'lda', 
      trControl = trainControl(method = 'cv'))$finalModel
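
With more factors or more than two levels, the manual conversion above gets tedious. A minimal sketch of the same idea using caret's `dummyVars` is shown below. The seed, the `dv`, `X`, and `fit` names, and the choice of `fullRank = TRUE` (to keep one 0/1 column per two-level factor) are assumptions added for illustration, not part of the original answer.

```r
library(caret)
library(MASS)

set.seed(1)  # assumption: seed added only so the sketch is repeatable

# Same toy data as in the question
DF <- data.frame(
  y  = sample(as.factor(1:2), 200, replace = TRUE),
  x1 = sample(as.factor(1:2), 200, replace = TRUE),
  x2 = sample(as.factor(1:2), 200, replace = TRUE)
)

# Build 0/1 dummy columns for the factor predictors;
# fullRank = TRUE drops the reference level of each factor
dv <- dummyVars(~ x1 + x2, data = DF, fullRank = TRUE)
X  <- as.data.frame(predict(dv, newdata = DF))

# x/y interface with numeric predictors and cross-validation
fit <- train(X, DF$y, method = "lda",
             trControl = trainControl(method = "cv"))
fit$finalModel
```

This should behave like the hand-written conversion, since the predictors reaching `lda` are again plain 0/1 numeric columns.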

So once the factor variables have been converted to binary dummy variables, the x/y syntax also works with resampling.

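Note that, depending on the approach, the reported group means are around 0.5 or around 1.5, because the first two approaches in the question evidently coerce the factor levels to the numeric values 1 and 2.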
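
For reference, the formula-interface route mentioned in this answer might look like the sketch below. It assumes the factor columns are left as factors, as in the question's original `DF`; the seed and the `fit` name are additions for illustration only.

```r
library(caret)
library(MASS)

set.seed(1)  # assumption: seed added only so the sketch is repeatable

# Toy data from the question, with x1 and x2 kept as factors
DF <- data.frame(
  y  = sample(as.factor(1:2), 200, replace = TRUE),
  x1 = sample(as.factor(1:2), 200, replace = TRUE),
  x2 = sample(as.factor(1:2), 200, replace = TRUE)
)

# The formula interface expands factors into dummy columns itself,
# so no manual conversion is done before resampling
fit <- train(y ~ ., data = DF, method = "lda",
             trControl = trainControl(method = "cv"))
fit$finalModel
```

Because the formula interface creates 0/1 dummy columns, the group means it reports should fall near 0.5 rather than 1.5.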
