Question

I am trying to perform KFold cross-validation with Keras, but for some reason the KFold split does not work.

from sklearn.model_selection import StratifiedKFold

X = train_data[features]
y = train_data['price']

kfold = StratifiedKFold(n_splits=10, shuffle=True)
for train, test in kfold.split(X,y):
    print(X[train])

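I actually fit the model afterwards, but that did not work, so I tried printing the split instead, which produced the following warning and error.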

Warning: /opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_split.py:672: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=10. % (min_groups, self.n_splits)), UserWarning)

Error: "None of [Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 9,\n 10,\n ...\n 39989, 39990, 39991, 39992, 39993, 39994, 39995, 39996, 39997,\n 39998],\n dtype='int64', length=36000)] are in the [columns]"

Answer 1

The warning is self-explanatory:

Warning: /opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_split.py:672: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=10. % (min_groups, self.n_splits)), UserWarning)

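This means that the least represented class in y has only one sample. A stratified split into 10 folds needs at least 10 samples of every class to place one in each fold, so the stratification cannot work as intended.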
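
One way to see which classes trigger the warning is to count how often each distinct value of y occurs. The sketch below uses a small made-up Series in place of train_data['price'], since the real data is not shown; the values and n_splits = 3 are assumptions chosen only to keep the example short.

```python
import pandas as pd

# Hypothetical stand-in for train_data['price']; the real data is not shown.
y = pd.Series([100, 100, 100, 250, 250, 999])

n_splits = 3  # assumed here; the question uses n_splits=10

# StratifiedKFold treats every distinct value of y as its own class,
# so count how many samples each distinct value has.
counts = y.value_counts()

# Classes with fewer samples than n_splits cannot appear in every fold.
too_small = counts[counts < n_splits]
print(too_small)
```

If most values show up in too_small, y is probably not a class label at all.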

I suggest you check your dataset again to verify and correct the labels.
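
If price turns out to be a continuous regression target rather than a class label (an assumption, since the question does not show the data), a plain KFold may fit better, because it does not stratify on y. Separately, the error about [columns] is consistent with X[train] selecting DataFrame columns by label, whereas .iloc selects rows by position. A minimal sketch with made-up data; the column names area and rooms are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Made-up data standing in for train_data; column names are hypothetical.
rng = np.random.default_rng(0)
train_data = pd.DataFrame({
    "area": rng.uniform(30, 200, size=50),
    "rooms": rng.integers(1, 6, size=50),
    "price": rng.uniform(1e5, 1e6, size=50),
})
features = ["area", "rooms"]

X = train_data[features]
y = train_data["price"]

# KFold ignores y, so a continuous target causes no class-count warning.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

fold_sizes = []
for train_idx, test_idx in kfold.split(X):
    # split() yields positional indices, so select rows with .iloc.
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
    fold_sizes.append((len(X_train), len(X_test)))

print(fold_sizes[0])
```

The model would then be fit on X_train and y_train inside the loop and evaluated on X_test and y_test.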

StratifiedKFold split does not seem to work