ValueError: Cannot have number of splits n_splits=3 greater than the number of samples: 1

Time: 2016-10-03 04:25:29

Tags: python scikit-learn cross-validation sklearn-pandas

I am trying to train this model using train_test_split and a decision tree regressor:

import sklearn
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# TODO: Make a copy of the DataFrame, using the 'drop' function to drop the given feature
new_data = samples.drop('Fresh', axis=1)

# TODO: Split the data into training and testing sets using the given feature as the target
X_train, X_test, y_train, y_test = train_test_split(new_data, samples['Fresh'], test_size=0.25, random_state=0)

# TODO: Create a decision tree regressor and fit it to the training set
regressor = DecisionTreeRegressor(random_state=0)
regressor = regressor.fit(X_train, y_train)

# TODO: Report the score of the prediction using the testing set
score = cross_val_score(regressor, X_test, y_test, cv=3)

print(score)

When I run it, I get this error:

ValueError: Cannot have number of splits n_splits=3 greater than the number of samples: 1.

If I change the value of cv to 1, I get:

ValueError: k-fold cross-validation requires at least one train/test split by setting n_splits=2 or more, got n_splits=1.

Some sample rows of the data look like this:

   Fresh  Milk  Grocery  Frozen  Detergents_Paper  Delicatessen
0  14755   899     1382    1765                56           749
1   1838  6380     2824    1218              1216           295
2  22096  3575     7041   11422               343          2564

1 answer:

Answer 0: (score: 3)

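You will get the first error if the number of splits is greater than the number of samples. The check that raises it is in the scikit-learn source code.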

You will get the second error if the number of folds is less than or equal to 1. In your case, cv = 1. See the source code:

if n_folds <= 1:
    raise ValueError(
        "k-fold cross validation requires at least one"
        " train / test split by setting n_folds=2 or more,"
        " got n_folds={0}.".format(n_folds))
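Both errors can be reproduced directly with KFold, which is what cross_val_score uses for a regressor when cv is an integer. This is a minimal sketch; the array X_one is a made-up single-sample input, not the asker's data, and the exact wording of the messages may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical input: one sample with six features.
X_one = np.arange(6).reshape(1, 6)

# More splits than samples: the error surfaces when split() is iterated.
try:
    list(KFold(n_splits=3).split(X_one))
    first_error = None
except ValueError as exc:
    first_error = str(exc)

# Fewer than two splits: the error is raised by the constructor itself.
try:
    KFold(n_splits=1)
    second_error = None
except ValueError as exc:
    second_error = str(exc)

print(first_error)
print(second_error)
```

The first check depends on the data, so it only fires once the splitter sees X; the second depends only on n_splits, so it fires before any data is involved.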

An educated guess: the number of samples in X_test is less than 3. Double-check that.
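One way to double-check is to compare len(X_test) with cv before calling cross_val_score. The sketch below assumes a made-up 40-row DataFrame with the same column names as the question, since the real data is not available. If samples held only the three rows shown in the question, a 25% test split would leave a single row in X_test, which matches the error message.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for the asker's data: 40 random rows, same columns.
rng = np.random.RandomState(0)
columns = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicatessen"]
data = pd.DataFrame(rng.randint(1, 20000, size=(40, 6)), columns=columns)

features = data.drop("Fresh", axis=1)
target = data["Fresh"]
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0
)

cv = 3
n_test = len(X_test)  # number of held-out rows available for cross-validation
if n_test < cv:
    raise ValueError(
        "X_test has only {0} samples; cv={1} needs at least {1}.".format(n_test, cv)
    )

# cross_val_score clones and refits the estimator on each fold.
regressor = DecisionTreeRegressor(random_state=0)
scores = cross_val_score(regressor, X_test, y_test, cv=cv)
print(n_test, scores)
```

With enough rows in the input, the same call from the question runs without error and returns one score per fold.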