I'm doing a stratified split and getting an out-of-bounds error, and I don't know why

Time: 2017-09-19 16:57:26

Tags: python split scikit-learn

I'm trying to do a stratified shuffle split, and I'm new to this.

from sklearn import preprocessing
from sklearn import cross_validation
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedShuffleSplit

data = featureFormat(my_dataset, features_list, sort_keys = True)
labels, features = targetFeatureSplit(data)
scaler = preprocessing.MinMaxScaler()
features = scaler.fit_transform(features)


split = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=42)
print len(features), len(labels)
for train_index,test_index in split.split(features, labels):
    print("TRAIN:", train_index, "TEST:", test_index)
    features_train,features_test = features_train[train_index],features_test[test_index] 
    labels_train,labels_test = labels_train[train_index],labels_test[test_index]

This is the error I get:

[image: screenshot of the error, not available]

It shows 100 even though the limit goes up to 143.

1 Answer:

Answer 0: (score: 1)

You are using the wrong variable names. These lines:

features_train,features_test = features_train[train_index],features_test[test_index] 
labels_train,labels_test = labels_train[train_index],labels_test[test_index]

should be:

features_train,features_test = features[train_index],features[test_index] 
labels_train,labels_test = labels[train_index],labels[test_index]

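You are basically using the variables before they are assigned in the loop. You need to slice the original features and labels. If features_train and features_test are left over from an earlier, smaller split, the indices generated for the full dataset can run past their length, which would explain an out-of-bounds error that mentions a size smaller than 143.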
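To make the fix concrete, here is a minimal, self-contained sketch. The data is synthetic (143 random samples with two classes), standing in for the output of featureFormat and targetFeatureSplit, which are not shown in the question. It also assumes labels may be a plain Python list, which cannot be indexed with an index array, so both inputs are converted to NumPy arrays first.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the question's data (assumption): 143 samples, 3 features.
rng = np.random.RandomState(42)
features = MinMaxScaler().fit_transform(rng.rand(143, 3))
labels = [0] * 125 + [1] * 18  # a plain list, as a helper function might return

# Convert to NumPy arrays so both can be sliced with index arrays.
features = np.asarray(features)
labels = np.asarray(labels)

split = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=42)
fold_sizes = []
for train_index, test_index in split.split(features, labels):
    # Slice the full arrays on every iteration, never the previous fold's output.
    features_train, features_test = features[train_index], features[test_index]
    labels_train, labels_test = labels[train_index], labels[test_index]
    fold_sizes.append((len(features_train), len(features_test)))

print(fold_sizes)
```

Each iteration overwrites features_train and the other three names, so if you want to score every fold, fit and evaluate the classifier inside the loop.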