Is it possible to predict just one row from my dataset?

Time: 2018-12-25 15:19:56

Tags: django python-3.x

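I have a dataset like the one shown in the table below. I want to click a link button to make a prediction based on the "Label" field. So my question is: since I only want to predict a single row of the dataset, how do I split the data into training and test sets with this scikit-learn code?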

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state, test_size=test_size)

Below is my view, to give you an idea of what I am trying to do.

def prediction_view(request):
    template = 'index.html'
    .
    .
    .
    train = Pull_Requests.objects.all()

    features_col = ['Comments', 'LC_added', 'LC_deleted', 'Commits', 'Changed_files', 'Evaluation_time', 'First_status', 'Reputation']  # This also test
    class_label = ['Label']
    X = train[features_col].dropna()  # This also test
    # y = train.Label # This also test
    y = train[class_label]

    random_state = 0
    test_size = request.POST.get('test_size')

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state, test_size=test_size)
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    classification_report = {'accuracy': Accuracy, 'pricision': Precision, 'recall': Recall, 'f1_score': F1_meseaure}
    importance_features = {'importances_feautre': importances_feautres}
    data = {
        'new_data': new_data,
        'classification_report': classification_report,
        'importance_feature': importance_features,
        'features': features_col,
    }
    return render(request, template, data)

[Image: Dataset sample]

1 Answer:

Answer 0 (score: 1)

For cross-validation, you can use LeaveOneOut from sklearn. For example:

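from sklearn.model_selection import LeaveOneOut

loo = LeaveOneOut()
loo.get_n_splits(X)  # number of splits, one per row of X

for train_index, test_index in loo.split(X):
    # split() yields positional indices, so use .iloc on pandas objects
    # (for NumPy arrays, plain X[train_index] works instead)
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]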

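Note that, given n samples, this gives you n folds. If n is large, this can become computationally expensive (although, since you have relatively few features, n could grow quite large before that becomes a problem).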
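To make the cost of n folds concrete, here is a minimal sketch of a full leave-one-out run that fits one decision tree per fold and averages the per-row results. The toy DataFrame, its three columns, and the `loo_accuracy` name are assumptions for illustration; they stand in for the question's table and are not the asker's data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the question's table.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(20, 3)),
    columns=["Comments", "Commits", "Changed_files"],
)
y = pd.Series((X["Comments"] > 0).astype(int), name="Label")

loo = LeaveOneOut()
n_folds = loo.get_n_splits(X)  # one fold per row

correct = 0
for train_index, test_index in loo.split(X):
    # A fresh model is fitted on all rows except the single held-out row.
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X.iloc[train_index], y.iloc[train_index])
    y_pred = clf.predict(X.iloc[test_index])
    correct += int(y_pred[0] == y.iloc[test_index[0]])

# Fraction of held-out rows that were predicted correctly.
loo_accuracy = correct / n_folds
```

Each fold scores exactly one row, so per-fold precision or recall is not meaningful; aggregating over all folds, as above, is the usual way to read the result.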

Another approach is to generate a random integer (within the range of train's index) to use as the index of the test row each time:

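import random

import pandas as pd

max_ind = train.index[-1]
rand_int = random.randint(0, max_ind)  # randint includes both endpoints

test_idx = pd.Index([rand_int])
# every index label except the test row (~ on an integer Index would not do this)
train_idx = train.index.difference(test_idx)

# select rows by label; X[train_idx] would try to select columns
X_train, y_train = X.loc[train_idx], y.loc[train_idx]
X_test, y_test = X.loc[test_idx], y.loc[test_idx]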

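This assumes that train's index is monotonically increasing. You can check this with train.index.is_monotonic_increasing and, if needed, use train.reset_index(drop=True). Alternatively, you can use train.shape[0] instead, in which case you should confirm that every value in the index is unique and less than or equal to train.shape[0]. (Because random.randint includes both endpoints, train.shape[0] - 1 is the safe upper bound for a default 0-based index.)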
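As a rough end-to-end sketch of the random single-row holdout, assuming train is a pandas DataFrame: the toy frame, its column names, and the seed below are hypothetical and only serve to make the example runnable.

```python
import random

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy frame with a non-default index, standing in for `train`.
train = pd.DataFrame(
    {
        "Comments": [1, 4, 2, 8, 5, 7],
        "Commits": [1, 3, 1, 6, 2, 5],
        "Label": [0, 1, 0, 1, 0, 1],
    },
    index=[10, 3, 7, 42, 5, 8],
)

# Normalise to a 0..n-1 index so a random position is also a valid label.
train = train.reset_index(drop=True)

random.seed(0)  # fixed seed only so the sketch is repeatable
rand_int = random.randint(0, train.shape[0] - 1)  # both endpoints included

test_row = train.loc[[rand_int]]         # one-row DataFrame
train_rows = train.drop(index=rand_int)  # everything except that row

features = ["Comments", "Commits"]
clf = DecisionTreeClassifier(random_state=0)
clf.fit(train_rows[features], train_rows["Label"])
y_pred = clf.predict(test_row[features])  # array holding a single prediction
```

If a random one-row holdout is all that is needed, train_test_split also accepts an integer test_size, so test_size=1 holds out exactly one row without any manual index handling.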