Predictions show a different 0/1 ratio after oversampling and undersampling

Question

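Today I bring something strange. I am trying to make predictions using SMOTE, undersampling, and the normal (unresampled) data. I don't understand why, in the oversampling and undersampling cases, precision and recall come out above 95%. To clarify, the imbalance is very large ({0: 1967, 1: 197}). I was very happy with my results, so I decided to run the model on the original X and y, expecting something similar to {0: 1967, 1: 197}, but that is not what I get. When I evaluate on the test split the scores are good (precision 95%, recall 96%), but when I call model.predict(X), the result is completely different from the true y. Let's see if someone can help me.

Here is my code: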

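from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NearMiss

print('normal data distribution:{}'.format(Counter(y)))
# fit_resample replaces fit_sample, which was removed from recent imbalanced-learn releases
X_smote, y_smote = SMOTE().fit_resample(X, y)
print('SMOTE data distribution:{}'.format(Counter(y_smote)))
X_nearmiss, y_nearmiss = NearMiss().fit_resample(X, y)
print('Nearmiss data distribution:{}'.format(Counter(y_nearmiss)))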

normal data distribution:Counter({0: 1967, 1: 197})
SMOTE data distribution:Counter({0: 1967, 1: 1967})
Nearmiss data distribution:Counter({0: 197, 1: 197})

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_normal_train, X_normal_test, y_normal_train, y_normal_test = train_test_split(X, y, test_size=0.3, random_state=0)
X_smote_train, X_smote_test, y_smote_train, y_smote_test = train_test_split(X_smote, y_smote, test_size=0.3, random_state=0)
X_under_train, X_under_test, y_under_train, y_under_test = train_test_split(X_nearmiss, y_nearmiss, test_size=0.3, random_state=0)
stds = StandardScaler()  # a single scaler instance, refit for each dataset below

#######normal#######
X_train_nor = stds.fit_transform(X_normal_train)
X_test_nor = stds.transform(X_normal_test)
######smote########
X_train_smote = stds.fit_transform(X_smote_train)
X_test_smote = stds.transform(X_smote_test)
######under########
X_train_under = stds.fit_transform(X_under_train)
X_test_under = stds.transform(X_under_test)

# Apply CatBoost

from catboost import CatBoostClassifier
model_cat_nor = CatBoostClassifier()
model_cat_smote = CatBoostClassifier()
model_cat_under = CatBoostClassifier()

model_cat_nor.fit(X_train_nor, y_normal_train)
model_cat_smote.fit(X_train_smote, y_smote_train)
model_cat_under.fit(X_train_under, y_under_train)

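########## test model ########
test_normal_primera = model_cat_nor.predict(X_test_nor)
test_smote_primera = model_cat_smote.predict(X_test_smote)
test_under_primera = model_cat_under.predict(X_test_under)
# class counts of the predictions on each test split
print(Counter(test_normal_primera))
print(Counter(test_smote_primera))
print(Counter(test_under_primera))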

Counter({0.0: 650})
Counter({0.0: 636, 1.0: 545})
Counter({0.0: 63, 1.0: 56})

At this point I run classification_report, and precision and recall for both the SMOTE and the undersampled models come out above 95%.
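One thing worth checking about these scores: in the code above, SMOTE and NearMiss are applied before train_test_split, so the test splits are balanced and, in the SMOTE case, may contain synthetic points interpolated from rows that ended up in the training split. That can make the reported precision and recall look better than they would on data with the original class ratio. The sketch below shows the usual alternative, splitting first and resampling the training split only. It runs on made-up random data with the same class counts as in the question, and it uses plain random oversampling via sklearn.utils.resample as a stand-in for SMOTE so that it has no extra dependencies; the variable names are illustrative, not taken from the original code.

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(0)
# Hypothetical data with the same imbalance as in the question
X = rng.normal(size=(2164, 4))
y = np.array([0] * 1967 + [1] * 197)

# Split first, so the test set keeps the original class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Oversample the minority class in the training split only
# (random oversampling here; SMOTE would take its place in the real code)
X_maj = X_train[y_train == 0]
X_min = X_train[y_train == 1]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_train_bal = np.vstack([X_maj, X_min_up])
y_train_bal = np.array([0] * len(X_maj) + [1] * len(X_min_up))

print(Counter(y_train_bal))  # balanced training set
print(Counter(y_test))       # test set still imbalanced, like real data
```

Scores measured on such an untouched test split are the ones to compare against predictions on the full original X.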

Then, out of curiosity, I tried predicting on X (the original data, so the result should be similar to the original y), but it is not.

standardatanuevo = stds.transform(X)
predicciondatanuevo1=model_cat_smote.predict(standardatanuevo)
print(Counter(predicciondatanuevo1))

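The result is Counter({1.0: 2091, 0.0: 73}): almost everything is predicted as 1 and very few rows as 0, whereas I expected something similar to the original distribution ({0: 1967, 1: 197}).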

Maybe I am doing the final prediction wrong? If so, how do I predict on a new data frame (using the model trained on the previous data frame)?
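One detail in the code above that may be related: the same StandardScaler instance, stds, is passed through fit_transform three times, and each call discards the statistics from the previous one. By the time stds.transform(X) runs, the scaler holds the mean and variance of X_under_train, not of X_smote_train, so model_cat_smote receives features scaled differently from the ones it was trained on. The sketch below demonstrates only the refitting behaviour, on made-up arrays; X_a, X_b, shared and scaler_a are hypothetical names, and whether this fully explains the result in the question is an assumption.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-ins for two training sets with different distributions
X_a = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
X_b = rng.normal(loc=5.0, scale=2.0, size=(200, 3))

# One shared scaler, refit on each dataset (the pattern used in the question)
shared = StandardScaler()
shared.fit_transform(X_a)
shared.fit_transform(X_b)  # overwrites the statistics learned from X_a

# A dedicated scaler keeps the statistics of the data its model was trained on
scaler_a = StandardScaler().fit(X_a)

# The shared scaler now carries X_b's mean, not X_a's
print(np.allclose(shared.mean_, X_b.mean(axis=0)))
print(np.allclose(shared.mean_, scaler_a.mean_))
```

Under that assumption, keeping one scaler per model (for example stds_smote fit on X_smote_train, then stds_smote.transform(X) before model_cat_smote.predict) would keep new data on the same scale as the training data.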


0 Answers: