Parameter tuning with XGBoost in Python

Date: 2019-04-08 05:39:21

Tags: xgboost

I have 33 million rows in my dataframe, and this is a classification problem. The dataframe looks like this:

id  prodA  prodB  prodC  Single  Married  age_20_30  age_40_50  is_purchase
1   .9461  .0539  0      0       1        0          1          0
2   .55    .44    .01    1       0        1          0          1
3   .65    .25    .10    0       0        1          0          1
4   .79    .21    0      0       1        1          1          0

prodA and prodB are product affinities.

What I have done so far:

import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

df = pd.read_csv('final_data.csv')

# Target, id, and feature columns
label = 'is_purchase'
id_column = 'id'
features = ['prodA', 'prodB', 'prodC', 'Single', 'Married', 'age_20_30', 'age_40_50']

# Shuffle, then split 80% / 15% / 5% into train / valid / test
train, valid, test = np.split(df.sample(frac=1), [int(.8*len(df)), int(.95*len(df))])

X_train, y_train = train[features], train[label]
X_valid, y_valid = valid[features], valid[label]
X_test, y_test = test[features], test[label]

# Wrap the splits in DMatrix objects for the native XGBoost API
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)
dtest = xgb.DMatrix(X_test, label=y_test)

watchlist = [(dtrain, 'train'), (dvalid, 'valid')]
params = {
    'num_class': 2,                 # two classes: purchase / no purchase
    'learning_rate': 0.05,          # alias for 'eta' in the native API
    'n_estimators': 120,            # NOTE: ignored by xgb.train; rounds come from num_round
    'max_depth': 12,
    'min_child_weight': 1,
    'gamma': 2,
    'subsample': 0.8,
    'colsample_bytree': 0.5,
    'objective': 'multi:softprob',  # binary:logistic would also fit a 2-class problem
    'nthread': 4,
    'seed': 27}
num_round = 100

# Train, printing train/valid metrics every round
model = xgb.train(params, dtrain, num_round, watchlist, verbose_eval=1)
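
As an aside, the same watchlist can also drive early stopping, which picks the number of boosting rounds automatically instead of fixing it at 100. A minimal sketch using the standard early_stopping_rounds argument of xgb.train; the ceiling of 1000 rounds and the patience of 50 are illustrative assumptions, not values from the post:

# Sketch: let validation loss choose the number of rounds.
# Training stops once 'valid' has not improved for 50 rounds.
model_es = xgb.train(
    params, dtrain,
    num_boost_round=1000,        # assumed upper bound
    evals=watchlist,
    early_stopping_rounds=50,
    verbose_eval=50)
print(model_es.best_iteration, model_es.best_score)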

# multi:softprob returns one probability per class; take the argmax as the label
valid_pred = model.predict(dvalid)
best_valid_preds = np.argmax(valid_pred, axis=1)

print(precision_score(y_valid, best_valid_preds, average='macro'))
print(recall_score(y_valid, best_valid_preds, average='macro'))
print(f1_score(y_valid, best_valid_preds, average='macro'))
print(accuracy_score(y_valid, best_valid_preds))

# Same evaluation on the held-out test set
test_pred = model.predict(dtest)
best_test_preds = np.argmax(test_pred, axis=1)

print(precision_score(y_test, best_test_preds, average='macro'))
print(recall_score(y_test, best_test_preds, average='macro'))
print(f1_score(y_test, best_test_preds, average='macro'))
print(accuracy_score(y_test, best_test_preds))
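
Note that with two classes, taking the argmax is equivalent to thresholding the positive-class probability at 0.5. If precision on the purchase class is what matters, the threshold itself is another knob to vary. A hedged sketch; the 0.7 cut-off is an arbitrary assumption for illustration, not a tuned value:

# Column 1 of the softprob output is P(is_purchase = 1)
purchase_prob = valid_pred[:, 1]

# A stricter cut-off than 0.5 trades recall for precision
strict_preds = (purchase_prob >= 0.7).astype(int)
print(precision_score(y_valid, strict_preds))
print(recall_score(y_valid, strict_preds))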

My test set gives me only 54% precision, and I would like to get it to at least 70%. The dataset has no NA values. How can I improve the model's precision with XGBoost through parameter tuning?
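
For context, one common way to search the parameter space is cross-validated random search over the scikit-learn wrapper. This is a sketch only; the parameter ranges, n_iter=20, and 3-fold CV below are illustrative assumptions, and with 33 million rows it may be worth searching on a subsample first:

from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

# Sketch: randomized search over a few influential parameters.
param_dist = {
    'max_depth': [4, 6, 8, 10, 12],
    'min_child_weight': [1, 5, 10],
    'gamma': [0, 1, 2, 5],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.5, 0.7, 1.0],
    'learning_rate': [0.01, 0.05, 0.1],
}

search = RandomizedSearchCV(
    XGBClassifier(n_estimators=120, objective='binary:logistic',
                  nthread=4, random_state=27),
    param_distributions=param_dist,
    n_iter=20,                     # number of sampled configurations
    scoring='f1_macro',
    cv=3,
    verbose=1)
search.fit(X_train, y_train)       # consider fitting on a subsample of the 33M rows
print(search.best_params_, search.best_score_)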

0 Answers:

No answers yet.