在随机森林算法中按重要性对要素进行排名,所有要素的得分为0

时间:2018-10-24 15:04:06

标签: python random-forest prediction

我正在尝试使用随机森林算法对基于50个特征的名为“ Gold2”的列进行预测,我不知道哪个特征比其他特征对预测更重要,这就是为什么我尝试根据功能的重要性对我的功能进行排名,对于每个正在使用的功能,我都得到全0,不确定我的代码出了什么问题,我的预测能力也只有16%,这就是为什么我尝试检测特征的原因有助于预测并在将来的预测中重点关注的最重要功能,您能否指定其他代码以按重要性对我的功能进行排名?

`enter code here`  # -*- coding: utf-8 -*-
   """
   Created on Wed Oct 24 16:29:15 2018

   @author: mouna
   """

   # -*- coding: utf-8 -*-
   """
   Created on Wed Oct 24 14:18:34 2018

   @author: uashraf
   """

   import numpy as np 

   # data processing
   import pandas as pd 

   # data visualization
   import seaborn as sns
   from matplotlib import pyplot as plt
   from matplotlib import style

   # Algorithms
   from sklearn import linear_model
   from sklearn.linear_model import LogisticRegression
   from sklearn.ensemble import RandomForestClassifier
   from sklearn.linear_model import Perceptron
   from sklearn.linear_model import SGDClassifier
   from sklearn.tree import DecisionTreeClassifier
   from sklearn.neighbors import KNeighborsClassifier
   from sklearn.svm import SVC, LinearSVC
   from sklearn.naive_bayes import GaussianNB


   data = 
   pd.read_csv('C:/Users/mouna/ownCloud/Share/dumps/tablelogChessPython.csv')

   labels = data['Gold2']

   #plt.hist(labels)
   features = data.iloc[:,1:47]


   from sklearn.model_selection import train_test_split

   train_features, test_features, train_labels, test_labels = 
   train_test_split(features, labels, test_size = 0.25, random_state = 42)
   #test_features  = test_features.drop("Gold2", axis=1)

   random_forest = RandomForestClassifier(n_estimators=10)
   random_forest.fit([train_features.any()], [train_labels.any()])

   Y_prediction = random_forest.predict(test_features)
   acc_random_forest_test = round(random_forest.score(test_features, 
   test_labels) * 100, 2)
   print(round(acc_random_forest_test,2,), "%")

   importances = 

 pd.DataFrame({'feature':train_features.columns,'importance':np.round(random_fore 
   st.feature_importances_,3)})
   importances = 
   importances.sort_values('importance',ascending=False).set_index('feature')
   importances.head(47)

   # Decision Tree
   decision_tree = DecisionTreeClassifier()
   decision_tree.fit([train_features.any()], [train_labels.any()])

   Y_pred = decision_tree.predict(test_features)

   acc_decision_tree = round(decision_tree.score(test_features, test_labels) * 
   100, 2)
   print(round(acc_decision_tree,2,), "%")

0 个答案:

没有答案