Why does a decision tree return different solutions for exactly the same training data?

Asked: 2018-01-14 17:35:16

Tags: python scikit-learn decision-tree

I'm trying out an ML example, and it works most of the time, but when I run the code repeatedly, Python starts spitting out different prediction results. Now, I'm no ML expert, but this seems wrong?

# Example file from Google Developers: "Hello World - Machine Learning Recipes": YouTube: https://youtu.be/cKxRvEZd3Mw
# Category: Supervised Learning                                                                               
# January 14, 2018                                                                                            
from sklearn import tree                                                                                      

# Declarations: Texture                                                                                        
bumpy = 0                                                                                                      
smooth = 1                                                                                                     

# Declarations: Labels                                                                                         
apple = 0                                                                                                      
orange = 1                                                                                                                                                                 

# Step(1): Collect training data                                                                               
# Features: [Weight, Texture]                                                                                  
features = [[140, smooth], [130, smooth], [150, bumpy], [170, bumpy]]                                          

# labels[i] is the class of the sample features[i]
labels = [apple, apple, orange, orange]                                                                        

# Step(2): Train Classifier: Decision Tree                                                                     
# Use the decision tree object and then fit (find) patterns in features and labels
clf = tree.DecisionTreeClassifier()                                                                            
clf = clf.fit(features, labels)                                                                                

# Step(3): Make Predictions                                                                                    
# the predict method will return the best fit from the decision tree
result = clf.predict([[150, bumpy], [130, smooth], [125.5, bumpy], [110, smooth]])                             
# result = clf.predict([[150, bumpy]])                                                                         
print("Step(3): Make Predictions: ")                                                                           
for x in result:
    if x == 0:
        print("Apple")
    elif x == 1:
        print("Orange")


1 Answer:

Answer 0 (score: 6):

There is an element of randomness to (most?) decision tree algorithms, and your very small training set probably exaggerates the effect. The randomness is typically used to decide how many/which features to consider at each split, and in your case there are very few samples to break ties with.

When you create the DecisionTreeClassifier, try setting random_state to some fixed integer. If you want a reproducible test result, you'll need to use the same seed value every time. The scikit-learn example docs use a random seed of zero:

clf = tree.DecisionTreeClassifier(random_state=0)
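A minimal sketch of the point above, reusing the question's toy fruit data: two classifiers built with the same fixed random_state produce identical trees, so repeated runs give the same predictions (the names clf_a and clf_b are just for illustration).

```python
from sklearn import tree

# Same toy data as the question: [weight, texture], texture 0 = bumpy, 1 = smooth
smooth, bumpy = 1, 0
features = [[140, smooth], [130, smooth], [150, bumpy], [170, bumpy]]
labels = [0, 0, 1, 1]  # 0 = apple, 1 = orange

# Fit the same model twice with a fixed seed; any tie-breaking between
# equally good splits is resolved the same way both times.
clf_a = tree.DecisionTreeClassifier(random_state=0).fit(features, labels)
clf_b = tree.DecisionTreeClassifier(random_state=0).fit(features, labels)

test_points = [[150, bumpy], [130, smooth], [125.5, bumpy], [110, smooth]]
print(list(clf_a.predict(test_points)))
print(list(clf_b.predict(test_points)))
```

Note that an ambiguous point like [125.5, bumpy] is exactly where the question's run-to-run differences came from: both "texture == smooth" and a weight threshold separate this training set perfectly, and which split the tree picks can depend on the seed. Fixing random_state pins that choice down.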