Random forest: number of features of the model does not match the input

Time: 2017-07-06 07:48:22

Tags: python python-2.7 ocr

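I am trying to run machine learning on a dataset and display the sample and test data. My goal is to later load my own image and test it with the model. Please help me resolve the error.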

%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_mldata
import numpy as np
import random
from sklearn import ensemble

mnist = fetch_mldata('MNIST original', data_home='./')


#Define variables
n_samples = len(mnist.data)
x = mnist.data.reshape((n_samples, -1))# array of feature of 28*28 pixel
y = mnist.target                         # Class label from 0-9 as there are digits

#Create random indices 
sample_index=random.sample(range(len(x)),len(x)/5) #Selecting randomly list of len(x)/5 from the size of x
valid_index=[i for i in range(len(x)) if i not in sample_index]# Selecting the rest of the digits

#Sample and validation images
sample_images=[x[i] for i in sample_index]# 28*28 size of array which was used to classify digits in different classes
valid_images=[x[i] for i in valid_index]

#Sample and validation targets
sample_target=[y[i] for i in sample_index] # digits 0-9
valid_target=[y[i] for i in valid_index]

#Using the Random Tree Classifier
classifier = ensemble.RandomForestClassifier(n_estimators=30)

#Fit model with sample data
classifier.fit(sample_images, sample_target)

#Attempt to predict validation data
score=classifier.score(valid_images, valid_target)
print 'Random Tree Classifier:\n' 
print 'Score\t'+str(score)

from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt

def plot_decision_regions(X, y, classifier, 
                    test_idx=None, resolution=0.02):
    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                         np.arange(x2_min, x2_max, resolution))

Up to this point I get no errors, but after this line:

Z = classifier.predict(np.c_[xx1.ravel(), xx2.ravel()])
plot_decision_regions(x,y,classifier,test_idx=10)

I get the following error:

ValueError: Number of features of the model must match the input. Model n_features is 784 and input n_features is 2 

1 Answer:

Answer 0: (score: 0)

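When you call classifier.predict, you need to pass it a set of new test samples, and the classifier predicts a label for each of them. What you are passing instead is a mesh grid built from only the first two features, which makes no sense for this model. The grid matrix has only 2 columns, so the predictor complains: it expects 784 columns, the same number as in your training data.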
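To make the mismatch concrete, here is a minimal sketch rather than a drop-in fix. It assumes scikit-learn's bundled load_digits dataset (8x8 images, 64 features) as a small stand-in for MNIST (784 features), and it assumes you are willing to project the data to 2 dimensions with PCA if you really want a decision-region plot. The names clf_2d and X_2d are made up for this illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# Assumption: load_digits (64 features) stands in for MNIST (784 features).
digits = load_digits()
X, y = digits.data, digits.target

clf = RandomForestClassifier(n_estimators=30, random_state=0)
clf.fit(X, y)

# Works: the input has the same number of columns as the training data.
labels = clf.predict(X[:5])

# Fails: a 2-column matrix does not match the 64 features seen during fit.
grid = np.c_[np.zeros(3), np.ones(3)]
try:
    clf.predict(grid)
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

# One option for plotting: reduce the data to 2 features and train a
# separate classifier on the reduced data, so a 2-D mesh grid is valid input.
X_2d = PCA(n_components=2).fit_transform(X)
clf_2d = RandomForestClassifier(n_estimators=30, random_state=0).fit(X_2d, y)

xx1, xx2 = np.meshgrid(
    np.linspace(X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1, 50),
    np.linspace(X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1, 50),
)
# Each grid point now has exactly the 2 features clf_2d was trained on.
Z = clf_2d.predict(np.c_[xx1.ravel(), xx2.ravel()]).reshape(xx1.shape)
```

Note that clf_2d is a different model from the one trained on the full images, so the plotted regions only describe the reduced problem.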

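I would also suggest reviewing the rest of your code, as I spotted a few other issues (for example, len(x)/5 has to be converted to an int before it is passed to random.sample, at least under Python 3, where / always returns a float).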
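As a rough sketch of the index split with an integer sample size: floor division (//) yields an int under both Python 2 and Python 3. The value n_total below is only an illustrative stand-in for len(x), and sample_set is a hypothetical helper that speeds up the membership test.

```python
import random

# Illustrative stand-in for len(x); MNIST has 70000 images.
n_total = 70000

# Floor division returns an int in both Python 2 and Python 3.
n_sample = n_total // 5

random.seed(0)  # fixed seed only so the sketch is reproducible
sample_index = random.sample(range(n_total), n_sample)

# A set makes the "not in" check O(1) instead of scanning a list each time.
sample_set = set(sample_index)
valid_index = [i for i in range(n_total) if i not in sample_set]
```

With the original list-based check, building valid_index scans sample_index once per element, which gets slow at this size.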

Hope this helps. Good luck!