sklearn's predict() returns values outside the expected range

Time: 2016-10-25 17:08:52

Tags: python machine-learning scikit-learn

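In this project, I have a large number of samples taken from a time series of 40 x 64 x 64 images, scanned once every 2.5 seconds. The number of 'voxels' (3D pixels) in each image is therefore roughly 164,000 (40 * 64 * 64 = 163,840), and each voxel is a 'feature' of an image sample.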

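I thought of using Principal Component Analysis (PCA) to reduce the dimensionality, because the number of features n is extremely high, and then following up with Recursive Feature Elimination (RFE).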
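A minimal sketch of how PCA followed by RFE could be chained, using synthetic data rather than the scans described above. The dataset, the number of components, the number of selected features, and the linear SVC estimator are all illustrative assumptions, not part of the original setup.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the image samples (hypothetical data, far fewer features).
X, y = make_classification(
    n_samples=100, n_features=500, n_informative=10, random_state=0
)

pipeline = Pipeline([
    # Step 1: compress the raw features to 20 principal components (arbitrary choice).
    ("pca", PCA(n_components=20)),
    # Step 2: recursively drop components until 5 remain (arbitrary choice).
    ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=5, step=1)),
])
pipeline.fit(X, y)

rfe = pipeline.named_steps["rfe"]
preds = pipeline.predict(X)
print("components kept by RFE:", rfe.support_.sum())
```

Note that in this arrangement RFE ranks principal components, not individual voxels, because it only sees the output of the PCA step.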

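There are 9 classes to predict, so this is a multi-class classification problem. Below, I convert this 9-class problem into a set of binary classification problems, one per pair of classes, and store the fitted models in the list models.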

from sklearn.svm import SVR
from sklearn.feature_selection import RFE

models = []
model_count = 0

for i in range(0,DS.nClasses):
    for j in range(i+1,DS.nClasses):

        binary_subset = sample_classes[i] + sample_classes[j]

        print 'length of combined = %d' % len(binary_subset)
        X,y = zip(*binary_subset)
        print 'y = ',y

        estimator = SVR(kernel="linear")
        rfe = RFE(estimator , step=0.05)
        rfe = rfe.fit(X, y)

        #save the model
        models.append(rfe)
        model_count = model_count + 1
        print '%d model fitting complete!' % model_count
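As a side note, the nested loop above visits every unordered pair of classes. A short sketch of the same enumeration with `itertools.combinations`, assuming 9 classes as stated in the question (`n_classes` here is a stand-in for `DS.nClasses`):

```python
from itertools import combinations

n_classes = 9  # stand-in for DS.nClasses; the question states 9 classes

# Every pair (i, j) with i < j, the same pairs the nested loop produces.
pairs = list(combinations(range(n_classes), 2))

print(len(pairs))  # 36
```

That is 36 binary models, so each test sample receives 36 votes in total.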

Now I loop over these models and make predictions.

predictions = []
for X,y in test_samples:
    Votes = np.zeros(DS.nClasses)

    for mod in models:
        #X = mod.transform(X)
        label = mod.predict(X.reshape(1,-1)) #Something goes wrong here

        print 'label is type',type(label),' and value ',label
        Votes[int(label)] = Votes[int(label)] + 1

    prediction = np.argmax(Votes)
    predictions.append(prediction)
    print 'Votes Array = ',Votes
    print "We predicted %d , actual is %d" % (prediction,y)

The label should be an integer from 0 to 8, representing the 9 possible outcomes. I am printing the label values, and this is what I get:

label is type <type 'numpy.ndarray'>  and value  [ 0.87011103]
label is type <type 'numpy.ndarray'>  and value  [ 2.09093105]
label is type <type 'numpy.ndarray'>  and value  [ 1.96046739]
label is type <type 'numpy.ndarray'>  and value  [ 2.73343935]
label is type <type 'numpy.ndarray'>  and value  [ 3.60415663]
label is type <type 'numpy.ndarray'>  and value  [ 6.10577602]
label is type <type 'numpy.ndarray'>  and value  [ 6.49922691]
label is type <type 'numpy.ndarray'>  and value  [ 8.35338294]
label is type <type 'numpy.ndarray'>  and value  [ 1.29765466]
label is type <type 'numpy.ndarray'>  and value  [ 1.60883217]
label is type <type 'numpy.ndarray'>  and value  [ 2.03839272]
label is type <type 'numpy.ndarray'>  and value  [ 2.03794106]
label is type <type 'numpy.ndarray'>  and value  [ 2.58830013]
label is type <type 'numpy.ndarray'>  and value  [ 3.28811133]
label is type <type 'numpy.ndarray'>  and value  [ 4.79660621]
label is type <type 'numpy.ndarray'>  and value  [ 2.57755697]
label is type <type 'numpy.ndarray'>  and value  [ 2.72263461]
label is type <type 'numpy.ndarray'>  and value  [ 2.58129428]
label is type <type 'numpy.ndarray'>  and value  [ 3.96296151]
label is type <type 'numpy.ndarray'>  and value  [ 4.80280219]
label is type <type 'numpy.ndarray'>  and value  [ 7.01768046]
label is type <type 'numpy.ndarray'>  and value  [ 3.3720926]
label is type <type 'numpy.ndarray'>  and value  [ 3.67517869]
label is type <type 'numpy.ndarray'>  and value  [ 4.52089242]
label is type <type 'numpy.ndarray'>  and value  [ 4.83746684]
label is type <type 'numpy.ndarray'>  and value  [ 6.76557315]
label is type <type 'numpy.ndarray'>  and value  [ 4.606097]
label is type <type 'numpy.ndarray'>  and value  [ 6.00243346]
label is type <type 'numpy.ndarray'>  and value  [ 6.59194317]
label is type <type 'numpy.ndarray'>  and value  [ 7.63559593]
label is type <type 'numpy.ndarray'>  and value  [ 5.8116106]
label is type <type 'numpy.ndarray'>  and value  [ 6.37096926]
label is type <type 'numpy.ndarray'>  and value  [ 7.57033285]
label is type <type 'numpy.ndarray'>  and value  [ 6.29465433]
label is type <type 'numpy.ndarray'>  and value  [ 7.91623641]
label is type <type 'numpy.ndarray'>  and value  [ 7.79524801]
Votes Array =  [ 1.  3.  8.  5.  5.  1.  7.  5.  1.]
We predicted 2 , actual is 8

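I don't understand why the label values are floats. They should be integers from 0 to 8.

I loaded the data correctly. Something goes wrong when predict() is executed, but I still can't figure out what.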

1 answer:

Answer 0: (score: 3)

You are getting floating-point values because you are using SVR: Support Vector Regression. You need SVC, Support Vector Classification.
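To illustrate the difference, here is a minimal sketch on synthetic data (a hypothetical stand-in, not the asker's dataset). It fits one binary sub-problem twice, once with SVR and once with SVC, both wrapped in RFE with the same `step=0.05` as in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC, SVR

# Synthetic 3-class data (assumption: used only for illustration).
X, y = make_classification(
    n_samples=120, n_features=30, n_informative=6,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Keep only classes 1 and 2, mirroring one pairwise sub-problem.
mask = np.isin(y, [1, 2])
X_pair, y_pair = X[mask], y[mask]

# Regressor: treats the labels as numbers and predicts continuous values.
rfe_svr = RFE(SVR(kernel="linear"), step=0.05).fit(X_pair, y_pair)
svr_pred = rfe_svr.predict(X_pair[:5])

# Classifier: predictions are always one of the labels seen in training.
rfe_svc = RFE(SVC(kernel="linear"), step=0.05).fit(X_pair, y_pair)
svc_pred = rfe_svc.predict(X_pair[:5])

print("SVR:", svr_pred)
print("SVC:", svc_pred)
```

With the classifier, each prediction is one of the two class labels of the pair, so it can be used directly as an index into the votes array. SVC also handles more than two classes by itself using a one-vs-one scheme internally, so fitting a single SVC on all 9 classes may be an alternative to building the pairwise models by hand.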
