
Question

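I am training an SVM model on the following data:

handwritten mathematical symbols (represented as 46x46 images; specifically, each symbol is a numpy.array that is later reshaped to a 1x2116 numpy.array),
the corresponding string labels, e.g. '+'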

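So each training sample has 2116 features. In total I have about 22,000 training samples (one symbol can appear many times, in many different handwriting variants).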

My trained SVM classifier misclassifies the test images (for now drawn by myself and kept at 46x46 pixels; e.g. I have drawn '+', '\sum', 'i') as '1' or '-' all the time.

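I suspect the classifier is not fitted well, so it only ever outputs '1' or '-'. I have been testing many different values for the gamma and C parameters. Here is the piece of code that trains the SVM: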

from sklearn import svm
from sklearn.externals import joblib  # scikit-learn 0.17.x; newer versions: import joblib

# SVC.fit only accepts a 2-D training array, so each 46x46 image
# is flattened into a single row of 2116 features
nsamples, nx, ny = symbols.shape
symbols_dim2 = symbols.reshape((nsamples, nx * ny))

# For demo purposes I have chosen an SVM large-margin classifier
# Initialize the SVM classifier:
classifier = svm.SVC(gamma=10, C=100)

# Fit the classifier on the flattened symbols and their labels
classifier.fit(symbols_dim2, labels)
print('Training has successfully completed ...')

print('Saving classifier properties into a file ...')
joblib.dump(classifier, trained_classifiers_dir_prefix + 'svmClassifier.pkl')

Here is the next snippet, taken from the test script:

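import cv2
from sklearn.externals import joblib  # scikit-learn 0.17.x; newer versions: import joblib

# Load pre-saved classifier
classifier = joblib.load(data_dir_prefix + trained_classifier_dir_prefix + classifier_name)

print('Classifier loaded - ready to use ...')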

test_image = cv2.imread(test_data_dir_prefix + 'test3.jpg')
test_image = cv2.cvtColor(test_image, cv2.COLOR_BGR2GRAY)

rows, cols = test_image.shape
test_image = test_image.reshape(1, rows * cols)

print('PREDICTION: ' + str(classifier.predict(test_image)))
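
Because the classifier only sees flat rows of numbers, a test row has to match the training rows in both feature count and value range. The following is a minimal sketch of such a check. The helper `summarize` and both arrays are hypothetical stand-ins, and whether the real training and test data differ in this way cannot be told from the code above.

```python
import numpy as np


def summarize(array):
    """Return (n_features, min, max) for a 2-D sample matrix."""
    array = np.asarray(array)
    return array.shape[1], float(array.min()), float(array.max())


# Synthetic stand-ins (assumptions): training rows scaled to [0, 1],
# test row left in the 0..255 range of an 8-bit grayscale image.
train_rows = np.random.RandomState(0).rand(10, 2116)
test_row = np.full((1, 2116), 255, dtype=np.uint8)

train_summary = summarize(train_rows)
test_summary = summarize(test_row)
print('train:', train_summary)
print('test: ', test_summary)

# The number of features must match exactly
same_width = train_summary[0] == test_summary[0]
# Crude heuristic: flag a test maximum far above the training maximum
same_range = test_summary[2] <= 2 * train_summary[2]
print('same number of features:', same_width)
print('comparable value range:', same_range)
```

With these stand-in arrays the feature counts agree but the value ranges do not, which is the kind of mismatch the check is meant to surface.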

Specifically, the parameter values I have tried are: gamma = 10, C = 100; gamma = 0.1, C = 1; gamma = 0.001, C = 1.
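
One way to try many gamma and C values systematically is a cross-validated grid search. The following is a minimal sketch, not a recommended configuration: the data is a synthetic stand-in for the real symbols, the grid values are only illustrative, and the StandardScaler step is an assumption (nothing here says how the pixel values are scaled). GridSearchCV is imported from sklearn.model_selection, which exists in scikit-learn 0.18 and later; on 0.17.1 the same class is in sklearn.grid_search.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search on 0.17.x
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 3 classes,
# 40 samples each, 2116 features as for flattened 46x46 images.
rng = np.random.RandomState(0)
n_per_class, n_features = 40, 46 * 46
centers = rng.rand(3, n_features)
X = np.vstack([c + 0.1 * rng.randn(n_per_class, n_features) for c in centers])
y = np.repeat(['+', '-', '1'], n_per_class)

# Scale the features, then fit an RBF-kernel SVM
pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf'))

# Illustrative grid only; make_pipeline names the SVC step 'svc'
param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': [1e-5, 1e-4, 1e-3, 1e-2],
}

# 3-fold cross-validation over every (C, gamma) combination
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

The cross-validated score gives a comparison between parameter pairs that does not depend on a handful of hand-drawn test images.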

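What immediate problem could cause such severe misclassification?
Which SVM parameters are suitable for this many features?
Should I try another classifier, such as a neural network or nearest neighbors?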

Note: sklearn.svm.SVC requires the training samples to be reshaped from 46x46 to 1x2116, so that the symbols numpy.array has dimension <= 2
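
As a small illustration of that reshaping, the sketch below uses zero-filled stand-in arrays (an assumption, not the real data) to show the shapes before and after flattening, for both the training array and a single test image.

```python
import numpy as np

# Hypothetical stand-in for the real `symbols` array: 5 images of 46x46 pixels
symbols = np.zeros((5, 46, 46), dtype=np.uint8)

# Flatten every image into one row of 46 * 46 = 2116 features
nsamples, nx, ny = symbols.shape
symbols_dim2 = symbols.reshape((nsamples, nx * ny))
print(symbols_dim2.shape)  # (5, 2116)

# A single test image needs the same treatment: one row of 2116 features
test_image = np.zeros((46, 46), dtype=np.uint8)
test_row = test_image.reshape(1, -1)
print(test_row.shape)  # (1, 2116)
```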

Note: I am using Python 2.7.10 and scikit-learn 0.17.1

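Note: The training data (handwritten mathematical symbols) was originally obtained from the CROHME dataset and then converted (each symbol was extracted into a 46x46 image). CROHME dataset: http://www.isical.ac.in/~crohme/CROHME_data.html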

If you want to test it yourself, the full code is available here: https://github.com/XaiNano/CROHME_inkmls_extractor

I am open to any suggestions, and I can switch to any other classifier. Thanks for any help!

Update

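Changing the images from 46x46 to 26x26 helped: it now classifies 4 out of 10 26x26 test images correctly, whereas with 46x46 it classified only 1 out of 10 test images correctly. In addition, I changed the classifier to nearest_centroid from sklearn.neighbors.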
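
For reference, a minimal sketch of a nearest-centroid classifier on flattened 26x26 inputs, assuming the class meant is sklearn.neighbors.NearestCentroid. The data is synthetic (three made-up classes with random prototypes), so it says nothing about accuracy on the real symbols.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

# Synthetic stand-in (assumption): 3 classes, 30 noisy samples each,
# every sample a flattened 26x26 image (676 features).
rng = np.random.RandomState(0)
n_per_class, side = 30, 26
centers = rng.rand(3, side * side)
X = np.vstack([c + 0.1 * rng.randn(n_per_class, side * side) for c in centers])
y = np.repeat(['+', '-', '1'], n_per_class)

# Each class is represented by the mean of its training samples
classifier = NearestCentroid()
classifier.fit(X, y)

# Classify a new image: here simply the noise-free prototype of class '+'
new_image = centers[0].reshape(side, side)
prediction = classifier.predict(new_image.reshape(1, -1))
print(prediction)
```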

SVM model misclassifies training samples with a large number of features


0 answers: