High val_acc and low val_loss, but Keras still makes wrong predictions

Asked: 2019-07-06 08:07:25

Tags: python tensorflow machine-learning keras deep-learning

I have a set of 300,000+ images in 38 categories. When I train, my val_loss is low and my val_acc is high, but when I then try to predict one of the images (even one from the training set), it never gives the correct answer, or hardly ever gives one at all.

val_loss is around 0.1023 and val_acc is around 0.9738.

I set this up with images generated by jpgraph, showing various kinds of up/down patterns for specific measurement log data from my aquarium computer, related to water stability. I have 5 million rows of MySQL data, rendered as 5-minute timeline images with 4 values each. I have uploaded 4 of these images, which you can view at https://ponne.nu/images/

What I want to do is predict the next step of the class, i.e. 5 minutes ahead: after training, I show it an image and it should return a class based on the data (for example 200 up, 500 down).

So when I train, in whatever combination, as far as I can tell it gives good results on acc, loss, val_acc and val_loss.

The actual training saves the model at every epoch (so I can test after each run):

Epoch 00023: saving model to /opt/graphs/saved/saved-model-23-0.97.hdf5
Epoch 24/50
37101/37101 [==============================] - 2013s 54ms/step - loss: 0.0968 - acc: 0.9738 - val_loss: 0.1048 - val_acc: 0.9731

Epoch 00024: saving model to /opt/graphs/saved/saved-model-24-0.97.hdf5
Epoch 25/50
37101/37101 [==============================] - 2014s 54ms/step - loss: 0.0968 - acc: 0.9738 - val_loss: 0.1012 - val_acc: 0.9734

Epoch 00025: saving model to /opt/graphs/saved/saved-model-25-0.97.hdf5
Epoch 26/50
37101/37101 [==============================] - 2016s 54ms/step - loss: 0.0968 - acc: 0.9738 - val_loss: 0.1092 - val_acc: 0.9725

Part of the training script:

FAST_RUN = False
IMAGE_WIDTH=220
IMAGE_HEIGHT=220
IMAGE_SIZE=(IMAGE_WIDTH, IMAGE_HEIGHT)
IMAGE_CHANNELS=3 # RGB color
filenames = os.listdir("/opt/images/")
categories = []
for filename in filenames:
    category = filename.split('.')[0]
    if category == '0same':
        categories.append(0)
    elif category == '100up':
        categories.append(1)
    elif category == '200up':
        categories.append(2)
    elif category == '300up':
        categories.append(3)
 (snip)

df = pd.DataFrame({
    'filename': filenames,
    'category': categories
})
df['category'] = df['category'].astype('str');
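The if/elif chain above could also be written as a single dict lookup — a minimal sketch, where the `label_map` entries and sample filenames are illustrative and would need to be extended with the remaining prefixes up to 38 classes:

```python
# Sketch: map filename prefixes to numeric categories via a dict
# instead of a long if/elif chain. label_map and the filenames
# below are illustrative, not the full 38-class mapping.
label_map = {'0same': 0, '100up': 1, '200up': 2, '300up': 3}

filenames = ['100up.0001.jpg', '0same.0002.jpg']
categories = [label_map[f.split('.')[0]] for f in filenames]
print(categories)  # → [1, 0]
```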


All the other categories follow the same pattern (up to 38 categories), which leads to a dense layer of 38.

model = Sequential()

model.add(Conv2D(64, (3, 3), activation='relu', input_shape=(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(38, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

The image generators:

train_df, validate_df = train_test_split(df, test_size=0.27, random_state=42)
train_df = train_df.reset_index(drop=True)
validate_df = validate_df.reset_index(drop=True)

total_train = train_df.shape[0]
total_validate = validate_df.shape[0]
batch_size=10

train_datagen = ImageDataGenerator(
    rescale=1./255
 )

train_generator = train_datagen.flow_from_dataframe(
    train_df,
    "/opt/images/",
    x_col='filename',
    y_col='category',
    target_size=IMAGE_SIZE,
    class_mode='categorical',
    batch_size=batch_size
)
validation_datagen = ImageDataGenerator(rescale=1./255)
validation_generator = validation_datagen.flow_from_dataframe(
    validate_df,
    "/opt/images/",
    x_col='filename',
    y_col='category',
    target_size=IMAGE_SIZE,
    class_mode='categorical',
    batch_size=batch_size
)

epochs=50
history = model.fit_generator(
    train_generator,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=total_validate//batch_size,
    steps_per_epoch=total_train//batch_size,
    callbacks=callbacks
)

Somewhere in the code I split the data into separate training and validation chunks, so it validates on unseen images.

Then the part that does the prediction, which doesn't work:

batch_size=1
IMAGE_WIDTH=220
IMAGE_HEIGHT=220
IMAGE_SIZE=(IMAGE_WIDTH, IMAGE_HEIGHT)
IMAGE_CHANNELS=3

With a batch size of 1, the model is set up again, this time without dropout. After that I load the weights.

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.load_weights('saved/saved-model-33-0.97.hdf5')
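One thing worth checking when weights are loaded into a fresh model is how the generator's class indices map back to the original labels. As far as I know, `flow_from_dataframe` assigns indices by sorting the label strings, and since `category` was cast to `str`, numeric labels sort lexicographically rather than numerically — a sketch of what that mapping looks like (pure Python, no Keras needed):

```python
# With string labels '0'..'37', lexicographic sorting puts '10'
# before '2', so a generator-assigned class index can differ
# from the numeric label it came from.
labels = [str(i) for i in range(38)]
class_indices = {name: i for i, name in enumerate(sorted(labels))}
print(class_indices['2'])   # → 12, not 2
print(class_indices['0'])   # → 0
```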

The actual prediction: give it an image and predict the class:

def load_image(img_path, show=False):

    img = image.load_img(img_path, target_size=(220, 220))
    img_tensor = image.img_to_array(img)                    # (height, width, channels)
    img_tensor = np.expand_dims(img_tensor, axis=0)         # (1, height, width, channels), add a dimension because the model expects this shape: (batch_size, height, width, channels)
    img_tensor /= 255.                                      # imshow expects values in the range [0, 1]
    if show:
        plt.imshow(img_tensor[0])
        plt.axis('off')
        plt.show()
    return img_tensor

img_path = '/opt/testimages/test900up.jpg'
new_image = load_image(img_path)
pred = model.predict_classes(new_image, batch_size=1, verbose=1)
print(pred)
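If `predict_classes` behaves as I understand it, it returns the argmax over the 38 outputs, i.e. an index into the generator's class order rather than a label itself. A hypothetical sketch of decoding that index via an inverted `class_indices` dict (the mapping and the probability vector here are made up for illustration; in the real script the dict would come from `train_generator.class_indices`):

```python
# Invert the generator's class_indices to translate a predicted
# index back to its label string. The values below are illustrative.
class_indices = {'0same': 0, '100up': 1, '200up': 2}
index_to_label = {v: k for k, v in class_indices.items()}

probs = [0.05, 0.85, 0.10]   # stand-in for model.predict(new_image)[0]
pred_index = max(range(len(probs)), key=probs.__getitem__)
print(index_to_label[pred_index])   # → '100up'
```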

That's it. Across the 50 saved models, a given saved model sometimes comes close, but on other images it is only right at random; call it plain luck rather than real prediction.

Since the data is more of a streaming type, I first tried this with an LSTM on the raw data, with the same result as with the images: high accuracy and low loss, but completely wrong predictions. How can the validation metrics look so good statistically while prediction on the very same images is so bad? What am I doing wrong here? Note that I'm a novice programmer.

0 Answers