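Why does my Keras custom layer fit the training data well but give poor results at validation?

Time: 2019-06-13 14:44:36

Tags: python keras conv-neural-network

I'm trying to understand how Keras custom layers work, but I'm running into a problem with my model's validation accuracy.

I tried to reproduce a simple convolutional network on the MNIST dataset, but using a custom layer that combines the Conv2D operator with BatchNormalization.

First, the data I'm using: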

import numpy as np
import pandas as pd
from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)
# One-hot encode the digit labels (10 columns, one per class)
y_train = pd.get_dummies(y_train)
y_test = pd.get_dummies(y_test)

Here is the original implementation, which works well:

from keras.layers import Input, Conv2D, BatchNormalization, MaxPool2D, Flatten, Dense
from keras.models import Model

def get_model():
    input_ = Input(shape=(28, 28, 1))
    x = Conv2D(filters=64, kernel_size=3, activation="relu", input_shape=(28,28,1))(input_)
    x = BatchNormalization()(x)
    x = MaxPool2D(pool_size=(2,2))(x)
    # Applied to input_ rather than x, so the block above is discarded.
    x = Conv2D(filters=128, kernel_size=3, activation="relu")(input_)
    x = BatchNormalization()(x)
    x = MaxPool2D(pool_size=(2,2))(x)
    # Applied to input_ again, so only this last conv block feeds the classifier.
    x = Conv2D(filters=256, kernel_size=3, activation="relu")(input_)
    x = BatchNormalization()(x)
    x = MaxPool2D(pool_size=(2,2))(x)
    x = Flatten()(x)
    x = Dense(128, activation="relu")(x)
    x = Dense(64, activation="relu")(x)
    x = Dense(10, activation="softmax")(x)
    mod = Model(inputs=input_, outputs=x)
    return mod

from keras import backend as K
from keras.optimizers import Adam

optim = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, clipvalue=K.epsilon())
model = get_model()
model.compile(optimizer=optim, loss='categorical_crossentropy', metrics=["accuracy"])
# Train for 3 epochs, validating on the MNIST test split
model.fit(X_train, y_train, batch_size=128, epochs=3, validation_data=(X_test, y_test))

With this initial model, after 3 epochs I get 97% training accuracy and 97% validation accuracy.

Here is my custom layer:

from keras.layers import Layer, Conv2D, BatchNormalization, Dropout

class Conv2DLayer(Layer):
    def __init__(self, filters, kernel_size, dropout_ratio=None, strides=(1, 1), activation="relu", use_bn=True, *args, **kwargs):
        self._filters = filters
        self._kernel_size = kernel_size
        self._dropout_ratio = dropout_ratio
        self._strides = strides
        self.use_bn = use_bn
        self._activation = activation
        self._args = args
        self._kwargs = kwargs
        super(Conv2DLayer, self).__init__(*args, **kwargs)

    def build(self, input_shape):

        self.conv = Conv2D(self._filters,
                           kernel_size=self._kernel_size,
                           activation=self._activation,
                           strides=self._strides,
                           input_shape=input_shape,
                           *self._args,
                           **self._kwargs)
        self.conv.build(input_shape)
        self.out_conv_shape = self.conv.compute_output_shape(input_shape)
        self._trainable_weights = self.conv._trainable_weights
        self._non_trainable_weights = self.conv._non_trainable_weights

        if self.use_bn:
            self.bn = BatchNormalization()
            self.bn.build(self.out_conv_shape)
            self._trainable_weights.extend(self.bn._trainable_weights)
            self._non_trainable_weights.extend(self.bn._non_trainable_weights)

        if self._dropout_ratio is not None:
            self.dropout = Dropout(rate=self._dropout_ratio)
            self.dropout.build(self.out_conv_shape)
            self._trainable_weights.extend(self.dropout._trainable_weights)
            self._non_trainable_weights.extend(self.dropout._non_trainable_weights)

        super(Conv2DLayer, self).build(input_shape)

    def call(self, inputs):
        x = self.conv(inputs)
        if self.use_bn:
            x = self.bn(x)
        if self._dropout_ratio is not None:
            x = self.dropout(x)
        return x

    def compute_output_shape(self, input_shape):
        return self.out_conv_shape

Finally, here is the modified model:

def get_model():
    input_ = Input(shape=(28, 28, 1))
    x = Conv2DLayer(filters=64, kernel_size=3, activation="relu")(input_)
    x = MaxPool2D(pool_size=(2,2))(x)
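    # Applied to input_ rather than x, so the block above is discarded.
    x = Conv2DLayer(filters=128, kernel_size=3, activation="relu")(input_)
    x = MaxPool2D(pool_size=(2,2))(x)
    # Applied to input_ again, so only this last block feeds the classifier.
    x = Conv2DLayer(filters=256, kernel_size=3, activation="relu")(input_)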
    x = MaxPool2D(pool_size=(2,2))(x)
    x = Flatten()(x)
    x = Dense(128, activation="relu")(x)
    x = Dense(64, activation="relu")(x)
    x = Dense(10, activation="softmax")(x)
    mod = Model(inputs=input_, outputs=x)
    return mod

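With the custom-layer model I reach the same training accuracy (97%), but the validation accuracy stays stuck at around 50%.

Edit

Thanks to Matias Valdenegro, I solved the problem by modifying the call method: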

def call(self, inputs):
    training = K.learning_phase()
    x = self.conv(inputs)
    if self.use_bn:
        x = self.bn(x, training=training)
    if self._dropout_ratio is not None:
        x = self.dropout(x, training=training)
    return x

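Here K is the keras.backend module.

1 Answer:

Answer 0: (score: 0)

Dropout and Batch Normalization behave differently during training and during testing/inference. Your layer makes no such distinction, so at inference time it runs these inner layers in training mode, which produces wrong results.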
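
To make the difference concrete, here is a minimal pure-Python sketch of what a batch-normalization layer does in each mode. The class `ToyBatchNorm` and its `momentum` and `eps` defaults are illustrative assumptions, not the Keras implementation: in training mode it normalizes with the statistics of the current batch and updates its moving averages, while in inference mode it normalizes with the stored moving averages.

```python
import math


class ToyBatchNorm:
    """Illustrative 1-D batch normalization (hypothetical, not Keras code)."""

    def __init__(self, momentum=0.9, eps=1e-3):
        self.momentum = momentum
        self.eps = eps
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, batch, training):
        if training:
            # Training mode: normalize with the current batch statistics ...
            mean = sum(batch) / len(batch)
            var = sum((v - mean) ** 2 for v in batch) / len(batch)
            # ... and update the moving averages used later at inference.
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: normalize with the accumulated statistics.
            mean, var = self.moving_mean, self.moving_var
        return [(v - mean) / math.sqrt(var + self.eps) for v in batch]


bn = ToyBatchNorm()
for _ in range(200):  # simulate many training steps on the same batch
    bn([4.0, 6.0, 8.0, 10.0], training=True)

sample = [10.0]
infer_out = bn(sample, training=False)  # uses moving mean/variance
train_out = bn(sample, training=True)   # uses the sample's own statistics

print("inference mode:", infer_out)
print("training mode: ", train_out)
```

With a single sample, the training-mode branch subtracts the sample from itself and returns zero, whereas the inference-mode branch returns a value scaled by the learned statistics. A model evaluated with its normalization layers left in training mode can therefore produce very different outputs from the ones it was trained to produce.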

I'm not sure, but I think you can fix this by passing the training parameter of the call function on to the inner layers, for example:

def call(self, inputs, training=None):
    x = self.conv(inputs)
    if self.use_bn:
        x = self.bn(x, training=training)
    if self._dropout_ratio is not None:
        x = self.dropout(x, training=training)
    return x

This should make the inner layers behave differently in the training and testing/inference phases.
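
The forwarding pattern can be sketched without Keras. In the pure-Python example below, `ToyDropout` and `ToyBlock` are hypothetical stand-ins (the doubling step merely takes the place of the convolution); the point is only that the wrapper accepts a `training` argument and hands it to its sub-layer, so the sub-layer can switch behavior.

```python
import random


class ToyDropout:
    """Illustrative inverted dropout (hypothetical, not Keras code)."""

    def __init__(self, rate, seed=0):
        self.rate = rate
        self.rng = random.Random(seed)

    def __call__(self, values, training=False):
        if not training:
            # Inference mode: pass the values through unchanged.
            return list(values)
        keep = 1.0 - self.rate
        # Training mode: zero each unit with probability `rate`
        # and rescale the survivors by 1 / keep.
        return [v / keep if self.rng.random() < keep else 0.0 for v in values]


class ToyBlock:
    """Hypothetical wrapper that forwards `training` to its sub-layer."""

    def __init__(self, rate):
        self.dropout = ToyDropout(rate)

    def __call__(self, values, training=False):
        doubled = [2.0 * v for v in values]  # stand-in for the convolution
        return self.dropout(doubled, training=training)


block = ToyBlock(rate=0.5)
x = [1.0, 2.0, 3.0, 4.0]
eval_out = block(x, training=False)
train_out = block(x, training=True)

print("inference:", eval_out)
print("training: ", train_out)
```

If `ToyBlock.__call__` ignored its `training` argument and always called the sub-layer in training mode, the inference output would be randomly zeroed and rescaled, which is analogous to the symptom described in the question.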